🚀 K8s SRE Project Dashboard

Kubernetes Site Reliability Engineering Platform

📋 About This Project

This is a comprehensive Kubernetes Site Reliability Engineering (SRE) platform demonstrating microservices deployment, observability, testing, and GitOps practices. The project showcases modern cloud-native technologies and best practices for running production workloads on Kubernetes.

The platform includes a complete microservices e-commerce application (Online Boutique), comprehensive monitoring and logging, continuous delivery via GitOps, and automated testing frameworks for reliability assurance.

๐Ÿ›๏ธ Online Boutique Microservices

A Google Cloud microservices demo showcasing a 12-tier e-commerce application. It consists of multiple microservices written in different languages (Java, Go, Python, Node.js, C#) that communicate with each other via gRPC and HTTP.

Frontend Service Web UI

Main web interface for the e-commerce application

Access Frontend

Namespace: online-boutique

Services: frontend, cartservice, checkoutservice, productcatalogservice, recommendationservice, currencyservice, paymentservice, shippingservice, emailservice, adservice, redis-cart

🧪 Testing Frameworks

Sanity Test

Automated health check framework that continuously validates the health endpoints of all microservices in the online-boutique namespace. It tests connectivity, response times, and service availability for each microservice (gRPC, HTTP, and TCP protocols).
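
As a rough illustration of what such a check does, the following Python sketch probes one TCP endpoint and one HTTP endpoint and measures response times; the hostnames and ports are illustrative placeholders, and the gRPC health checks the framework also performs are omitted for brevity:

```python
import socket
import time
import urllib.request

# Illustrative targets only; the real framework discovers every service in the
# online-boutique namespace and additionally runs gRPC health checks.
TCP_TARGETS = {"redis-cart": ("redis-cart.online-boutique.svc.cluster.local", 6379)}
HTTP_TARGETS = {"frontend": "http://frontend.online-boutique.svc.cluster.local/"}

def check_tcp(name, host, port, timeout=3):
    """Report whether a plain TCP connection can be opened, and how long it took."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return name, True, time.monotonic() - start
    except OSError:
        return name, False, time.monotonic() - start

def check_http(name, url, timeout=3):
    """Report whether an HTTP endpoint answers successfully, and how long it took."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return name, resp.status < 400, time.monotonic() - start
    except OSError:
        return name, False, time.monotonic() - start

if __name__ == "__main__":
    results = [check_tcp(n, h, p) for n, (h, p) in TCP_TARGETS.items()]
    results += [check_http(n, u) for n, u in HTTP_TARGETS.items()]
    for name, healthy, elapsed in results:
        print(f"{name:<12} {'OK' if healthy else 'FAIL'}  {elapsed * 1000:.0f} ms")
```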

Sanity Test Dashboard Health Checks

Real-time health status of all microservices

View Dashboard

Namespace: sanity-test

Tests: All microservices in online-boutique namespace

Availability Test

Continuous availability testing framework that runs periodic tests (every 5 minutes) on critical services (Cart Service and Frontend Service). It provides a dashboard showing test history, success rates, and failure analysis.
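
A minimal sketch of that probe loop, assuming placeholder endpoints for the two services and printing results instead of persisting them for the dashboard, might look like this:

```python
import socket
import time
import urllib.request

INTERVAL_SECONDS = 300  # matches the 5-minute test interval

def probe_frontend():
    """HTTP probe of the frontend; the URL is a placeholder."""
    try:
        with urllib.request.urlopen(
            "http://frontend.online-boutique.svc.cluster.local/", timeout=5
        ):
            return True
    except OSError:
        return False

def probe_cartservice():
    """TCP probe of the cart service's gRPC port; host and port are placeholders."""
    try:
        with socket.create_connection(
            ("cartservice.online-boutique.svc.cluster.local", 7070), timeout=5
        ):
            return True
    except OSError:
        return False

history = {"frontend": [], "cartservice": []}

while True:
    for name, probe in (("frontend", probe_frontend), ("cartservice", probe_cartservice)):
        history[name].append(probe())
        window = history[name][-288:]  # roughly the last 24 hours of 5-minute samples
        rate = 100.0 * sum(window) / len(window)
        print(f"{name:<12} success rate: {rate:5.1f}% over {len(window)} runs")
    time.sleep(INTERVAL_SECONDS)
```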

Availability Test Dashboard Continuous Testing

Test history and availability metrics

View Dashboard

Namespace: availability-test

Test Interval: 300 seconds (5 minutes)

Tests: Cart Service, Frontend Service

🔄 Continuous Delivery (GitOps)

ArgoCD is used for GitOps-based continuous delivery, automatically syncing applications from Git repositories to Kubernetes clusters. All deployments are managed declaratively through Git, ensuring version control, audit trails, and consistent deployments.

ArgoCD UI GitOps

GitOps continuous delivery platform

Access: Port-forward required (kubectl port-forward -n argocd svc/argocd-server 8080:443)

URL: https://localhost:8080

Internal URL: https://argocd-server.argocd.svc.cluster.local:443

Namespace: argocd

Applications Managed:

  • Microservices Demo (online-boutique)
  • Monitoring Stack (Prometheus, Grafana, Loki)
  • Availability Test Framework
  • Sanity Test Framework

Features:

  • Automatic sync from Git repositories
  • Self-healing deployments
  • Application health monitoring
  • Rollback capabilities
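
As a rough illustration of how the sync and health status behind the features above can be read programmatically (the same information ArgoCD surfaces in its UI), here is a sketch using the official Kubernetes Python client; it assumes the `kubernetes` package is installed and that cluster credentials are available:

```python
from kubernetes import client, config

# Prefer in-cluster credentials when running as a pod; otherwise fall back to
# the local kubeconfig.
try:
    config.load_incluster_config()
except config.ConfigException:
    config.load_kube_config()

api = client.CustomObjectsApi()

# ArgoCD represents each managed application as an Application custom resource
# in the argocd namespace; status.sync and status.health are filled in by ArgoCD.
apps = api.list_namespaced_custom_object(
    group="argoproj.io", version="v1alpha1",
    namespace="argocd", plural="applications",
)

for app in apps.get("items", []):
    name = app["metadata"]["name"]
    sync = app.get("status", {}).get("sync", {}).get("status", "Unknown")
    health = app.get("status", {}).get("health", {}).get("status", "Unknown")
    print(f"{name:<30} sync={sync:<10} health={health}")
```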

📊 Monitoring & Observability

Comprehensive observability stack providing metrics, logs, and visualization for the entire Kubernetes platform. The stack follows industry best practices for monitoring cloud-native applications.

Prometheus Metrics

Time-series metrics database and monitoring system

Features: Metrics collection, alerting, PromQL queries

Access Prometheus

Grafana Visualization

Analytics and visualization platform

Features: Dashboards, metrics visualization, log queries

Access Grafana

Loki Logs

Log aggregation system (Prometheus-inspired)

Features: Log collection, LogQL queries, log visualization

Access via Grafana Explore (no direct UI)

Prometheus

Namespace: monitoring

What it monitors:

  • Kubernetes cluster metrics (nodes, pods, services)
  • Application metrics from microservices
  • Kube State Metrics (cluster state)
  • Node Exporter (host metrics)
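
For example, per-pod CPU usage across the online-boutique namespace can be pulled with a single PromQL query against Prometheus's HTTP API; the in-cluster service URL below is an assumption and depends on how Prometheus was installed:

```python
import json
import urllib.parse
import urllib.request

# Assumed in-cluster endpoint; the service name and port vary by installation.
PROMETHEUS = "http://prometheus-server.monitoring.svc.cluster.local"

# Example PromQL: per-pod CPU usage (in cores) across the online-boutique namespace.
query = 'sum(rate(container_cpu_usage_seconds_total{namespace="online-boutique"}[5m])) by (pod)'

url = f"{PROMETHEUS}/api/v1/query?" + urllib.parse.urlencode({"query": query})
with urllib.request.urlopen(url, timeout=10) as resp:
    payload = json.load(resp)

for sample in payload["data"]["result"]:
    pod = sample["metric"].get("pod", "<none>")
    cores = float(sample["value"][1])
    print(f"{pod:<45} {cores:.3f} cores")
```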

Grafana

Namespace: monitoring

Data Sources: Prometheus (metrics), Loki (logs)

Features:

  • Pre-configured dashboards for Kubernetes monitoring
  • Custom dashboards for microservices
  • Log exploration with LogQL
  • Metrics visualization with PromQL
  • Alerting and notification rules
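
As a small illustration of scripting against Grafana, the provisioned dashboards can be listed through its HTTP API; the service URL and the API token below are assumptions to be replaced with the deployment's actual values:

```python
import json
import urllib.request

# Assumed in-cluster endpoint and a placeholder token; substitute real values.
GRAFANA = "http://grafana.monitoring.svc.cluster.local:3000"
TOKEN = "REPLACE_WITH_A_GRAFANA_API_TOKEN"

req = urllib.request.Request(
    f"{GRAFANA}/api/search?type=dash-db",  # list all dashboards Grafana knows about
    headers={"Authorization": f"Bearer {TOKEN}"},
)
with urllib.request.urlopen(req, timeout=10) as resp:
    dashboards = json.load(resp)

for dash in dashboards:
    print(f"{dash['uid']:<20} {dash['title']}")
```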

Loki

Namespace: monitoring

Log Collection: Promtail (DaemonSet) collects logs from all pods

Access: Via Grafana Explore (Loki data source)

Features:

  • Centralized log aggregation from all namespaces
  • LogQL query language (similar to PromQL)
  • Label-based log indexing
  • Log retention and storage
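
As an example of what a LogQL query looks like outside of Grafana Explore, the sketch below queries Loki's HTTP range-query API for recent frontend error lines; the Loki service URL and the `app` label are assumptions that depend on the installation and the Promtail configuration:

```python
import json
import time
import urllib.parse
import urllib.request

# Assumed in-cluster Loki endpoint; the service name and port vary by installation.
LOKI = "http://loki.monitoring.svc.cluster.local:3100"

# Example LogQL: error lines from the frontend pods over the last hour.
logql = '{namespace="online-boutique", app="frontend"} |= "error"'

params = urllib.parse.urlencode({
    "query": logql,
    "start": int((time.time() - 3600) * 1e9),  # Loki expects nanosecond timestamps
    "end": int(time.time() * 1e9),
    "limit": 50,
})
with urllib.request.urlopen(f"{LOKI}/loki/api/v1/query_range?{params}", timeout=10) as resp:
    payload = json.load(resp)

for stream in payload["data"]["result"]:
    for _ts, line in stream["values"]:
        print(line)
```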