Kubernetes: Beginner to Expert

    A Complete Production-Ready Guide


    Table of Contents

    1. Getting Started
    2. Core Concepts
    3. Cluster Management
    4. Workloads
    5. Services and Networking
    6. Storage
    7. Configuration Management
    8. Security and RBAC
    9. Monitoring and Logging
    10. Troubleshooting
    11. Web Applications
    12. Advanced Topics
    13. Production Best Practices
    14. Performance and Optimization
    15. Multi-Cloud and Hybrid
    16. DevOps Integration

    Chapter 1: Getting Started

    Introduction to Kubernetes

    Kubernetes (K8s) is an open-source container orchestration platform that automates the deployment, scaling, and management of containerized applications. Originally developed by Google based on their Borg system, it’s now maintained by the Cloud Native Computing Foundation (CNCF).

    Core Benefits

    • Scalability: Horizontal and vertical scaling of applications
    • High Availability: Built-in fault tolerance and self-healing
    • Portability: Runs on any infrastructure (cloud, on-premises, hybrid)
    • Resource Efficiency: Optimal resource utilization across clusters
    • DevOps Integration: Seamless CI/CD pipeline integration
    • Extensibility: Rich ecosystem of tools and operators

    Architecture Overview (Detailed)

    graph TB
        subgraph "Control Plane"
            API[API Server:6443]
            ETCD[etcd:2379-2380]
            SCHED[Scheduler]
            CM[Controller Manager]
            CCM[Cloud Controller Manager]
        end
    
        subgraph "Worker Node 1"
            K1[Kubelet:10250]
            KP1[Kube-proxy]
            CR1[Container Runtime: Docker/containerd]
            P1[Pods]
            CNI1[CNI Plugin]
        end
    
        subgraph "Worker Node 2"
            K2[Kubelet:10250]
            KP2[Kube-proxy]
            CR2[Container Runtime: Docker/containerd]
            P2[Pods]
            CNI2[CNI Plugin]
        end
    
        subgraph "Add-ons"
            DNS[CoreDNS]
            DASH[Dashboard]
            METRICS[Metrics Server]
            INGRESS[Ingress Controller]
        end
    
        API <--> K1
        API <--> K2
        SCHED --> API
        CM --> API
        CCM --> API
        ETCD <--> API
    
        K1 --> CNI1
        K2 --> CNI2
    
        style API fill:#e1f5fe
        style ETCD fill:#f3e5f5
        style K1 fill:#e8f5e8
        style K2 fill:#e8f5e8

    Environment Setup Matrix

    Environment | Use Case                 | Resources        | Setup Time | Cost
    ------------|--------------------------|------------------|------------|--------------------
    Minikube    | Learning/Development     | 2GB RAM, 2 CPUs  | 10 minutes | Free
    Kind        | CI/CD Testing            | 4GB RAM, 2 CPUs  | 5 minutes  | Free
    K3s         | Edge/IoT                 | 512MB RAM, 1 CPU | 15 minutes | Free
    kubeadm     | Production Self-Managed  | 8GB RAM, 4 CPUs  | 60 minutes | Infrastructure cost
    EKS         | Production (AWS)         | Variable         | 30 minutes | $0.10/hour + nodes
    GKE         | Production (GCP)         | Variable         | 20 minutes | $0.10/hour + nodes
    AKS         | Production (Azure)       | Variable         | 25 minutes | $0.10/hour + nodes

    Container Evolution

    graph LR
        subgraph "Physical Servers"
            PS[Single Application per Server]
        end
    
        subgraph "Virtual Machines"
            VM1[App 1 + OS]
            VM2[App 2 + OS]
            HYP[Hypervisor]
            HOST1[Host OS]
        end
    
        subgraph "Containers"
            C1[App 1]
            C2[App 2]
            CE[Container Engine]
            HOST2[Host OS]
        end
    
        subgraph "Kubernetes"
            POD1[Pod 1]
            POD2[Pod 2]
            K8S[Kubernetes]
            NODES[Multiple Nodes]
        end
    
        PS --> VM1
        VM1 --> C1
        C1 --> POD1

    Why Kubernetes?

    • Container Orchestration: Manages containers at scale
    • Self-healing: Automatically replaces failed containers
    • Horizontal Scaling: Scales applications based on demand
    • Service Discovery: Built-in load balancing and service discovery
    • Rolling Updates: Zero-downtime deployments

    Architecture Overview (Simplified)

    graph TB
        subgraph "Control Plane"
            API[API Server]
            ETCD[etcd]
            SCHED[Scheduler]
            CM[Controller Manager]
        end
    
        subgraph "Worker Node 1"
            K1[Kubelet]
            KP1[Kube-proxy]
            CR1[Container Runtime]
            P1[Pods]
        end
    
        subgraph "Worker Node 2"
            K2[Kubelet]
            KP2[Kube-proxy]
            CR2[Container Runtime]
            P2[Pods]
        end
    
        API --> K1
        API --> K2
        SCHED --> API
        CM --> API
        ETCD --> API

    Control Plane Components

    • API Server: Central management point for all cluster operations
    • etcd: Distributed key-value store for cluster data
    • Scheduler: Assigns pods to nodes based on resource requirements
    • Controller Manager: Runs controllers that regulate cluster state

    Worker Node Components

    • Kubelet: Node agent that manages containers
    • Kube-proxy: Network proxy for service communication
    • Container Runtime: Runs containers (Docker, containerd, CRI-O)
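
    On a running cluster you can see most of these components directly; the control plane and add-ons usually run as pods in the kube-system namespace (exact pod names vary by distribution). A quick check, assuming kubectl is already configured:

    # Control plane and system components
    kubectl get pods -n kube-system -o wide

    # Kubelet and container runtime versions reported per node
    kubectl get nodes -o wide
    Bash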

    Container Fundamentals

    Before diving into Kubernetes, understanding containers is crucial:

    # Example Dockerfile
    FROM node:14-alpine
    WORKDIR /app
    COPY package*.json ./
    RUN npm install
    COPY . .
    EXPOSE 3000
    CMD ["npm", "start"]
    Dockerfile

    Containers vs VMs

    graph LR
        subgraph "Traditional VMs"
            H1[Host OS]
            HV[Hypervisor]
            VM1[Guest OS 1]
            VM2[Guest OS 2]
            A1[App 1]
            A2[App 2]
        end
    
        subgraph "Containers"
            H2[Host OS]
            CE[Container Engine]
            C1[Container 1]
            C2[Container 2]
            A3[App 1]
            A4[App 2]
        end

    Namespaces

    Namespaces provide logical separation within a cluster:

    apiVersion: v1
    kind: Namespace
    metadata:
      name: production
      labels:
        environment: prod
    ---
    apiVersion: v1
    kind: Namespace
    metadata:
      name: development
      labels:
        environment: dev
    YAML
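
    A sketch of working with these namespaces once the manifest above is saved (the file name namespaces.yaml is illustrative):

    # Create both namespaces and list them
    kubectl apply -f namespaces.yaml
    kubectl get namespaces --show-labels

    # Query workloads in a specific namespace
    kubectl get pods -n production

    # Switch the default namespace for the current context
    kubectl config set-context --current --namespace=development
    Bash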

    Chapter 2: Core Concepts

    Pods

    Pods are the smallest deployable units in Kubernetes:

    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx-pod
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
        ports:
        - containerPort: 80
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
    YAML
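
    A typical create-inspect-delete cycle for this pod looks like the following (assuming the manifest is saved as nginx-pod.yaml):

    kubectl apply -f nginx-pod.yaml
    kubectl get pod nginx-pod -o wide      # node, pod IP, status
    kubectl describe pod nginx-pod         # events, resources, probes
    kubectl delete pod nginx-pod
    Bash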

    Multi-container Pods

    apiVersion: v1
    kind: Pod
    metadata:
      name: multi-container-pod
    spec:
      containers:
      - name: web-server
        image: nginx:1.21
        ports:
        - containerPort: 80
      - name: log-aggregator
        image: fluentd:latest
        volumeMounts:
        - name: log-volume
          mountPath: /var/log
      volumes:
      - name: log-volume
        emptyDir: {}
    YAML
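
    With more than one container in a pod, most kubectl commands need the -c flag to pick a container, for example:

    # Logs from a specific container
    kubectl logs multi-container-pod -c log-aggregator

    # Shell into the web server container
    kubectl exec -it multi-container-pod -c web-server -- /bin/sh
    Bash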

    Labels and Selectors

    graph LR
        subgraph "Pods with Labels"
            P1[Pod 1: app=frontend, tier=web]
            P2[Pod 2: app=backend, tier=api]
            P3[Pod 3: app=frontend, tier=web]
        end
    
        subgraph "Service Selector"
            S[Service selector: app=frontend]
        end
    
        S --> P1
        S --> P3

    # Pod with labels
    apiVersion: v1
    kind: Pod
    metadata:
      name: frontend-pod
      labels:
        app: frontend
        tier: web
        version: v1.0
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
    
    ---
    # Service using selectors
    apiVersion: v1
    kind: Service
    metadata:
      name: frontend-service
    spec:
      selector:
        app: frontend
        tier: web
      ports:
      - port: 80
        targetPort: 80
    YAML
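
    Selectors are also how you query and manage objects from the command line; a few illustrative examples:

    # Select pods by one or more labels
    kubectl get pods -l app=frontend
    kubectl get pods -l app=frontend,tier=web

    # Add or change a label on a running pod
    kubectl label pod frontend-pod version=v1.1 --overwrite

    # Show label values as extra columns
    kubectl get pods -L app,tier,version
    Bash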

    Chapter 3: Cluster Management

    Setting up Clusters

    Minikube (Local Development)

    # Install minikube
    curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
    sudo install minikube-linux-amd64 /usr/local/bin/minikube
    
    # Start cluster
    minikube start --driver=docker --memory=4096 --cpus=2
    
    # Enable addons
    minikube addons enable dashboard
    minikube addons enable ingress
    Bash

    kubeadm (Production)

    # Initialize control plane
    sudo kubeadm init --pod-network-cidr=10.244.0.0/16
    
    # Set up kubectl
    mkdir -p $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config
    
    # Install CNI plugin (Flannel)
    kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
    
    # Join worker nodes
    kubeadm join <control-plane-ip>:6443 --token <token> --discovery-token-ca-cert-hash <hash>
    Bash

    Managed Kubernetes Services

    AWS EKS

    # eks-cluster.yaml
    apiVersion: eksctl.io/v1alpha5
    kind: ClusterConfig
    
    metadata:
      name: production-cluster
      region: us-west-2
    
    nodeGroups:
      - name: worker-nodes
        instanceType: t3.medium
        desiredCapacity: 3
        minSize: 1
        maxSize: 5
        volumeSize: 20
        ssh:
          allow: true
    YAML
    # Create EKS cluster
    eksctl create cluster -f eks-cluster.yaml
    Bash

    Google GKE

    # Create GKE cluster
    gcloud container clusters create production-cluster \
        --zone=us-central1-a \
        --num-nodes=3 \
        --machine-type=n1-standard-2 \
        --enable-autoscaling \
        --min-nodes=1 \
        --max-nodes=10
    Bash

    Azure AKS

    # Create resource group
    az group create --name myResourceGroup --location eastus
    
    # Create AKS cluster
    az aks create \
        --resource-group myResourceGroup \
        --name myAKSCluster \
        --node-count 3 \
        --enable-addons monitoring \
        --generate-ssh-keys
    Bash

    kubectl Commands

    Cluster Information

    # Cluster info
    kubectl cluster-info
    kubectl version
    kubectl get nodes
    
    # Detailed node information
    kubectl describe node <node-name>
    
    # Cluster events
    kubectl get events --sort-by=.metadata.creationTimestamp
    Bash

    Resource Management

    # Get resources
    kubectl get pods
    kubectl get pods -o wide
    kubectl get pods --all-namespaces
    kubectl get pods -l app=nginx
    
    # Describe resources
    kubectl describe pod <pod-name>
    kubectl describe service <service-name>
    
    # Logs
    kubectl logs <pod-name>
    kubectl logs -f <pod-name>  # Follow logs
    kubectl logs <pod-name> -c <container-name>  # Multi-container pod
    
    # Execute commands
    kubectl exec -it <pod-name> -- /bin/bash
    kubectl exec -it <pod-name> -c <container-name> -- /bin/sh
    Bash

    Resource Creation and Updates

    # Create resources
    kubectl create -f deployment.yaml
    kubectl apply -f deployment.yaml
    
    # Update resources
    kubectl edit deployment <deployment-name>
    kubectl patch deployment <deployment-name> -p '{"spec":{"replicas":5}}'
    
    # Delete resources
    kubectl delete pod <pod-name>
    kubectl delete -f deployment.yaml
    kubectl delete deployment,service -l app=myapp
    Bash

    Chapter 4: Workloads

    Deployments

    Deployments manage ReplicaSets and provide declarative updates to Pods:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment
      labels:
        app: nginx
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: nginx:1.21
            ports:
            - containerPort: 80
            resources:
              requests:
                memory: "64Mi"
                cpu: "250m"
              limits:
                memory: "128Mi"
                cpu: "500m"
            livenessProbe:
              httpGet:
                path: /
                port: 80
              initialDelaySeconds: 30
              periodSeconds: 10
            readinessProbe:
              httpGet:
                path: /
                port: 80
              initialDelaySeconds: 5
              periodSeconds: 5
    YAML
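
    Day-to-day deployment work is mostly rollout commands; a minimal set, assuming the manifest above is saved as nginx-deployment.yaml:

    kubectl apply -f nginx-deployment.yaml
    kubectl rollout status deployment/nginx-deployment

    # Trigger a rolling update by changing the image
    kubectl set image deployment/nginx-deployment nginx=nginx:1.22

    # Inspect history and roll back if needed
    kubectl rollout history deployment/nginx-deployment
    kubectl rollout undo deployment/nginx-deployment

    # Scale manually
    kubectl scale deployment/nginx-deployment --replicas=5
    Bash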

    Deployment Strategies

    graph TB
        subgraph "Rolling Update"
            RU1[Old Pods: 3]
            RU2[New Pod: 1]
            RU3[Old Pods: 2, New Pods: 2]
            RU4[Old Pods: 1, New Pods: 3]
            RU5[New Pods: 3]
        end
    
        subgraph "Blue-Green"
            BG1[Blue Environment: Active]
            BG2[Green Environment: Standby]
            BG3[Switch Traffic]
            BG4[Green Environment: Active]
        end

    # Rolling update strategy
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: rolling-deployment
    spec:
      replicas: 10
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 2
          maxSurge: 2
      # ... rest of spec
    
    ---
    # Recreate strategy
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: recreate-deployment
    spec:
      replicas: 3
      strategy:
        type: Recreate
      # ... rest of spec
    YAML

    ReplicaSets

    apiVersion: apps/v1
    kind: ReplicaSet
    metadata:
      name: nginx-replicaset
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: nginx:1.21
    YAML

    StatefulSets

    For stateful applications requiring stable network identities and persistent storage:

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: mysql-statefulset
    spec:
      serviceName: mysql
      replicas: 3
      selector:
        matchLabels:
          app: mysql
      template:
        metadata:
          labels:
            app: mysql
        spec:
          containers:
          - name: mysql
            image: mysql:8.0
            env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: password
            ports:
            - containerPort: 3306
            volumeMounts:
            - name: mysql-data
              mountPath: /var/lib/mysql
      volumeClaimTemplates:
      - metadata:
          name: mysql-data
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi
    YAML
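
    Note that serviceName: mysql refers to a headless Service (clusterIP: None) that must be created separately to give each replica its stable DNS entry (mysql-0.mysql, mysql-1.mysql, and so on). Once the StatefulSet is running, the stable identities and per-replica storage are easy to observe:

    # Ordered, stable pod names: mysql-0, mysql-1, mysql-2
    kubectl get pods -l app=mysql

    # One PVC per replica, named <template>-<statefulset>-<ordinal>,
    # e.g. mysql-data-mysql-statefulset-0
    kubectl get pvc

    # Scaling preserves existing identities and their volumes
    kubectl scale statefulset mysql-statefulset --replicas=5
    Bash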

    StatefulSet vs Deployment

    graph LR
        subgraph "StatefulSet"
            SS1[mysql-0: stable identity]
            SS2[mysql-1: stable identity]
            SS3[mysql-2: stable identity]
            PV1[PersistentVolume 1]
            PV2[PersistentVolume 2]
            PV3[PersistentVolume 3]
            SS1 --- PV1
            SS2 --- PV2
            SS3 --- PV3
        end
    
        subgraph "Deployment"
            D1[nginx-abc123: random identity]
            D2[nginx-def456: random identity]
            D3[nginx-ghi789: random identity]
        end

    DaemonSets

    Ensures a pod runs on every (or selected) node:

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: fluentd-daemonset
      labels:
        app: fluentd
    spec:
      selector:
        matchLabels:
          app: fluentd
      template:
        metadata:
          labels:
            app: fluentd
        spec:
          nodeSelector:
            kubernetes.io/os: linux
          containers:
          - name: fluentd
            image: fluentd:latest
            resources:
              limits:
                memory: 200Mi
              requests:
                cpu: 100m
                memory: 200Mi
            volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
          volumes:
          - name: varlog
            hostPath:
              path: /var/log
          - name: varlibdockercontainers
            hostPath:
              path: /var/lib/docker/containers
    YAML

    Jobs and CronJobs

    Jobs

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: data-migration-job
    spec:
      completions: 1
      parallelism: 1
      backoffLimit: 3
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: migrate
            image: migrate/migrate
            command: ["migrate"]
            args: ["-path", "/migrations", "-database", "postgres://...", "up"]
            env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-secret
                  key: url
    YAML

    CronJobs

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: backup-cronjob
    spec:
      schedule: "0 2 * * *"  # Daily at 2 AM
      jobTemplate:
        spec:
          template:
            spec:
              restartPolicy: OnFailure
              containers:
              - name: backup
                image: postgres:13
                command: ["pg_dump"]
                args: ["-h", "postgres-service", "-U", "postgres", "mydb"]
                env:
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: postgres-secret
                      key: password
                volumeMounts:
                - name: backup-storage
                  mountPath: /backup
              volumes:
              - name: backup-storage
                persistentVolumeClaim:
                  claimName: backup-pvc
    YAML
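
    Commonly used commands for working with Jobs and CronJobs (the manual-backup name below is just illustrative):

    kubectl get jobs
    kubectl get cronjobs

    # Run the cron job's template once, on demand
    kubectl create job manual-backup --from=cronjob/backup-cronjob

    # Inspect the pods and logs a job produced
    kubectl get pods -l job-name=manual-backup
    kubectl logs job/manual-backup
    Bash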

    Chapter 5: Services and Networking

    Service Types

    graph TB
        subgraph "ClusterIP"
            CI[Internal Traffic Only]
            CIP[Pod] --> CIS[ClusterIP Service]
        end
    
        subgraph "NodePort"
            NP[External Traffic via Node Port]
            NPP[Pod] --> NPS[NodePort Service]
            EXT1[External Client] --> NPN[Node:30080]
            NPN --> NPS
        end
    
        subgraph "LoadBalancer"
            LB[Cloud Load Balancer]
            LBP[Pod] --> LBS[LoadBalancer Service]
            EXT2[External Client] --> CCLB[Cloud LB]
            CCLB --> LBS
        end

    ClusterIP Service

    apiVersion: v1
    kind: Service
    metadata:
      name: backend-service
    spec:
      type: ClusterIP
      selector:
        app: backend
      ports:
      - port: 80
        targetPort: 8080
        protocol: TCP
    YAML

    NodePort Service

    apiVersion: v1
    kind: Service
    metadata:
      name: frontend-nodeport
    spec:
      type: NodePort
      selector:
        app: frontend
      ports:
      - port: 80
        targetPort: 80
        nodePort: 30080
    YAML

    LoadBalancer Service

    apiVersion: v1
    kind: Service
    metadata:
      name: web-loadbalancer
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-type: nlb
    spec:
      type: LoadBalancer
      selector:
        app: web
      ports:
      - port: 80
        targetPort: 80
    YAML

    ExternalName Service

    apiVersion: v1
    kind: Service
    metadata:
      name: external-database
    spec:
      type: ExternalName
      externalName: db.example.com
      ports:
      - port: 5432
    YAML
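
    For ClusterIP, NodePort, and LoadBalancer Services, traffic is forwarded to the pods listed in the Service's Endpoints, and every Service gets a DNS name of the form <service>.<namespace>.svc.cluster.local. A few useful checks:

    kubectl get svc
    kubectl get endpoints backend-service
    kubectl describe svc backend-service

    # Resolve a service name from a throwaway pod
    kubectl run dns-test --rm -it --image=busybox:1.36 --restart=Never -- \
      nslookup backend-service.default.svc.cluster.local
    Bash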

    Ingress

    graph LR
        CLIENT[Client] --> IGW[Internet Gateway]
        IGW --> ING[Ingress Controller]
        ING --> SVC1[Service 1]
        ING --> SVC2[Service 2]
        SVC1 --> POD1[Pods]
        SVC2 --> POD2[Pods]

    Ingress Resource

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: web-ingress
      annotations:
        kubernetes.io/ingress.class: "nginx"
        cert-manager.io/cluster-issuer: "letsencrypt-prod"
        nginx.ingress.kubernetes.io/ssl-redirect: "true"
    spec:
      tls:
      - hosts:
        - myapp.example.com
        secretName: myapp-tls
      rules:
      - host: myapp.example.com
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend-service
                port:
                  number: 80
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: backend-service
                port:
                  number: 80
    YAML
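
    Once an ingress controller is running and this resource is applied, routing can be verified from outside the cluster (the external address and hostnames depend on your environment):

    kubectl get ingress web-ingress
    kubectl describe ingress web-ingress

    # Test each rule against the controller's external IP
    curl -H "Host: myapp.example.com" http://<ingress-external-ip>/
    curl -H "Host: myapp.example.com" http://<ingress-external-ip>/api
    Bash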

    NGINX Ingress Controller

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-ingress-controller
      namespace: ingress-nginx
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: nginx-ingress
      template:
        metadata:
          labels:
            app: nginx-ingress
        spec:
          containers:
          - name: nginx-ingress-controller
            image: registry.k8s.io/ingress-nginx/controller:v1.1.1
            args:
            - /nginx-ingress-controller
            - --configmap=$(POD_NAMESPACE)/nginx-configuration
            - --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
            - --udp-services-configmap=$(POD_NAMESPACE)/udp-services
            env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            ports:
            - name: http
              containerPort: 80
            - name: https
              containerPort: 443
    YAML

    Network Policies

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: deny-all
    spec:
      podSelector: {}
      policyTypes:
      - Ingress
      - Egress
    
    ---
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-frontend-to-backend
    spec:
      podSelector:
        matchLabels:
          app: backend
      policyTypes:
      - Ingress
      ingress:
      - from:
        - podSelector:
            matchLabels:
              app: frontend
        ports:
        - protocol: TCP
          port: 8080
    YAML

    Chapter 6: Storage

    Persistent Volumes and Claims

    graph LR
        PV[PersistentVolume] --> PVC[PersistentVolumeClaim]
        PVC --> POD[Pod]
    
        subgraph "Storage Classes"
            SC1[fast-ssd]
            SC2[slow-hdd]
            SC3[network-storage]
        end
    
        SC1 --> PV

    PersistentVolume

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pv-database
    spec:
      capacity:
        storage: 100Gi
      volumeMode: Filesystem
      accessModes:
      - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      storageClassName: fast-ssd
      csi:
        driver: ebs.csi.aws.com
        volumeHandle: vol-0abcd1234efgh5678
        fsType: ext4
    YAML

    PersistentVolumeClaim

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: database-pvc
    spec:
      accessModes:
      - ReadWriteOnce
      volumeMode: Filesystem
      resources:
        requests:
          storage: 50Gi
      storageClassName: fast-ssd
    YAML

    StorageClass

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: fast-ssd
    provisioner: ebs.csi.aws.com
    parameters:
      type: gp3
      iops: "3000"
      throughput: "125"
      encrypted: "true"
    volumeBindingMode: WaitForFirstConsumer
    allowVolumeExpansion: true
    reclaimPolicy: Delete
    YAML
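
    Binding is easiest to follow from the command line; with volumeBindingMode: WaitForFirstConsumer, a claim stays Pending until a pod that uses it is scheduled:

    kubectl get storageclass
    kubectl get pv
    kubectl get pvc database-pvc

    # Why is a claim still Pending?
    kubectl describe pvc database-pvc
    Bash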

    Volume Types

    EmptyDir

    apiVersion: v1
    kind: Pod
    metadata:
      name: test-pod
    spec:
      containers:
      - name: app
        image: nginx
        volumeMounts:
        - name: cache-volume
          mountPath: /cache
      - name: sidecar
        image: busybox
        volumeMounts:
        - name: cache-volume
          mountPath: /shared
      volumes:
      - name: cache-volume
        emptyDir:
          sizeLimit: 1Gi
    YAML

    HostPath

    apiVersion: v1
    kind: Pod
    metadata:
      name: hostpath-pod
    spec:
      containers:
      - name: app
        image: nginx
        volumeMounts:
        - name: host-volume
          mountPath: /host-data
      volumes:
      - name: host-volume
        hostPath:
          path: /var/log
          type: Directory
    YAML

    ConfigMap Volume

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: app-config
    data:
      app.properties: |
        database.host=db.example.com
        database.port=5432
        cache.enabled=true
    
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: config-pod
    spec:
      containers:
      - name: app
        image: myapp:latest
        volumeMounts:
        - name: config-volume
          mountPath: /etc/config
      volumes:
      - name: config-volume
        configMap:
          name: app-config
    YAML

    Secret Volume

    apiVersion: v1
    kind: Secret
    metadata:
      name: db-secret
    type: Opaque
    data:
      username: bXl1c2Vy  # base64 encoded
      password: bXlwYXNz  # base64 encoded
    
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: secret-pod
    spec:
      containers:
      - name: app
        image: myapp:latest
        volumeMounts:
        - name: secret-volume
          mountPath: /etc/secrets
          readOnly: true
      volumes:
      - name: secret-volume
        secret:
          secretName: db-secret
    YAML

    Chapter 7: Configuration Management

    ConfigMaps

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: redis-config
    data:
      redis.conf: |
        bind 0.0.0.0
        port 6379
        timeout 0
        save 900 1
        save 300 10
        save 60 10000
      max-memory: "2gb"
      max-connections: "1000"
    
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: redis
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: redis
      template:
        metadata:
          labels:
            app: redis
        spec:
          containers:
          - name: redis
            image: redis:6.2
            env:
            - name: MAX_MEMORY
              valueFrom:
                configMapKeyRef:
                  name: redis-config
                  key: max-memory
            volumeMounts:
            - name: config
              mountPath: /usr/local/etc/redis
          volumes:
          - name: config
            configMap:
              name: redis-config
              items:
              - key: redis.conf
                path: redis.conf
    YAML

    Secrets

    # Create secret from command line
    # kubectl create secret generic db-secret \
    #   --from-literal=username=dbuser \
    #   --from-literal=password=secretpassword
    
    apiVersion: v1
    kind: Secret
    metadata:
      name: db-secret
    type: Opaque
    stringData:  # No need to base64 encode
      username: dbuser
      password: secretpassword
      connection-string: "postgresql://dbuser:secretpassword@db:5432/mydb"
    
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web-app
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: web-app
      template:
        metadata:
          labels:
            app: web-app
        spec:
          containers:
          - name: web
            image: myapp:latest
            env:
            - name: DB_USER
              valueFrom:
                secretKeyRef:
                  name: db-secret
                  key: username
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: db-secret
                  key: password
            envFrom:
            - secretRef:
                name: db-secret
    YAML
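
    Secrets are only base64-encoded, not encrypted at the API level, so access to them should be treated as sensitive. Creating one imperatively and reading a key back looks like this:

    # Create (or recreate) the secret from literals
    kubectl create secret generic db-secret \
      --from-literal=username=dbuser \
      --from-literal=password=secretpassword

    # Read a single key back, base64-decoded
    kubectl get secret db-secret -o jsonpath='{.data.password}' | base64 -d
    Bash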

    Managing Environment Variables

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: app-env
    data:
      ENVIRONMENT: "production"
      LOG_LEVEL: "info"
      FEATURE_FLAGS: "new-ui,advanced-search"
    
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: multi-env-app
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: multi-env-app
      template:
        metadata:
          labels:
            app: multi-env-app
        spec:
          containers:
          - name: app
            image: myapp:latest
            env:
            # Direct environment variable
            - name: APP_VERSION
              value: "1.2.3"
            # From field reference
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            # From ConfigMap
            - name: LOG_LEVEL
              valueFrom:
                configMapKeyRef:
                  name: app-env
                  key: LOG_LEVEL
            # From Secret
            - name: API_KEY
              valueFrom:
                secretKeyRef:
                  name: api-secret
                  key: key
            # All from ConfigMap
            envFrom:
            - configMapRef:
                name: app-env
            # All from Secret
            - secretRef:
                name: app-secret
    YAML

    Chapter 8: Security and RBAC

    Role-Based Access Control (RBAC)

    graph LR
        USER[User/ServiceAccount] --> RB[RoleBinding]
        RB --> ROLE[Role/ClusterRole]
        ROLE --> PERM[Permissions]
    
        subgraph "Scope"
            NS[Namespace Level]
            CL[Cluster Level]
        end

    ServiceAccount

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: pod-reader
      namespace: default
    
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      namespace: default
      name: pod-reader-role
    rules:
    - apiGroups: [""]
      resources: ["pods"]
      verbs: ["get", "watch", "list"]
    
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: read-pods
      namespace: default
    subjects:
    - kind: ServiceAccount
      name: pod-reader
      namespace: default
    roleRef:
      kind: Role
      name: pod-reader-role
      apiGroup: rbac.authorization.k8s.io
    YAML
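
    You can check what a subject is actually allowed to do with kubectl auth can-i, impersonating the service account bound above:

    # Allowed by the Role above, should print "yes"
    kubectl auth can-i list pods \
      --as=system:serviceaccount:default:pod-reader -n default

    # Not granted, should print "no"
    kubectl auth can-i delete deployments \
      --as=system:serviceaccount:default:pod-reader -n default
    Bash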

    ClusterRole and ClusterRoleBinding

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: cluster-admin-role
    rules:
    - apiGroups: [""]
      resources: ["*"]
      verbs: ["*"]
    - apiGroups: ["apps"]
      resources: ["*"]
      verbs: ["*"]
    - apiGroups: ["networking.k8s.io"]
      resources: ["*"]
      verbs: ["*"]
    
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: cluster-admin-binding
    subjects:
    - kind: User
      name: admin@company.com
      apiGroup: rbac.authorization.k8s.io
    roleRef:
      kind: ClusterRole
      name: cluster-admin-role
      apiGroup: rbac.authorization.k8s.io
    YAML

    Pod Security Standards

    apiVersion: v1
    kind: Namespace
    metadata:
      name: secure-namespace
      labels:
        pod-security.kubernetes.io/enforce: restricted
        pod-security.kubernetes.io/audit: restricted
        pod-security.kubernetes.io/warn: restricted
    
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: secure-app
      namespace: secure-namespace
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: secure-app
      template:
        metadata:
          labels:
            app: secure-app
        spec:
          serviceAccountName: secure-sa
          securityContext:
            runAsNonRoot: true
            runAsUser: 1000
            runAsGroup: 3000
            fsGroup: 2000
            seccompProfile:
              type: RuntimeDefault
          containers:
          - name: app
            image: myapp:latest
            securityContext:
              allowPrivilegeEscalation: false
              readOnlyRootFilesystem: true
              runAsNonRoot: true
              runAsUser: 1000
              capabilities:
                drop:
                - ALL
            volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: cache
              mountPath: /app/cache
          volumes:
          - name: tmp
            emptyDir: {}
          - name: cache
            emptyDir: {}
    YAML

    Network Policies

    # Deny all traffic
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: default-deny-all
      namespace: production
    spec:
      podSelector: {}
      policyTypes:
      - Ingress
      - Egress
    
    ---
    # Allow frontend to backend
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-frontend-to-backend
      namespace: production
    spec:
      podSelector:
        matchLabels:
          tier: backend
      policyTypes:
      - Ingress
      ingress:
      - from:
        - podSelector:
            matchLabels:
              tier: frontend
        - namespaceSelector:
            matchLabels:
              name: frontend-namespace
        ports:
        - protocol: TCP
          port: 8080
    
    ---
    # Allow egress to database
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-backend-to-db
      namespace: production
    spec:
      podSelector:
        matchLabels:
          tier: backend
      policyTypes:
      - Egress
      egress:
      - to:
        - podSelector:
            matchLabels:
              tier: database
        ports:
        - protocol: TCP
          port: 5432
      - to: []  # Allow DNS
        ports:
        - protocol: UDP
          port: 53
    YAML
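
    Network policies are enforced by the CNI plugin (not every plugin supports them), so it is worth verifying behaviour empirically; a sketch using a temporary pod (service and port names here are illustrative):

    kubectl get networkpolicy -n production

    # From a pod labelled tier=frontend this should succeed;
    # from an unlabelled pod it should time out
    kubectl run np-test --rm -it -n production --image=busybox:1.36 \
      --labels=tier=frontend --restart=Never -- \
      wget -qO- -T 5 http://backend-service
    Bash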

    Resource Quotas and Limits

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: compute-quota
      namespace: production
    spec:
      hard:
        requests.cpu: "10"
        requests.memory: 20Gi
        limits.cpu: "20"
        limits.memory: 40Gi
        persistentvolumeclaims: "10"
        pods: "20"
        services: "5"
        secrets: "10"
        configmaps: "10"
    
    ---
    apiVersion: v1
    kind: LimitRange
    metadata:
      name: limit-range
      namespace: production
    spec:
      limits:
      - default:
          cpu: "1"
          memory: "1Gi"
        defaultRequest:
          cpu: "100m"
          memory: "128Mi"
        max:
          cpu: "2"
          memory: "4Gi"
        min:
          cpu: "50m"
          memory: "64Mi"
        type: Container
    YAML

    Chapter 9: Monitoring and Logging

    Prometheus and Grafana Stack

    graph TB
        subgraph "Monitoring Stack"
            PROM[Prometheus Server]
            GRAF[Grafana]
            AM[AlertManager]
    
            subgraph "Exporters"
                NE[Node Exporter]
                CE[cAdvisor]
                KSM[Kube State Metrics]
            end
    
            subgraph "Applications"
                APP1[App 1]
                APP2[App 2]
            end
        end
    
        NE --> PROM
        CE --> PROM
        KSM --> PROM
        APP1 --> PROM
        APP2 --> PROM
        PROM --> GRAF
        PROM --> AM

    Prometheus Configuration

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus-config
    data:
      prometheus.yml: |
        global:
          scrape_interval: 15s
    
        scrape_configs:
        - job_name: 'kubernetes-apiservers'
          kubernetes_sd_configs:
          - role: endpoints
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https
    
        - job_name: 'kubernetes-nodes'
          kubernetes_sd_configs:
          - role: node
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
    
        - job_name: 'kubernetes-pods'
          kubernetes_sd_configs:
          - role: pod
          relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
    
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: prometheus
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: prometheus
      template:
        metadata:
          labels:
            app: prometheus
        spec:
          serviceAccountName: prometheus
          containers:
          - name: prometheus
            image: prom/prometheus:latest
            args:
            - '--config.file=/etc/prometheus/prometheus.yml'
            - '--storage.tsdb.path=/prometheus/'
            - '--web.console.libraries=/etc/prometheus/console_libraries'
            - '--web.console.templates=/etc/prometheus/consoles'
            - '--storage.tsdb.retention.time=200h'
            - '--web.enable-lifecycle'
            ports:
            - containerPort: 9090
            volumeMounts:
            - name: prometheus-config-volume
              mountPath: /etc/prometheus/
            - name: prometheus-storage-volume
              mountPath: /prometheus/
          volumes:
          - name: prometheus-config-volume
            configMap:
              name: prometheus-config
          - name: prometheus-storage-volume
            emptyDir: {}
    YAML

    Grafana Dashboard

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: grafana-datasources
    data:
      prometheus.yaml: |
        apiVersion: 1
        datasources:
        - name: Prometheus
          type: prometheus
          url: http://prometheus-service:9090
          access: proxy
          isDefault: true
    
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: grafana
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: grafana
      template:
        metadata:
          labels:
            app: grafana
        spec:
          containers:
          - name: grafana
            image: grafana/grafana:latest
            ports:
            - containerPort: 3000
            env:
            - name: GF_SECURITY_ADMIN_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: grafana-secret
                  key: admin-password
            volumeMounts:
            - name: grafana-datasources
              mountPath: /etc/grafana/provisioning/datasources
          volumes:
          - name: grafana-datasources
            configMap:
              name: grafana-datasources
    YAML

    Application Metrics

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: metrics-app
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: metrics-app
      template:
        metadata:
          labels:
            app: metrics-app
          annotations:
            prometheus.io/scrape: "true"
            prometheus.io/port: "8080"
            prometheus.io/path: "/metrics"
        spec:
          containers:
          - name: app
            image: myapp:latest
            ports:
            - containerPort: 8080
            - containerPort: 9090  # Metrics port
            env:
            - name: METRICS_ENABLED
              value: "true"
    YAML
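
    With those annotations and the pod scrape config above, the application pods show up as Prometheus targets. A quick local check against the Prometheus deployment defined earlier:

    # Forward the Prometheus UI/API to localhost
    kubectl port-forward deployment/prometheus 9090:9090

    # In another terminal: list scraped targets and run a query
    curl -s 'http://localhost:9090/api/v1/targets' | head
    curl -s 'http://localhost:9090/api/v1/query?query=up'
    Bash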

    EFK Stack (Elasticsearch, Fluentd, Kibana)

    graph LR
        APPS[Applications] --> FD[Fluentd DaemonSet]
        FD --> ES[Elasticsearch]
        ES --> KB[Kibana]
        KB --> USER[Users]

    Fluentd Configuration

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: fluentd-config
    data:
      fluent.conf: |
        <source>
          @type tail
          path /var/log/containers/*.log
          pos_file /var/log/fluentd-containers.log.pos
          time_format %Y-%m-%dT%H:%M:%S.%NZ
          tag kubernetes.*
          read_from_head true
          <parse>
            @type json
            time_key time
            time_format %Y-%m-%dT%H:%M:%S.%NZ
          </parse>
        </source>
    
        <filter kubernetes.**>
          @type kubernetes_metadata
        </filter>
    
        <match **>
          @type elasticsearch
          host elasticsearch-service
          port 9200
          logstash_format true
          logstash_prefix kubernetes
          <buffer>
            @type file
            path /var/log/fluentd-buffers/kubernetes.system.buffer
            flush_mode interval
            retry_type exponential_backoff
            flush_thread_count 2
            flush_interval 5s
            retry_forever
            retry_max_interval 30
            chunk_limit_size 2M
            queue_limit_length 8
            overflow_action block
          </buffer>
        </match>
    
    ---
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: fluentd
    spec:
      selector:
        matchLabels:
          name: fluentd
      template:
        metadata:
          labels:
            name: fluentd
        spec:
          serviceAccountName: fluentd
          containers:
          - name: fluentd
            image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
            env:
            - name: FLUENTD_SYSTEMD_CONF
              value: "disable"
            resources:
              limits:
                memory: 200Mi
              requests:
                cpu: 100m
                memory: 200Mi
            volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: fluentd-config
              mountPath: /fluentd/etc
          volumes:
          - name: varlog
            hostPath:
              path: /var/log
          - name: varlibdockercontainers
            hostPath:
              path: /var/lib/docker/containers
          - name: fluentd-config
            configMap:
              name: fluentd-config
    YAML

    Chapter 10: Troubleshooting

    Common Issues and Solutions

    Pod Troubleshooting

    # Check pod status
    kubectl get pods
    kubectl describe pod <pod-name>
    
    # Check logs
    kubectl logs <pod-name>
    kubectl logs <pod-name> --previous  # Previous container instance
    kubectl logs <pod-name> -c <container-name>  # Multi-container pod
    
    # Execute into pod
    kubectl exec -it <pod-name> -- /bin/bash
    
    # Port forwarding for testing
    kubectl port-forward <pod-name> 8080:80
    
    # Check events
    kubectl get events --sort-by=.metadata.creationTimestamp
    Bash

    Common Pod States and Solutions

    graph TD
        PENDING[Pending] --> |Check Resources| SCHEDULED[Scheduled]
        PENDING --> |Check Node Selector| NODE_ISSUE[Node Selection Issue]
        RUNNING[Running] --> |Check Logs| APP_ERROR[Application Error]
        FAILED[Failed] --> |Check Exit Code| RESTART[Restart Policy]
        CRASHLOOPBACKOFF[CrashLoopBackOff] --> |Fix App Logic| RUNNING

    Debugging Examples

    # Debug pod for troubleshooting
    apiVersion: v1
    kind: Pod
    metadata:
      name: debug-pod
    spec:
      containers:
      - name: debug
        image: nicolaka/netshoot
        command: ["/bin/bash"]
        args: ["-c", "while true; do sleep 30; done;"]
        securityContext:
          capabilities:
            add: ["NET_ADMIN"]
      hostNetwork: true
      hostPID: true
    YAML

    Network Troubleshooting

    # Test DNS resolution
    kubectl exec -it debug-pod -- nslookup kubernetes.default.svc.cluster.local
    
    # Test connectivity
    kubectl exec -it debug-pod -- curl -v http://service-name:port
    
    # Check network policies
    kubectl get networkpolicies
    kubectl describe networkpolicy <policy-name>
    
    # Test pod-to-pod communication
    kubectl exec -it pod1 -- ping <pod2-ip>
    Bash

    Resource Issues

    # Check node resources
    kubectl top nodes
    kubectl describe nodes
    
    # Check pod resource usage
    kubectl top pods
    kubectl top pods --containers
    
    # Check resource quotas
    kubectl get resourcequota
    kubectl describe resourcequota
    
    # Check limit ranges
    kubectl get limitrange
    kubectl describe limitrange
    Bash

    Monitoring Cluster Health

    apiVersion: v1
    kind: Pod
    metadata:
      name: cluster-health-check
    spec:
      containers:
      - name: health-check
        image: nicolaka/netshoot  # includes both curl and nslookup
        command: ["/bin/sh"]
        args:
        - -c
        - |
          while true; do
            echo "=== Cluster Health Check ==="
    
            # Check API server
            if curl -fsk https://kubernetes.default.svc/healthz > /dev/null; then
              echo "API Server: OK"
            else
              echo "API Server: FAILED"
            fi
    
            # Check CoreDNS
            if nslookup kubernetes.default.svc.cluster.local; then
              echo "DNS: OK"
            else
              echo "DNS: FAILED"
            fi
    
            sleep 60
          done
    YAML

    Chapter 11: Web Applications

    Deploying a Complete Web Application

    graph TB
        subgraph "Frontend Tier"
            FE[React Frontend]
            ING[Ingress]
        end
    
        subgraph "Backend Tier"
            BE[Node.js API]
            SVC[Service]
        end
    
        subgraph "Database Tier"
            DB[PostgreSQL]
            PVC[Persistent Volume]
        end
    
        FE --> ING
        ING --> SVC
        SVC --> BE
        BE --> DB
        DB --> PVC

    Database Layer

    apiVersion: v1
    kind: Secret
    metadata:
      name: postgres-secret
    type: Opaque
    stringData:
      POSTGRES_DB: webapp
      POSTGRES_USER: appuser
      POSTGRES_PASSWORD: secretpassword
      # Referenced by the backend Deployment as DATABASE_URL
      database-url: "postgresql://appuser:secretpassword@postgres-service:5432/webapp"
    
    ---
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: postgres
    spec:
      serviceName: postgres
      replicas: 1
      selector:
        matchLabels:
          app: postgres
      template:
        metadata:
          labels:
            app: postgres
        spec:
          containers:
          - name: postgres
            image: postgres:13
            ports:
            - containerPort: 5432
            envFrom:
            - secretRef:
                name: postgres-secret
            volumeMounts:
            - name: postgres-data
              mountPath: /var/lib/postgresql/data
            resources:
              requests:
                memory: "256Mi"
                cpu: "250m"
              limits:
                memory: "512Mi"
                cpu: "500m"
            livenessProbe:
              exec:
                command:
                - pg_isready
                - -U
                - appuser
              initialDelaySeconds: 30
              periodSeconds: 10
            readinessProbe:
              exec:
                command:
                - pg_isready
                - -U
                - appuser
              initialDelaySeconds: 5
              periodSeconds: 5
      volumeClaimTemplates:
      - metadata:
          name: postgres-data
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi
    
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: postgres-service
    spec:
      selector:
        app: postgres
      ports:
      - port: 5432
        targetPort: 5432
    YAML

    Backend API

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: backend-config
    data:
      NODE_ENV: "production"
      PORT: "3000"
      LOG_LEVEL: "info"
    
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: backend-api
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: backend-api
      template:
        metadata:
          labels:
            app: backend-api
          annotations:
            prometheus.io/scrape: "true"
            prometheus.io/port: "3000"
            prometheus.io/path: "/metrics"
        spec:
          containers:
          - name: api
            image: mycompany/backend-api:v1.2.3
            ports:
            - containerPort: 3000
            env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: database-url
            envFrom:
            - configMapRef:
                name: backend-config
            resources:
              requests:
                memory: "128Mi"
                cpu: "100m"
              limits:
                memory: "256Mi"
                cpu: "500m"
            livenessProbe:
              httpGet:
                path: /health
                port: 3000
              initialDelaySeconds: 30
              periodSeconds: 10
            readinessProbe:
              httpGet:
                path: /ready
                port: 3000
              initialDelaySeconds: 5
              periodSeconds: 5
            volumeMounts:
            - name: logs
              mountPath: /app/logs
          volumes:
          - name: logs
            emptyDir: {}
    
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: backend-service
    spec:
      selector:
        app: backend-api
      ports:
      - port: 80
        targetPort: 3000
      type: ClusterIP
    YAML

    Frontend Application

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: frontend-config
    data:
      nginx.conf: |
        server {
            listen 80;
            server_name localhost;
            root /usr/share/nginx/html;
            index index.html;
    
            # Gzip compression
            gzip on;
            gzip_types text/plain text/css application/json application/javascript text/xml application/xml;
    
            # Security headers
            add_header X-Content-Type-Options nosniff;
            add_header X-Frame-Options DENY;
            add_header X-XSS-Protection "1; mode=block";
    
            # API proxy
            location /api {
                proxy_pass http://backend-service;
                proxy_set_header Host $host;
                proxy_set_header X-Real-IP $remote_addr;
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            }
    
            # React Router support
            location / {
                try_files $uri $uri/ /index.html;
            }
        }
    
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: frontend
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: frontend
      template:
        metadata:
          labels:
            app: frontend
        spec:
          containers:
          - name: nginx
            image: mycompany/frontend:v1.2.3
            ports:
            - containerPort: 80
            volumeMounts:
            - name: nginx-config
              mountPath: /etc/nginx/conf.d
            resources:
              requests:
                memory: "64Mi"
                cpu: "50m"
              limits:
                memory: "128Mi"
                cpu: "100m"
            livenessProbe:
              httpGet:
                path: /
                port: 80
              initialDelaySeconds: 10
              periodSeconds: 10
            readinessProbe:
              httpGet:
                path: /
                port: 80
              initialDelaySeconds: 5
              periodSeconds: 5
          volumes:
          - name: nginx-config
            configMap:
              name: frontend-config
    
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: frontend-service
    spec:
      selector:
        app: frontend
      ports:
      - port: 80
        targetPort: 80
    YAML

    Ingress Configuration

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: webapp-ingress
      annotations:
        kubernetes.io/ingress.class: "nginx"
        cert-manager.io/cluster-issuer: "letsencrypt-prod"
        nginx.ingress.kubernetes.io/ssl-redirect: "true"
        nginx.ingress.kubernetes.io/rate-limit: "100"
        nginx.ingress.kubernetes.io/cors-allow-origin: "https://myapp.com"
    spec:
      tls:
      - hosts:
        - myapp.com
        - api.myapp.com
        secretName: webapp-tls
      rules:
      - host: myapp.com
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend-service
                port:
                  number: 80
      - host: api.myapp.com
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: backend-service
                port:
                  number: 80
    YAML

    WordPress Example

    apiVersion: v1
    kind: Secret
    metadata:
      name: mysql-secret
    type: Opaque
    stringData:
      MYSQL_ROOT_PASSWORD: rootpassword
      MYSQL_DATABASE: wordpress
      MYSQL_USER: wpuser
      MYSQL_PASSWORD: wppassword
    
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: mysql
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: mysql
      template:
        metadata:
          labels:
            app: mysql
        spec:
          containers:
          - name: mysql
            image: mysql:8.0
            envFrom:
            - secretRef:
                name: mysql-secret
            ports:
            - containerPort: 3306
            volumeMounts:
            - name: mysql-data
              mountPath: /var/lib/mysql
          volumes:
          - name: mysql-data
            persistentVolumeClaim:
              claimName: mysql-pvc
    
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: mysql-service
    spec:
      selector:
        app: mysql
      ports:
      - port: 3306
    
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: wordpress
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: wordpress
      template:
        metadata:
          labels:
            app: wordpress
        spec:
          containers:
          - name: wordpress
            image: wordpress:latest
            env:
            - name: WORDPRESS_DB_HOST
              value: mysql-service
            - name: WORDPRESS_DB_NAME
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: MYSQL_DATABASE
            - name: WORDPRESS_DB_USER
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: MYSQL_USER
            - name: WORDPRESS_DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: MYSQL_PASSWORD
            ports:
            - containerPort: 80
            volumeMounts:
            - name: wordpress-data
              mountPath: /var/www/html
          volumes:
          - name: wordpress-data
            persistentVolumeClaim:
              claimName: wordpress-pvc
    YAML

    Chapter 12: Advanced Topics

    Horizontal Pod Autoscaling (HPA)

    graph LR
        HPA[HPA Controller] --> METRICS[Metrics Server]
        METRICS --> PODS[Pod Metrics]
        HPA --> DEPLOY[Deployment]
        DEPLOY --> SCALE[Scale Pods]

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: webapp-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: webapp
      minReplicas: 2
      maxReplicas: 10
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 70
      - type: Resource
        resource:
          name: memory
          target:
            type: Utilization
            averageUtilization: 80
      - type: Pods
        pods:
          metric:
            name: custom_metric
          target:
            type: AverageValue
            averageValue: "100"
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
          - type: Percent
            value: 10
            periodSeconds: 60
        scaleUp:
          stabilizationWindowSeconds: 0
          policies:
          - type: Percent
            value: 100
            periodSeconds: 15
          - type: Pods
            value: 4
            periodSeconds: 15
          selectPolicy: Max
    YAML
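
    Note that the HPA only acts when it can read metrics: CPU and memory come from the Metrics Server, while custom_metric above needs a metrics adapter (for example the Prometheus Adapter). A quick way to verify the autoscaler is working:

    # Current targets, replica counts, and scaling events
    kubectl get hpa webapp-hpa
    kubectl describe hpa webapp-hpa

    # Watch scaling decisions as load changes
    kubectl get hpa webapp-hpa --watch
    Bash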

    Vertical Pod Autoscaling (VPA)

    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: webapp-vpa
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: webapp
      updatePolicy:
        updateMode: "Auto"
      resourcePolicy:
        containerPolicies:
        - containerName: app
          minAllowed:
            cpu: 100m
            memory: 128Mi
          maxAllowed:
            cpu: 2
            memory: 4Gi
          controlledResources: ["cpu", "memory"]
    YAML
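
    The VPA object above only takes effect if the VPA components (recommender, updater, and admission controller) are installed in the cluster. Once they are, the recommendations can be inspected directly:

    # Target and recommended CPU/memory appear in the status section
    kubectl get vpa webapp-vpa
    kubectl describe vpa webapp-vpa
    Bash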

    Cluster Autoscaling

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: cluster-autoscaler
      namespace: kube-system
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: cluster-autoscaler
      template:
        metadata:
          labels:
            app: cluster-autoscaler
        spec:
          serviceAccountName: cluster-autoscaler
          containers:
          - image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.21.0
            name: cluster-autoscaler
            resources:
              limits:
                cpu: 100m
                memory: 300Mi
              requests:
                cpu: 100m
                memory: 300Mi
            command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --expander=least-waste
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
            - --balance-similar-node-groups
            - --skip-nodes-with-system-pods=false
            env:
            - name: AWS_REGION
              value: us-west-2
    YAML
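
    To confirm the autoscaler is discovering node groups and making scaling decisions, its logs and the status ConfigMap it writes to kube-system by default are the first places to look:

    # Recent scale-up/scale-down decisions
    kubectl -n kube-system logs deployment/cluster-autoscaler --tail=50

    # Per-node-group health and scaling status
    kubectl -n kube-system describe configmap cluster-autoscaler-status
    Bash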

    Custom Resources and Operators

    apiVersion: apiextensions.k8s.io/v1
    kind: CustomResourceDefinition
    metadata:
      name: webapps.example.com
    spec:
      group: example.com
      versions:
      - name: v1
        served: true
        storage: true
        schema:
          openAPIV3Schema:
            type: object
            properties:
              spec:
                type: object
                properties:
                  image:
                    type: string
                  replicas:
                    type: integer
                    minimum: 1
                    maximum: 10
                  port:
                    type: integer
              status:
                type: object
                properties:
                  availableReplicas:
                    type: integer
      scope: Namespaced
      names:
        plural: webapps
        singular: webapp
        kind: WebApp
        shortNames:
        - wa
    
    ---
    apiVersion: example.com/v1
    kind: WebApp
    metadata:
      name: my-webapp
    spec:
      image: nginx:1.21
      replicas: 3
      port: 80
    
    ---
    # Simple Operator example
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: webapp-operator
    spec:
      replicas: 1
      selector:
        matchLabels:
          name: webapp-operator
      template:
        metadata:
          labels:
            name: webapp-operator
        spec:
          containers:
          - name: webapp-operator
            image: mycompany/webapp-operator:latest
            env:
            - name: WATCH_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: OPERATOR_NAME
              value: "webapp-operator"
    YAML
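
    After the CRD is applied, the new WebApp type behaves like any built-in resource. Assuming the manifests above are saved to webapp-crd.yaml (a hypothetical file name), a quick smoke test looks like this:

    kubectl apply -f webapp-crd.yaml

    # The API server now serves the new resource, including its short name
    kubectl get crd webapps.example.com
    kubectl get webapps
    kubectl get wa my-webapp -o yaml
    Bash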

    GitOps with ArgoCD

    graph LR
        DEV[Developer] --> GIT[Git Repository]
        GIT --> ARGO[ArgoCD]
        ARGO --> K8S[Kubernetes Cluster]
        ARGO --> |Sync Status| GIT
        ARGO --> |Deploy| K8S

    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: webapp-production
      namespace: argocd
      finalizers:
        - resources-finalizer.argocd.argoproj.io
    spec:
      project: default
      source:
        repoURL: https://github.com/mycompany/k8s-manifests
        targetRevision: HEAD
        path: production/webapp
      destination:
        server: https://kubernetes.default.svc
        namespace: production
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
          allowEmpty: false
        syncOptions:
        - CreateNamespace=true
        - PrunePropagationPolicy=foreground
        - PruneLast=true
        retry:
          limit: 5
          backoff:
            duration: 5s
            factor: 2
            maxDuration: 3m
    YAML
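
    With the Application registered, the argocd CLI can be used to inspect and drive syncs (this assumes you are already logged in to the ArgoCD API server with argocd login):

    argocd app list
    argocd app get webapp-production
    argocd app sync webapp-production
    argocd app history webapp-production
    Bash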

    Service Mesh with Istio

    graph TB
        subgraph "Service Mesh"
            subgraph "Data Plane"
                POD1[Pod + Sidecar Proxy]
                POD2[Pod + Sidecar Proxy]
                POD3[Pod + Sidecar Proxy]
            end
    
            subgraph "Control Plane"
                PILOT[Pilot]
                CITADEL[Citadel]
                GALLEY[Galley]
            end
        end
    
        PILOT --> POD1
        PILOT --> POD2
        PILOT --> POD3
        CITADEL --> POD1
        CITADEL --> POD2
        CITADEL --> POD3

    Istio Gateway and VirtualService

    apiVersion: networking.istio.io/v1alpha3
    kind: Gateway
    metadata:
      name: webapp-gateway
    spec:
      selector:
        istio: ingressgateway
      servers:
      - port:
          number: 80
          name: http
          protocol: HTTP
        hosts:
        - myapp.example.com
      - port:
          number: 443
          name: https
          protocol: HTTPS
        tls:
          mode: SIMPLE
          credentialName: webapp-tls
        hosts:
        - myapp.example.com
    
    ---
    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: webapp-vs
    spec:
      hosts:
      - myapp.example.com
      gateways:
      - webapp-gateway
      http:
      - match:
        - uri:
            prefix: "/api/v1"
        route:
        - destination:
            host: backend-service
            port:
              number: 80
          weight: 90
        - destination:
            host: backend-service-canary
            port:
              number: 80
          weight: 10
      - match:
        - uri:
            prefix: "/"
        route:
        - destination:
            host: frontend-service
            port:
              number: 80
    YAML
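
    To sanity-check the configuration after applying it, istioctl can validate the mesh and kubectl can confirm the resources and the ingress gateway (assumed here to run in istio-system, the default install location):

    istioctl analyze
    kubectl get gateway,virtualservice
    kubectl -n istio-system get pods -l istio=ingressgateway
    Bash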

    Traffic Splitting and Canary Deployments

    apiVersion: networking.istio.io/v1alpha3
    kind: DestinationRule
    metadata:
      name: backend-destination
    spec:
      host: backend-service
      subsets:
      - name: v1
        labels:
          version: v1
      - name: v2
        labels:
          version: v2
      trafficPolicy:
        loadBalancer:
          simple: LEAST_CONN
        connectionPool:
          tcp:
            maxConnections: 100
          http:
            http1MaxPendingRequests: 50
            maxRequestsPerConnection: 5
        outlierDetection:
          consecutive5xxErrors: 3
          interval: 30s
          baseEjectionTime: 30s
    
    ---
    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: backend-canary
    spec:
      hosts:
      - backend-service
      http:
      - match:
        - headers:
            canary:
              exact: "true"
        route:
        - destination:
            host: backend-service
            subset: v2
      - route:
        - destination:
            host: backend-service
            subset: v1
          weight: 95
        - destination:
            host: backend-service
            subset: v2
          weight: 5
    YAML
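
    The header match means requests carrying canary: true always hit the v2 subset, while everything else is split 95/5. A rough way to test this from inside the mesh, assuming a frontend Deployment with curl available and a /api/v1/status endpoint on the backend (both hypothetical here):

    # Forced onto the canary subset
    kubectl exec deploy/frontend -- curl -s -H "canary: true" http://backend-service/api/v1/status

    # Subject to the 95/5 weighted split
    kubectl exec deploy/frontend -- curl -s http://backend-service/api/v1/status
    Bash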

    Multi-Cluster Management

    graph TB
        subgraph "Management Cluster"
            MC[Control Plane]
            ARGO[ArgoCD]
            FLUX[Flux]
        end
    
        subgraph "Production Cluster"
            PC[Workloads]
        end
    
        subgraph "Staging Cluster"
            SC[Workloads]
        end
    
        subgraph "Development Cluster"
            DC[Workloads]
        end
    
        MC --> PC
        MC --> SC
        MC --> DC
        ARGO --> PC
        ARGO --> SC
        FLUX --> DC

    Cluster API Example

    apiVersion: cluster.x-k8s.io/v1beta1
    kind: Cluster
    metadata:
      name: production-cluster
      namespace: default
    spec:
      clusterNetwork:
        pods:
          cidrBlocks: ["192.168.0.0/16"]
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AWSCluster
        name: production-cluster
      controlPlaneRef:
        kind: KubeadmControlPlane
        apiVersion: controlplane.cluster.x-k8s.io/v1beta1
        name: production-cluster-control-plane
    
    ---
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSCluster
    metadata:
      name: production-cluster
    spec:
      region: us-west-2
      sshKeyName: my-ssh-key
    
    ---
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    metadata:
      name: production-cluster-control-plane
    spec:
      replicas: 3
      machineTemplate:
        infrastructureRef:
          kind: AWSMachineTemplate
          apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
          name: production-cluster-control-plane
      kubeadmConfigSpec:
        initConfiguration:
          nodeRegistration:
            kubeletExtraArgs:
              cloud-provider: aws
        clusterConfiguration:
          apiServer:
            extraArgs:
              cloud-provider: aws
          controllerManager:
            extraArgs:
              cloud-provider: aws
    YAML
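
    Cluster API manifests are applied to a management cluster that has the relevant provider installed. A typical workflow, assuming the manifests above are saved as production-cluster.yaml:

    # One-time: install the AWS infrastructure provider into the management cluster
    clusterctl init --infrastructure aws

    # Create the workload cluster and watch it provision
    kubectl apply -f production-cluster.yaml
    kubectl get clusters
    clusterctl describe cluster production-cluster

    # Fetch a kubeconfig for the new cluster once the control plane is ready
    clusterctl get kubeconfig production-cluster > production-cluster.kubeconfig
    Bash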

    Disaster Recovery and Backup

    # Velero backup configuration
    apiVersion: velero.io/v1
    kind: Backup
    metadata:
      name: daily-backup
      namespace: velero
    spec:
      includedNamespaces:
      - production
      - staging
      excludedResources:
      - events
      - events.events.k8s.io
      storageLocation: aws-s3
      ttl: 720h0m0s  # 30 days
      snapshotVolumes: true
    
    ---
    apiVersion: velero.io/v1
    kind: Schedule
    metadata:
      name: daily-backup-schedule
      namespace: velero
    spec:
      schedule: "0 2 * * *"  # Daily at 2 AM
      template:
        includedNamespaces:
        - production
        - staging
        excludedResources:
        - events
        - events.events.k8s.io
        storageLocation: aws-s3
        ttl: 720h0m0s
        snapshotVolumes: true
    
    ---
    # Restore example
    apiVersion: velero.io/v1
    kind: Restore
    metadata:
      name: production-restore
      namespace: velero
    spec:
      backupName: daily-backup-20250820
      includedNamespaces:
      - production
      restorePVs: true
    YAML
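
    The same operations are available through the Velero CLI, which is often quicker for ad-hoc backups and restores (assuming velero is installed locally and the server components run in the velero namespace):

    # On-demand backup and status
    velero backup create manual-backup --include-namespaces production
    velero backup get

    # Restore from a named backup
    velero restore create --from-backup daily-backup-20250820
    velero restore get
    Bash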

    Performance Optimization

    Pod Disruption Budget

    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: webapp-pdb
    spec:
      minAvailable: 2
      selector:
        matchLabels:
          app: webapp
    
    ---
    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: critical-app-pdb
    spec:
      maxUnavailable: 1
      selector:
        matchLabels:
          tier: critical
    YAML
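
    PDBs matter most during voluntary disruptions such as node drains, which pause rather than violate the budget. To check what is currently allowed:

    kubectl get pdb
    kubectl describe pdb webapp-pdb

    # Drains respect the budgets above
    kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
    Bash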

    Priority Classes

    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: high-priority
    value: 1000
    globalDefault: false
    description: "High priority class for critical applications"
    
    ---
    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: low-priority
    value: 100
    globalDefault: false
    description: "Low priority class for batch jobs"
    
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: critical-app
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: critical-app
      template:
        metadata:
          labels:
            app: critical-app
        spec:
          priorityClassName: high-priority
          containers:
          - name: app
            image: critical-app:latest
            resources:
              requests:
                memory: "256Mi"
                cpu: "500m"
              limits:
                memory: "512Mi"
                cpu: "1000m"
    YAML

    Node Affinity and Anti-Affinity

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: database-deployment
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: database
      template:
        metadata:
          labels:
            app: database
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: node-type
                    operator: In
                    values:
                    - high-memory
              preferredDuringSchedulingIgnoredDuringExecution:
              - weight: 1
                preference:
                  matchExpressions:
                  - key: zone
                    operator: In
                    values:
                    - us-west-2a
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                  - key: app
                    operator: In
                    values:
                    - database
                topologyKey: kubernetes.io/hostname
          containers:
          - name: database
            image: postgres:13
            resources:
              requests:
                memory: "2Gi"
                cpu: "1000m"
              limits:
                memory: "4Gi"
                cpu: "2000m"
    YAML
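
    The node-type and zone keys used above are custom labels in this example (cloud providers normally expose zones as topology.kubernetes.io/zone), so they must exist on the nodes before these pods can schedule:

    kubectl label nodes <node-name> node-type=high-memory
    kubectl label nodes <node-name> zone=us-west-2a

    # Confirm the anti-affinity spread one replica per node
    kubectl get pods -l app=database -o wide
    Bash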

    Cost Optimization

    Resource Recommendations

    # Generate right-sizing suggestions with Robusta KRR (a CLI that reads Prometheus metrics for the current kube-context)
    krr simple

    # Alternatively, run VPA in recommendation-only mode (updateMode: "Off") and read its suggestions
    kubectl get vpa
    kubectl describe vpa webapp-vpa
    Bash

    Cluster Cost Analysis

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: kubecost-config
    data:
      kubecostProductConfigs.json: |
        {
          "currencyCode": "USD",
          "discount": "30",
          "negotiatedDiscount": "10",
          "defaultIdle": "false",
          "serviceKeyName": "service",
          "departmentKeyName": "department",
          "teamKeyName": "team",
          "envKeyName": "env"
        }
    
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: kubecost
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: kubecost
      template:
        metadata:
          labels:
            app: kubecost
        spec:
          containers:
          - name: cost-analyzer
            image: gcr.io/kubecost1/cost-analyzer:latest
            ports:
            - containerPort: 9090
            env:
            - name: PROMETHEUS_SERVER_ENDPOINT
              value: "http://prometheus-service:9090"
            volumeMounts:
            - name: config
              mountPath: /var/configs
          volumes:
          - name: config
            configMap:
              name: kubecost-config
    YAML

    Chapter 13: Production Best Practices

    CI/CD Pipeline Integration

    graph LR
        DEV[Developer] --> GIT[Git Repository]
        GIT --> CI[CI Pipeline]
        CI --> BUILD[Build Image]
        BUILD --> TEST[Run Tests]
        TEST --> SCAN[Security Scan]
        SCAN --> PUSH[Push to Registry]
        PUSH --> CD[CD Pipeline]
        CD --> DEPLOY[Deploy to K8s]

    GitHub Actions Example

    # .github/workflows/deploy.yml
    name: Deploy to Kubernetes
    
    on:
      push:
        branches: [main]
      pull_request:
        branches: [main]
    
    env:
      REGISTRY: ghcr.io
      IMAGE_NAME: ${{ github.repository }}
    
    jobs:
      build-and-deploy:
        runs-on: ubuntu-latest
        permissions:
          contents: read
          packages: write
    
        steps:
        - name: Checkout repository
          uses: actions/checkout@v3
    
        - name: Setup Docker Buildx
          uses: docker/setup-buildx-action@v2
    
        - name: Log in to Container Registry
          uses: docker/login-action@v2
          with:
            registry: ${{ env.REGISTRY }}
            username: ${{ github.actor }}
            password: ${{ secrets.GITHUB_TOKEN }}
    
        - name: Extract metadata
          id: meta
          uses: docker/metadata-action@v4
          with:
            images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
            tags: |
              type=ref,event=branch
              type=ref,event=pr
              type=sha,prefix={{branch}}-
    
        - name: Build and push Docker image
          uses: docker/build-push-action@v4
          with:
            context: .
            push: true
            tags: ${{ steps.meta.outputs.tags }}
            labels: ${{ steps.meta.outputs.labels }}
            cache-from: type=gha
            cache-to: type=gha,mode=max
    
        - name: Configure kubectl
          uses: azure/k8s-set-context@v3
          with:
            method: kubeconfig
            kubeconfig: ${{ secrets.KUBE_CONFIG }}
    
        - name: Deploy to Kubernetes
          run: |
            # Update image tag in deployment
            sed -i "s|IMAGE_TAG|${{ steps.meta.outputs.tags }}|g" k8s/deployment.yaml
    
            # Apply manifests
            kubectl apply -f k8s/
    
            # Wait for rollout
            kubectl rollout status deployment/webapp
    
            # Verify deployment
            kubectl get pods -l app=webapp
    YAML

    Security Hardening

    Pod Security Context

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: secure-webapp
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: secure-webapp
      template:
        metadata:
          labels:
            app: secure-webapp
        spec:
          serviceAccountName: webapp-sa
          securityContext:
            runAsNonRoot: true
            runAsUser: 10001
            runAsGroup: 10001
            fsGroup: 10001
            seccompProfile:
              type: RuntimeDefault
            supplementalGroups: [10001]
          containers:
          - name: webapp
            image: myapp:latest
            securityContext:
              allowPrivilegeEscalation: false
              readOnlyRootFilesystem: true
              runAsNonRoot: true
              runAsUser: 10001
              capabilities:
                drop:
                - ALL
                add:
                - NET_BIND_SERVICE
            ports:
            - containerPort: 8080
            volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: cache
              mountPath: /app/cache
            - name: logs
              mountPath: /app/logs
            livenessProbe:
              httpGet:
                path: /health
                port: 8080
              initialDelaySeconds: 30
              periodSeconds: 10
            readinessProbe:
              httpGet:
                path: /ready
                port: 8080
              initialDelaySeconds: 5
              periodSeconds: 5
            resources:
              requests:
                memory: "128Mi"
                cpu: "100m"
              limits:
                memory: "256Mi"
                cpu: "200m"
          volumes:
          - name: tmp
            emptyDir: {}
          - name: cache
            emptyDir: {}
          - name: logs
            emptyDir: {}
    YAML

    Runtime Security with Falco

    # Falco rules for runtime security
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: falco-rules
    data:
      application_rules.yaml: |
        - rule: Detect shell in container
          desc: Notice shell activity within a container
          condition: >
            spawned_process and container and
            shell_procs and proc.tty != 0 and
            container_entrypoint
          output: >
            Shell spawned in container (user=%user.name %container.info
            shell=%proc.name parent=%proc.pname cmdline=%proc.cmdline)
          priority: WARNING
    
        - rule: File below a known binary directory opened for writing
          desc: >
            The package management process modifies binaries in these directories.
            This rule is meant to detect other processes modifying binary files.
          condition: >
            bin_dir and evt.is_open_write
            and not package_mgmt_procs
            and not exe_running_docker_save
            and not python_running_get_pip
            and not python_running_ms_oms
          output: >
            File below a known binary directory opened for writing (user=%user.name
            command=%proc.cmdline file=%fd.name %container.info)
          priority: WARNING
    YAML

    Observability Stack

    # Complete observability stack with Prometheus Operator
    apiVersion: v1
    kind: Namespace
    metadata:
      name: monitoring
    
    ---
    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      name: prometheus
      namespace: monitoring
    spec:
      serviceAccountName: prometheus
      serviceMonitorSelector:
        matchLabels:
          team: frontend
      ruleSelector:
        matchLabels:
          team: frontend
          prometheus: prometheus
      resources:
        requests:
          memory: 400Mi
      storage:
        volumeClaimTemplate:
          spec:
            storageClassName: fast-ssd
            resources:
              requests:
                storage: 50Gi
      alerting:
        alertmanagers:
        - namespace: monitoring
          name: alertmanager-main
          port: web
    
    ---
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: webapp-monitor
      namespace: monitoring
      labels:
        team: frontend
    spec:
      selector:
        matchLabels:
          app: webapp
      endpoints:
      - port: metrics
        interval: 30s
        path: /metrics
    
    ---
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: webapp-rules
      namespace: monitoring
      labels:
        team: frontend
        prometheus: prometheus
    spec:
      groups:
      - name: webapp.rules
        rules:
        - alert: WebAppDown
          expr: up{job="webapp"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "WebApp instance is down"
            description: "WebApp instance {{ $labels.instance }} has been down for more than 5 minutes."
    
        - alert: WebAppHighErrorRate
          expr: rate(http_requests_total{job="webapp",status=~"5.."}[5m]) > 0.1
          for: 2m
          labels:
            severity: warning
          annotations:
            summary: "High error rate detected"
            description: "Error rate is {{ $value }} errors per second for {{ $labels.instance }}"
    
        - alert: WebAppHighLatency
          expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{job="webapp"}[5m])) > 0.5
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High latency detected"
            description: "95th percentile latency is {{ $value }}s for {{ $labels.instance }}"
    YAML

    Documentation and Runbooks

    # ConfigMap containing runbooks
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: operational-runbooks
    data:
      incident-response.md: |
        # Incident Response Runbook
    
        ## Severity Levels
    
        ### P0 - Critical
        - Complete service outage
        - Data loss or corruption
        - Security breach
    
        **Response Time**: Immediate (< 15 minutes)
    
        ### P1 - High
        - Significant feature degradation
        - Performance issues affecting users
    
        **Response Time**: 1 hour
    
        ### P2 - Medium
        - Minor feature issues
        - Non-critical bugs
    
        **Response Time**: 24 hours
    
        ## Common Issues
    
        ### Pod CrashLoopBackOff
        ```bash
        # Check pod logs
        kubectl logs <pod-name> --previous
    
        # Check pod events
        kubectl describe pod <pod-name>
    
        # Check resource usage
        kubectl top pod <pod-name>
        ```
    
        ### Service Unavailable
        ```bash
        # Check service endpoints
        kubectl get endpoints <service-name>
    
        # Check pod readiness
        kubectl get pods -l app=<app-name>
    
        # Check ingress
        kubectl describe ingress <ingress-name>
        ```
    
        ### High Memory Usage
        ```bash
        # Check pod resource usage
        kubectl top pods --sort-by=memory
    
        # Check node resource usage
        kubectl top nodes
    
        # Restart high memory pods
        kubectl rollout restart deployment/<deployment-name>
        ```
    
      troubleshooting.md: |
        # Troubleshooting Guide
    
        ## Quick Diagnostic Commands
    
        ### Cluster Health
        ```bash
        # Check cluster components
        kubectl get componentstatuses
    
        # Check node status
        kubectl get nodes -o wide
    
        # Check system pods
        kubectl get pods -n kube-system
        ```
    
        ### Application Health
        ```bash
        # Check all resources in namespace
        kubectl get all -n <namespace>
    
        # Check recent events
        kubectl get events --sort-by=.metadata.creationTimestamp -n <namespace>
    
        # Check resource usage
        kubectl top pods -n <namespace> --sort-by=cpu
        ```
    
        ### Network Issues
        ```bash
        # Test DNS resolution
        kubectl run test-pod --image=busybox -it --rm -- nslookup kubernetes.default
    
        # Test service connectivity
        kubectl run test-pod --image=curlimages/curl -it --rm -- curl -v http://service-name:port/health
    
        # Check network policies
        kubectl get networkpolicies -A
        ```
    
        ### Storage Issues
        ```bash
        # Check PV status
        kubectl get pv
    
        # Check PVC status
        kubectl get pvc -A
    
        # Check storage classes
        kubectl get storageclass
        ```
    YAML

    Advanced Namespace Configuration

    apiVersion: v1
    kind: Namespace
    metadata:
      name: production
      labels:
        environment: prod
        team: backend
        cost-center: engineering
      annotations:
        description: "Production environment for backend services"
        contact: "backend-team@company.com"
        created-by: "platform-team"
    spec:
      finalizers:
      - kubernetes
    
    ---
    # Namespace with resource quotas and limits
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: production-quota
      namespace: production
    spec:
      hard:
        requests.cpu: "100"
        requests.memory: 200Gi
        limits.cpu: "200"
        limits.memory: 400Gi
        persistentvolumeclaims: "50"
        pods: "100"
        services: "20"
        secrets: "50"
        configmaps: "50"
        count/deployments.apps: "30"
        count/statefulsets.apps: "10"
        count/jobs.batch: "20"
    
    ---
    # Network policy for namespace isolation
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: production-isolation
      namespace: production
    spec:
      podSelector: {}
      policyTypes:
      - Ingress
      - Egress
      ingress:
      - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: production
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      egress:
      - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: production
      - to: []  # Allow DNS
        ports:
        - protocol: UDP
          port: 53
    YAML
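
    Quota consumption and the isolation policy can then be verified with:

    kubectl describe resourcequota production-quota -n production
    kubectl get networkpolicy -n production
    Bash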

    Chapter 14: Performance and Optimization

    Resource Management Strategies

    Comprehensive Resource Planning

    # Resource-optimized deployment with multiple strategies
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: optimized-app
      labels:
        app: optimized-app
        version: v1.0.0
    spec:
      replicas: 3
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 1
          maxSurge: 1
      selector:
        matchLabels:
          app: optimized-app
      template:
        metadata:
          labels:
            app: optimized-app
            version: v1.0.0
          annotations:
            prometheus.io/scrape: "true"
            prometheus.io/port: "8080"
            prometheus.io/path: "/metrics"
        spec:
          # Advanced scheduling
          priorityClassName: high-priority
          terminationGracePeriodSeconds: 30
    
          # Node selection and affinity
          nodeSelector:
            node-type: compute-optimized
    
          affinity:
            nodeAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
              - weight: 100
                preference:
                  matchExpressions:
                  - key: zone
                    operator: In
                    values: ["us-west-2a", "us-west-2b"]
            podAntiAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
              - weight: 100
                podAffinityTerm:
                  labelSelector:
                    matchExpressions:
                    - key: app
                      operator: In
                      values:
                      - optimized-app
                  topologyKey: kubernetes.io/hostname
    
          # Security context
          securityContext:
            runAsNonRoot: true
            runAsUser: 1000
            fsGroup: 2000
            seccompProfile:
              type: RuntimeDefault
    
          containers:
          - name: app
            image: myapp:v1.0.0
            imagePullPolicy: IfNotPresent
    
            # Resource management
            resources:
              requests:
                memory: "256Mi"
                cpu: "200m"
                ephemeral-storage: "1Gi"
              limits:
                memory: "512Mi"
                cpu: "500m"
                ephemeral-storage: "2Gi"
    
            # Security
            securityContext:
              allowPrivilegeEscalation: false
              readOnlyRootFilesystem: true
              runAsNonRoot: true
              runAsUser: 1000
              capabilities:
                drop: ["ALL"]
                add: ["NET_BIND_SERVICE"]
    
            # Health checks
            livenessProbe:
              httpGet:
                path: /health
                port: 8080
              initialDelaySeconds: 30
              periodSeconds: 10
              timeoutSeconds: 5
              failureThreshold: 3
              successThreshold: 1
    
            readinessProbe:
              httpGet:
                path: /ready
                port: 8080
              initialDelaySeconds: 5
              periodSeconds: 5
              timeoutSeconds: 3
              failureThreshold: 3
              successThreshold: 1
    
            startupProbe:
              httpGet:
                path: /startup
                port: 8080
              initialDelaySeconds: 10
              periodSeconds: 10
              timeoutSeconds: 3
              failureThreshold: 30
              successThreshold: 1
    
            # Environment and volumes
            env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
    
            volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: cache
              mountPath: /app/cache
            - name: config
              mountPath: /app/config
              readOnly: true
    
            ports:
            - containerPort: 8080
              name: http
              protocol: TCP
            - containerPort: 9090
              name: metrics
              protocol: TCP
    
          volumes:
          - name: tmp
            emptyDir:
              sizeLimit: 1Gi
          - name: cache
            emptyDir:
              sizeLimit: 2Gi
          - name: config
            configMap:
              name: app-config
              defaultMode: 0644
    
          # DNS configuration
          dnsPolicy: ClusterFirst
          dnsConfig:
            options:
            - name: ndots
              value: "2"
            - name: edns0
    YAML

    Advanced Autoscaling Configurations

    Multi-Metric HPA with Custom Metrics

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: advanced-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: optimized-app
      minReplicas: 2
      maxReplicas: 50
      metrics:
      # CPU utilization
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 70
    
      # Memory utilization
      - type: Resource
        resource:
          name: memory
          target:
            type: Utilization
            averageUtilization: 80
    
      # Custom metric: requests per second
      - type: Pods
        pods:
          metric:
            name: http_requests_per_second
          target:
            type: AverageValue
            averageValue: "100"
    
      # External metric: SQS queue length
      - type: External
        external:
          metric:
            name: sqs_queue_length
            selector:
              matchLabels:
                queue: "processing-queue"
          target:
            type: Value
            value: "10"
    
      # Scaling behavior
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
          - type: Percent
            value: 10
            periodSeconds: 60
          - type: Pods
            value: 2
            periodSeconds: 60
          selectPolicy: Min
        scaleUp:
          stabilizationWindowSeconds: 60
          policies:
          - type: Percent
            value: 100
            periodSeconds: 30
          - type: Pods
            value: 5
            periodSeconds: 30
          selectPolicy: Max
    
    ---
    # Vertical Pod Autoscaler with advanced configuration
    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: advanced-vpa
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: optimized-app
      updatePolicy:
        updateMode: "Auto"
        minReplicas: 2
      resourcePolicy:
        containerPolicies:
        - containerName: app
          minAllowed:
            cpu: 100m
            memory: 128Mi
          maxAllowed:
            cpu: 2
            memory: 4Gi
          controlledResources: ["cpu", "memory"]
          controlledValues: RequestsAndLimits
    YAML

    Performance Monitoring and Alerting

    # Comprehensive monitoring stack
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: app-performance-monitor
      labels:
        app: optimized-app
    spec:
      selector:
        matchLabels:
          app: optimized-app
      endpoints:
      - port: metrics
        interval: 15s
        path: /metrics
        scrapeTimeout: 10s
    
    ---
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: performance-alerts
    spec:
      groups:
      - name: performance.rules
        rules:
        # High CPU usage
        - alert: HighCPUUsage
          expr: rate(container_cpu_usage_seconds_total{pod=~"optimized-app-.*"}[5m]) * 100 > 80
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High CPU usage detected"
            description: "Pod {{ $labels.pod }} CPU usage is {{ $value }}%"
    
        # High memory usage
        - alert: HighMemoryUsage
          expr: container_memory_usage_bytes{pod=~"optimized-app-.*"} / container_spec_memory_limit_bytes * 100 > 85
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High memory usage detected"
            description: "Pod {{ $labels.pod }} memory usage is {{ $value }}%"
    
        # High response time
        - alert: HighResponseTime
          expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{job="optimized-app"}[5m])) > 1
          for: 2m
          labels:
            severity: warning
          annotations:
            summary: "High response time detected"
            description: "95th percentile response time is {{ $value }}s"
    
        # Low throughput
        - alert: LowThroughput
          expr: rate(http_requests_total{job="optimized-app"}[5m]) < 10
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Low throughput detected"
            description: "Request rate is {{ $value }} requests/second"
    YAML

    Chapter 15: Multi-Cloud and Hybrid

    Multi-Cloud Architecture

    graph TB
        subgraph "Management Layer"
            MC[Multi-Cloud Controller]
            ARGO[ArgoCD]
            TERRAFORM[Terraform]
        end
    
        subgraph "AWS"
            EKS[EKS Cluster]
            RDS[RDS Database]
            S3[S3 Storage]
        end
    
        subgraph "GCP"
            GKE[GKE Cluster]
            CLOUD_SQL[Cloud SQL]
            GCS[Cloud Storage]
        end
    
        subgraph "Azure"
            AKS[AKS Cluster]
            COSMOS[Cosmos DB]
            BLOB[Blob Storage]
        end
    
        subgraph "On-Premises"
            K8S[Kubernetes]
            DB[Database]
            NFS[NFS Storage]
        end
    
        MC --> EKS
        MC --> GKE
        MC --> AKS
        MC --> K8S
    
        ARGO --> EKS
        ARGO --> GKE
        ARGO --> AKS
        ARGO --> K8S
    
        style MC fill:#f9f,stroke:#333,stroke-width:2px

    Cluster API Multi-Cloud Setup

    # AWS Cluster
    apiVersion: cluster.x-k8s.io/v1beta1
    kind: Cluster
    metadata:
      name: aws-production
      namespace: clusters
      labels:
        cloud: aws
        environment: production
    spec:
      clusterNetwork:
        pods:
          cidrBlocks: ["192.168.0.0/16"]
        services:
          cidrBlocks: ["10.96.0.0/12"]
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AWSCluster
        name: aws-production
      controlPlaneRef:
        kind: KubeadmControlPlane
        apiVersion: controlplane.cluster.x-k8s.io/v1beta1
        name: aws-production-control-plane
    
    ---
    # GCP Cluster
    apiVersion: cluster.x-k8s.io/v1beta1
    kind: Cluster
    metadata:
      name: gcp-production
      namespace: clusters
      labels:
        cloud: gcp
        environment: production
    spec:
      clusterNetwork:
        pods:
          cidrBlocks: ["192.168.0.0/16"]
        services:
          cidrBlocks: ["10.96.0.0/12"]
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: GCPCluster
        name: gcp-production
      controlPlaneRef:
        kind: KubeadmControlPlane
        apiVersion: controlplane.cluster.x-k8s.io/v1beta1
        name: gcp-production-control-plane
    
    ---
    # Multi-cloud application deployment
    apiVersion: argoproj.io/v1alpha1
    kind: ApplicationSet
    metadata:
      name: multi-cloud-app
    spec:
      generators:
      - clusters:
          selector:
            matchLabels:
              environment: production
      template:
        metadata:
          name: '{{name}}-app'
        spec:
          project: default
          source:
            repoURL: https://github.com/company/k8s-manifests
            targetRevision: HEAD
            path: 'environments/{{metadata.labels.cloud}}'
          destination:
            server: '{{server}}'
            namespace: applications
          syncPolicy:
            automated:
              prune: true
              selfHeal: true
    YAML

    Cross-Cluster Service Mesh

    # Istio multi-cluster setup
    apiVersion: networking.istio.io/v1alpha3
    kind: Gateway
    metadata:
      name: cross-cluster-gateway
    spec:
      selector:
        istio: eastwestgateway
      servers:
      - port:
          number: 15443
          name: tls
          protocol: TLS
        tls:
          mode: ISTIO_MUTUAL
        hosts:
        - cross-network-primary.local
    
    ---
    apiVersion: networking.istio.io/v1alpha3
    kind: DestinationRule
    metadata:
      name: cross-cluster-service
    spec:
      host: remote-service.remote-cluster.local
      trafficPolicy:
        tls:
          mode: ISTIO_MUTUAL
      portLevelSettings:
      - port:
          number: 80
        loadBalancer:
          simple: LEAST_CONN
    YAML

    Chapter 16: DevOps Integration

    Advanced CI/CD Pipeline

    # GitLab CI/CD with Kubernetes integration
    stages:
      - test
      - build
      - security-scan
      - deploy-staging
      - integration-test
      - deploy-production
      - post-deploy
    
    variables:
      DOCKER_DRIVER: overlay2
      DOCKER_TLS_CERTDIR: "/certs"
      KUBERNETES_NAMESPACE: "production"
      HELM_CHART_PATH: "./helm/myapp"
    
    # Test stage
    test:
      stage: test
      image: node:16-alpine
      script:
        - npm ci
        - npm run test:unit
        - npm run test:integration
      coverage: '/Coverage: \d+\.\d+%/'
      artifacts:
        reports:
          coverage_report:
            coverage_format: cobertura
            path: coverage/cobertura-coverage.xml
        paths:
          - coverage/
        expire_in: 1 week
    
    # Build and push image
    build:
      stage: build
      image: docker:latest
      services:
        - docker:dind
      before_script:
        - echo $CI_REGISTRY_PASSWORD | docker login -u $CI_REGISTRY_USER --password-stdin $CI_REGISTRY
      script:
        - docker build --build-arg BUILD_DATE=$(date -u +'%Y-%m-%dT%H:%M:%SZ') 
                       --build-arg VCS_REF=$CI_COMMIT_SHA 
                       --build-arg VERSION=$CI_COMMIT_TAG 
                       -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA 
                       -t $CI_REGISTRY_IMAGE:latest .
        - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
        - docker push $CI_REGISTRY_IMAGE:latest
    
    # Security scanning
    security-scan:
      stage: security-scan
      image: aquasecurity/trivy:latest
      script:
        - trivy image --exit-code 0 --no-progress --format template --template "@contrib/sarif.tpl" -o trivy-results.sarif $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
        - trivy image --exit-code 1 --severity HIGH,CRITICAL --no-progress $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
      artifacts:
        reports:
          sast: trivy-results.sarif
    
    # Deploy to staging
    deploy-staging:
      stage: deploy-staging
      image: dtzar/helm-kubectl:latest  # helm-only images do not include kubectl
      before_script:
        - kubectl config use-context staging
      script:
        - helm upgrade --install myapp-staging $HELM_CHART_PATH 
            --namespace staging 
            --set image.repository=$CI_REGISTRY_IMAGE 
            --set image.tag=$CI_COMMIT_SHA 
            --set environment=staging 
            --wait --timeout=300s
      environment:
        name: staging
        url: https://staging.myapp.com
      only:
        - develop
    
    # Integration tests
    integration-test:
      stage: integration-test
      image: postman/newman:alpine
      script:
        - newman run tests/integration/api-tests.json 
            --environment tests/integration/staging-env.json
            --reporters cli,junit --reporter-junit-export integration-results.xml
      artifacts:
        reports:
          junit: integration-results.xml
      dependencies:
        - deploy-staging
    
    # Production deployment
    deploy-production:
      stage: deploy-production
      image: dtzar/helm-kubectl:latest  # helm-only images do not include kubectl
      before_script:
        - kubectl config use-context production
      script:
        - helm upgrade --install myapp $HELM_CHART_PATH 
            --namespace production 
            --set image.repository=$CI_REGISTRY_IMAGE 
            --set image.tag=$CI_COMMIT_SHA 
            --set environment=production 
            --set replicaCount=5 
            --wait --timeout=600s
      environment:
        name: production
        url: https://myapp.com
      when: manual
      only:
        - main
    
    # Post-deployment verification
    post-deploy:
      stage: post-deploy
      image: curlimages/curl:latest
      script:
        - sleep 30  # Wait for deployment to stabilize
        - curl -f https://myapp.com/health || exit 1
        - curl -f https://myapp.com/metrics || exit 1
      dependencies:
        - deploy-production
    YAML

    Advanced Helm Chart Structure

    # Chart.yaml
    apiVersion: v2
    name: myapp
    description: A production-ready application Helm chart
    type: application
    version: 1.0.0
    appVersion: "1.0.0"
    keywords:
      - web
      - api
      - microservice
    home: https://github.com/company/myapp
    sources:
      - https://github.com/company/myapp
    maintainers:
      - name: Platform Team
        email: platform@company.com
    dependencies:
      - name: postgresql
        version: 11.9.13
        repository: https://charts.bitnami.com/bitnami
        condition: postgresql.enabled
      - name: redis
        version: 17.3.7
        repository: https://charts.bitnami.com/bitnami
        condition: redis.enabled
    
    ---
    # values.yaml with comprehensive configuration
    replicaCount: 3
    
    image:
      repository: mycompany/myapp
      pullPolicy: IfNotPresent
      tag: "latest"
    
    imagePullSecrets: []
    nameOverride: ""
    fullnameOverride: ""
    
    serviceAccount:
      create: true
      annotations: {}
      name: ""
    
    podAnnotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "8080"
      prometheus.io/path: "/metrics"
    
    podSecurityContext:
      runAsNonRoot: true
      runAsUser: 1000
      fsGroup: 2000
    
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      runAsUser: 1000
      capabilities:
        drop:
        - ALL
    
    service:
      type: ClusterIP
      port: 80
      targetPort: 8080
    
    ingress:
      enabled: true
      className: "nginx"
      annotations:
        cert-manager.io/cluster-issuer: "letsencrypt-prod"
        nginx.ingress.kubernetes.io/ssl-redirect: "true"
        nginx.ingress.kubernetes.io/rate-limit: "100"
      hosts:
        - host: myapp.example.com
          paths:
            - path: /
              pathType: Prefix
      tls:
        - secretName: myapp-tls
          hosts:
            - myapp.example.com
    
    resources:
      limits:
        cpu: 500m
        memory: 512Mi
      requests:
        cpu: 200m
        memory: 256Mi
    
    autoscaling:
      enabled: true
      minReplicas: 2
      maxReplicas: 10
      targetCPUUtilizationPercentage: 70
      targetMemoryUtilizationPercentage: 80
    
    nodeSelector: {}
    
    tolerations: []
    
    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                - myapp
            topologyKey: kubernetes.io/hostname
    
    # Database configuration
    postgresql:
      enabled: true
      auth:
        existingSecret: "myapp-db-secret"
      primary:
        persistence:
          enabled: true
          size: 20Gi
          storageClass: "fast-ssd"
    
    redis:
      enabled: true
      auth:
        enabled: true
        existingSecret: "myapp-redis-secret"
      master:
        persistence:
          enabled: true
          size: 8Gi
    
    # Application-specific configuration
    config:
      environment: production
      logLevel: info
      features:
        newUI: true
        advancedSearch: true
        analytics: true
    
    # Monitoring
    monitoring:
      enabled: true
      serviceMonitor:
        enabled: true
        interval: 30s
    
    # Backup configuration
    backup:
      enabled: true
      schedule: "0 2 * * *"
      retention: "30d"
    YAML

    Progressive Delivery with Flagger

    # Canary deployment configuration
    apiVersion: flagger.app/v1beta1
    kind: Canary
    metadata:
      name: myapp
      namespace: production
    spec:
      # Deployment reference
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: myapp
    
      # HPA reference (optional)
      autoscalerRef:
        apiVersion: autoscaling/v2
        kind: HorizontalPodAutoscaler
        name: myapp
    
      # Service configuration
      service:
        port: 80
        targetPort: 8080
        gateways:
        - myapp-gateway
        hosts:
        - myapp.example.com
        trafficPolicy:
          tls:
            mode: DISABLE
    
      # Canary analysis
      analysis:
        # Schedule interval
        interval: 1m
        # Max number of failed metric checks before rollback
        threshold: 5
        # Max traffic percentage routed to canary
        maxWeight: 50
        # Canary increment step
        stepWeight: 5
        # Prometheus checks
        metrics:
        - name: request-success-rate
          thresholdRange:
            min: 99
          interval: 1m
        - name: request-duration
          thresholdRange:
            max: 500
          interval: 30s
        # Load testing
        webhooks:
        - name: load-test
          url: http://flagger-loadtester.test/
          timeout: 5s
          metadata:
            cmd: "hey -z 1m -q 10 -c 2 http://myapp-canary.production:80/"
    
      # Alert manager configuration
      alerting:
        providers:
        - name: "on-call"
          type: slack
          channel: alerts
          username: flagger
    YAML
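
    Flagger promotes or rolls back based on the analysis above; progress is easiest to follow on the Canary resource itself:

    kubectl -n production get canary myapp
    kubectl -n production describe canary myapp

    # Promotion/rollback events as the analysis runs
    kubectl -n production get events --field-selector involvedObject.kind=Canary --watch
    Bash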

    Complete Monitoring Stack

    # Comprehensive monitoring with Kustomization
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    
    namespace: monitoring
    
    resources:
    - prometheus-operator.yaml
    - prometheus.yaml
    - alertmanager.yaml
    - grafana.yaml
    - servicemonitors.yaml
    - rules.yaml
    
    configMapGenerator:
    - name: grafana-dashboards
      files:
      - dashboards/kubernetes-cluster.json
      - dashboards/kubernetes-pods.json
      - dashboards/application-metrics.json
    
    secretGenerator:
    - name: alertmanager-config
      files:
      - alertmanager.yml
    
    patchesStrategicMerge:
    - prometheus-patch.yaml
    - grafana-patch.yaml
    
    images:
    - name: prom/prometheus
      newTag: v2.40.0
    - name: grafana/grafana
      newTag: 9.2.0
    - name: prom/alertmanager
      newTag: v0.25.0
    
    replicas:
    - name: prometheus
      count: 2
    - name: grafana
      count: 2
    
    ---
    # Advanced Prometheus configuration
    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      name: prometheus
    spec:
      replicas: 2
      retention: 30d
      retentionSize: 50GB
    
      # Storage configuration
      storage:
        volumeClaimTemplate:
          spec:
            storageClassName: fast-ssd
            resources:
              requests:
                storage: 100Gi
    
      # Resource management
      resources:
        requests:
          memory: 2Gi
          cpu: 1000m
        limits:
          memory: 4Gi
          cpu: 2000m
    
      # Security
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 2000
    
      # Service discovery
      serviceMonitorSelector:
        matchLabels:
          monitoring: enabled
    
      podMonitorSelector:
        matchLabels:
          monitoring: enabled
    
      ruleSelector:
        matchLabels:
          monitoring: enabled
    
      # Additional scrape configs
      additionalScrapeConfigs:
        name: additional-scrape-configs
        key: prometheus-additional.yaml
    
      # Alerting
      alerting:
        alertmanagers:
        - namespace: monitoring
          name: alertmanager-operated
          port: web
    
      # External labels
      externalLabels:
        cluster: production
        region: us-west-2
    
      # Remote write configuration for long-term storage
      remoteWrite:
      - url: "https://prometheus-us-central1.grafana.net/api/prom/push"
        writeRelabelConfigs:
        - sourceLabels: [__name__]
          regex: 'kubernetes_.*'
          action: drop
        basicAuth:
          username:
            name: grafana-cloud-credentials
            key: username
          password:
            name: grafana-cloud-credentials
            key: password
    YAML

    Disaster Recovery Automation

    # Automated disaster recovery with Velero
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: velero-disaster-recovery
    data:
      recovery-script.sh: |
        #!/bin/bash
        set -e
    
        echo "Starting disaster recovery process..."
    
        # Validate backup exists
        BACKUP_NAME=${1:-"latest"}
        if ! velero backup get $BACKUP_NAME; then
            echo "Backup $BACKUP_NAME not found!"
            exit 1
        fi
    
        # Create restore
        RESTORE_NAME="dr-restore-$(date +%Y%m%d-%H%M%S)"
        velero restore create $RESTORE_NAME \
            --from-backup $BACKUP_NAME \
            --wait
    
        # Verify restore
        echo "Verifying restore..."
        kubectl get pods --all-namespaces
    
        # Run health checks
        echo "Running health checks..."
        for ns in production staging; do
            kubectl wait --for=condition=ready pod \
                --all -n $ns --timeout=300s
        done
    
        echo "Disaster recovery completed successfully!"
    
    ---
    # CronJob for regular DR testing
    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: dr-test
    spec:
      schedule: "0 2 * * 0"  # Weekly on Sunday at 2 AM
      jobTemplate:
        spec:
          template:
            spec:
              containers:
              - name: dr-test
                image: velero/velero:latest
                command: ["/bin/bash"]
                args:
                - -c
                - |
                  # Test backup integrity
                  velero backup describe daily-backup-$(date -d "yesterday" +%Y%m%d) \
                    --details || exit 1
    
                  # Test restore to test namespace
                  velero restore create test-restore-$(date +%Y%m%d) \
                    --from-backup daily-backup-$(date -d "yesterday" +%Y%m%d) \
                    --namespace-mappings production:dr-test \
                    --wait
    
                  # Verify test restore
                  kubectl wait --for=condition=ready pod \
                    --all -n dr-test --timeout=300s
    
                  # Cleanup test namespace
                  kubectl delete namespace dr-test --ignore-not-found
    
                  echo "DR test completed successfully"
              restartPolicy: OnFailure
    YAML
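
    The DR test above assumes daily backups already exist. A sketch of a Velero Schedule that produces them, assuming Velero is installed in the velero namespace (note that Velero names scheduled backups <schedule-name>-<timestamp> with a full timestamp, e.g. daily-backup-20250104010000, so the date-based lookup in the test script needs to match that convention):

    apiVersion: velero.io/v1
    kind: Schedule
    metadata:
      name: daily-backup
      namespace: velero
    spec:
      schedule: "0 1 * * *"        # daily at 1 AM, ahead of the weekly DR test
      template:
        includedNamespaces:
        - production
        - staging
        ttl: 720h0m0s              # keep backups for 30 days
    YAML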

    Enhanced Quick Reference

    kubectl Power Commands

    # Advanced resource queries
    kubectl get pods -o custom-columns="NAME:.metadata.name,STATUS:.status.phase,NODE:.spec.nodeName,IP:.status.podIP"
    kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="ExternalIP")].address}'
    
    # Resource usage monitoring
    kubectl top pods --all-namespaces --sort-by=memory
    kubectl top nodes --sort-by=cpu
    
    # Debugging and troubleshooting
    kubectl debug pod/my-pod -it --image=nicolaka/netshoot
    kubectl logs -f deployment/my-app --all-containers=true
    kubectl describe pod my-pod | grep -A 10 Events
    
    # Bulk operations
    kubectl delete pods --field-selector=status.phase==Failed
    kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\n"}{end}' | grep my-app
    
    # Security and RBAC
    kubectl auth can-i create pods --as=system:serviceaccount:default:my-sa
    kubectl get rolebindings,clusterrolebindings --all-namespaces -o wide
    
    # Resource management
    kubectl patch deployment my-app -p '{"spec":{"template":{"spec":{"containers":[{"name":"app","resources":{"requests":{"memory":"256Mi"}}}]}}}}'
    kubectl scale deployment my-app --replicas=5 --timeout=300s
    Bash

    Advanced Helm Commands

    # Chart development and testing
    helm create my-chart
    helm template my-app ./my-chart --debug
    helm lint ./my-chart
    helm test my-app
    
    # Release management
    helm upgrade my-app ./my-chart --reuse-values --wait --timeout=300s
    helm rollback my-app 1 --wait
    helm history my-app
    
    # Repository management
    helm repo add bitnami https://charts.bitnami.com/bitnami
    helm repo update
    helm search repo bitnami/postgresql --versions
    
    # Values and configuration
    helm get values my-app
    helm show values bitnami/postgresql
    helm upgrade my-app ./my-chart --set image.tag=v2.0.0 --set replicaCount=3
    Bash

    This comprehensive Kubernetes guide represents current best practices and production-ready patterns. As the ecosystem evolves, keep an eye on these emerging trends:

    Emerging Technologies

    1. WebAssembly (WASM): Running WASM workloads in Kubernetes
    2. Edge Computing: K3s, MicroK8s for edge deployments
    3. Serverless: Knative, OpenFaaS for serverless workloads
    4. AI/ML Operations: Kubeflow, MLflow integration
    5. eBPF: Advanced networking and security with Cilium
    6. GitOps Evolution: FluxCD v2, ArgoCD ApplicationSets

    Best Practices Summary

    1. Security-First Approach: Always implement security from the ground up
    2. Observability: Comprehensive monitoring, logging, and tracing
    3. Automation: GitOps, CI/CD, and Infrastructure as Code
    4. Resource Optimization: Right-sizing, autoscaling, and cost management (see the HPA sketch after this list)
    5. Disaster Recovery: Regular backups, testing, and documented procedures
    6. Documentation: Maintain runbooks, troubleshooting guides, and architectural decisions
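
    As an illustration of point 4, a minimal HorizontalPodAutoscaler sketch that scales a workload on CPU utilization (the Deployment name and thresholds are placeholders; metrics-server must be installed):

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: my-app
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app               # hypothetical Deployment to autoscale
      minReplicas: 2
      maxReplicas: 10
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 70   # target average CPU utilization across pods
    YAML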

    Continuous Learning Resources

    • Certification Paths: CKA, CKAD, CKS
    • Community: CNCF, Kubernetes Slack, local meetups
    • Training Platforms: A Cloud Guru, Pluralsight, Linux Academy
    • Hands-on Practice: Katacoda, Play with Kubernetes
    • Conference Content: KubeCon, DockerCon, CloudNativeCon

    Remember: Kubernetes mastery is a journey, not a destination. Stay curious, keep experimenting, and always prioritize reliability and security in your deployments.


    This production-grade guide serves as your comprehensive reference for Kubernetes deployment and operations. Keep evolving your practices as the cloud-native ecosystem continues to advance.

