Kubernetes: A Complete Production-Ready Guide
Table of Contents
- Getting Started
- Core Concepts
- Cluster Management
- Workloads
- Services and Networking
- Storage
- Configuration Management
- Security and RBAC
- Monitoring and Logging
- Troubleshooting
- Web Applications
- Advanced Topics
- Production Best Practices
- Performance and Optimization
- Multi-Cloud and Hybrid
- DevOps Integration
Chapter 1: Getting Started
Introduction to Kubernetes
Kubernetes (K8s) is an open-source container orchestration platform that automates the deployment, scaling, and management of containerized applications. Originally developed by Google based on their Borg system, it’s now maintained by the Cloud Native Computing Foundation (CNCF).
Core Benefits
- Scalability: Horizontal and vertical scaling of applications
- High Availability: Built-in fault tolerance and self-healing
- Portability: Runs on any infrastructure (cloud, on-premises, hybrid)
- Resource Efficiency: Optimal resource utilization across clusters
- DevOps Integration: Seamless CI/CD pipeline integration
- Extensibility: Rich ecosystem of tools and operators
Architecture Overview
graph TB
subgraph "Control Plane"
API[API Server:6443]
ETCD[etcd:2379-2380]
SCHED[Scheduler]
CM[Controller Manager]
CCM[Cloud Controller Manager]
end
subgraph "Worker Node 1"
K1[Kubelet:10250]
KP1[Kube-proxy]
CR1[Container Runtime: Docker/containerd]
P1[Pods]
CNI1[CNI Plugin]
end
subgraph "Worker Node 2"
K2[Kubelet:10250]
KP2[Kube-proxy]
CR2[Container Runtime: Docker/containerd]
P2[Pods]
CNI2[CNI Plugin]
end
subgraph "Add-ons"
DNS[CoreDNS]
DASH[Dashboard]
METRICS[Metrics Server]
INGRESS[Ingress Controller]
end
API <--> K1
API <--> K2
SCHED --> API
CM --> API
CCM --> API
ETCD <--> API
K1 --> CNI1
K2 --> CNI2
style API fill:#e1f5fe
style ETCD fill:#f3e5f5
style K1 fill:#e8f5e8
style K2 fill:#e8f5e8
Environment Setup Matrix
| Environment | Use Case | Resources | Setup Time | Cost |
|---|---|---|---|---|
| Minikube | Learning/Development | 2GB RAM, 2 CPUs | 10 minutes | Free |
| Kind | CI/CD Testing | 4GB RAM, 2 CPUs | 5 minutes | Free |
| K3s | Edge/IoT | 512MB RAM, 1 CPU | 15 minutes | Free |
| kubeadm | Production Self-Managed | 8GB RAM, 4 CPUs | 60 minutes | Infrastructure cost |
| EKS | Production AWS | Variable | 30 minutes | $0.10/hour + nodes |
| GKE | Production GCP | Variable | 20 minutes | $0.10/hour + nodes |
| AKS | Production Azure | Variable | 25 minutes | $0.10/hour + nodes |
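To make the Kind row concrete, a small multi-node cluster can be declared in a config file. A minimal sketch (the file name kind-config.yaml is an arbitrary choice):
# kind-config.yaml -- hypothetical file name; a small local cluster for CI testing
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
Creating it is then a single command: kind create cluster --config kind-config.yaml.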
Container Evolution
graph LR
subgraph "Physical Servers"
PS[Single Application per Server]
end
subgraph "Virtual Machines"
VM1[App 1 + OS]
VM2[App 2 + OS]
HYP[Hypervisor]
HOST1[Host OS]
end
subgraph "Containers"
C1[App 1]
C2[App 2]
CE[Container Engine]
HOST2[Host OS]
end
subgraph "Kubernetes"
POD1[Pod 1]
POD2[Pod 2]
K8S[Kubernetes]
NODES[Multiple Nodes]
end
PS --> VM1
VM1 --> C1
C1 --> POD1
Why Kubernetes?
- Container Orchestration: Manages containers at scale
- Self-healing: Automatically replaces failed containers
- Horizontal Scaling: Scales applications based on demand
- Service Discovery: Built-in load balancing and service discovery
- Rolling Updates: Zero-downtime deployments
Simplified Architecture Overview
graph TB
subgraph "Control Plane"
API[API Server]
ETCD[etcd]
SCHED[Scheduler]
CM[Controller Manager]
end
subgraph "Worker Node 1"
K1[Kubelet]
KP1[Kube-proxy]
CR1[Container Runtime]
P1[Pods]
end
subgraph "Worker Node 2"
K2[Kubelet]
KP2[Kube-proxy]
CR2[Container Runtime]
P2[Pods]
end
API --> K1
API --> K2
SCHED --> API
CM --> API
ETCD --> API
Control Plane Components
- API Server: Central management point for all cluster operations
- etcd: Distributed key-value store for cluster data
- Scheduler: Assigns pods to nodes based on resource requirements
- Controller Manager: Runs controllers that regulate cluster state
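On clusters bootstrapped with kubeadm, these control-plane components are configured through a ClusterConfiguration object. A minimal sketch, with purely illustrative values:
# Illustrative kubeadm ClusterConfiguration (values are examples, not recommendations)
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.28.0
controlPlaneEndpoint: "10.0.0.10:6443"    # stable address for the API server
etcd:
  local:
    dataDir: /var/lib/etcd                # where etcd persists cluster state
apiServer:
  extraArgs:
    audit-log-path: /var/log/kubernetes/audit.log
controllerManager:
  extraArgs:
    node-cidr-mask-size: "24"
scheduler: {}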
Worker Node Components
- Kubelet: Node agent that manages containers
- Kube-proxy: Network proxy for service communication
- Container Runtime: Runs containers (Docker, containerd, CRI-O)
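The kubelet on each worker node is configured declaratively as well. A short KubeletConfiguration sketch (again, the values are only illustrative):
# Illustrative KubeletConfiguration for the node agent described above
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
clusterDomain: cluster.local
clusterDNS:
  - 10.96.0.10            # CoreDNS service IP
maxPods: 110              # per-node pod limit
evictionHard:
  memory.available: "100Mi"
  nodefs.available: "10%"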
Container Fundamentals
Before diving into Kubernetes, understanding containers is crucial:
# Example Dockerfile
FROM node:14-alpine
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
EXPOSE 3000
CMD ["npm", "start"]
Containers vs VMs
graph LR
subgraph "Traditional VMs"
H1[Host OS]
HV[Hypervisor]
VM1[Guest OS 1]
VM2[Guest OS 2]
A1[App 1]
A2[App 2]
end
subgraph "Containers"
H2[Host OS]
CE[Container Engine]
C1[Container 1]
C2[Container 2]
A3[App 1]
A4[App 2]
end
Namespaces
Namespaces provide logical separation within a cluster:
apiVersion: v1
kind: Namespace
metadata:
name: production
labels:
environment: prod
---
apiVersion: v1
kind: Namespace
metadata:
name: development
labels:
environment: dev
Chapter 2: Core Concepts
Pods
Pods are the smallest deployable units in Kubernetes:
apiVersion: v1
kind: Pod
metadata:
name: nginx-pod
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx:1.21
ports:
- containerPort: 80
resources:
requests:
memory: "64Mi"
cpu: "250m"
limits:
memory: "128Mi"
cpu: "500m"
Multi-container Pods
apiVersion: v1
kind: Pod
metadata:
name: multi-container-pod
spec:
containers:
- name: web-server
image: nginx:1.21
ports:
- containerPort: 80
volumeMounts:
- name: log-volume
mountPath: /var/log/nginx
- name: log-aggregator
image: fluentd:latest
volumeMounts:
- name: log-volume
mountPath: /var/log
volumes:
- name: log-volume
emptyDir: {}
Labels and Selectors
graph LR
subgraph "Pods with Labels"
P1[Pod 1: app=frontend, tier=web]
P2[Pod 2: app=backend, tier=api]
P3[Pod 3: app=frontend, tier=web]
end
subgraph "Service Selector"
S[Service: selector app=frontend]
end
S --> P1
S --> P3
# Pod with labels
apiVersion: v1
kind: Pod
metadata:
name: frontend-pod
labels:
app: frontend
tier: web
version: v1.0
spec:
containers:
- name: nginx
image: nginx:1.21
---
# Service using selectors
apiVersion: v1
kind: Service
metadata:
name: frontend-service
spec:
selector:
app: frontend
tier: web
ports:
- port: 80
targetPort: 80
Chapter 3: Cluster Management
Setting up Clusters
Minikube (Local Development)
# Install minikube
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube
# Start cluster
minikube start --driver=docker --memory=4096 --cpus=2
# Enable addons
minikube addons enable dashboard
minikube addons enable ingress
kubeadm (Production)
# Initialize control plane
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
# Set up kubectl
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Install CNI plugin (Flannel)
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
# Join worker nodes
kubeadm join <control-plane-ip>:6443 --token <token> --discovery-token-ca-cert-hash <hash>
Managed Kubernetes Services
AWS EKS
# eks-cluster.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
name: production-cluster
region: us-west-2
nodeGroups:
- name: worker-nodes
instanceType: t3.medium
desiredCapacity: 3
minSize: 1
maxSize: 5
volumeSize: 20
ssh:
allow: true
# Create EKS cluster
eksctl create cluster -f eks-cluster.yaml
Google GKE
# Create GKE cluster
gcloud container clusters create production-cluster \
--zone=us-central1-a \
--num-nodes=3 \
--machine-type=n1-standard-2 \
--enable-autoscaling \
--min-nodes=1 \
--max-nodes=10
Azure AKS
# Create resource group
az group create --name myResourceGroup --location eastus
# Create AKS cluster
az aks create \
--resource-group myResourceGroup \
--name myAKSCluster \
--node-count 3 \
--enable-addons monitoring \
--generate-ssh-keys
kubectl Commands
Cluster Information
# Cluster info
kubectl cluster-info
kubectl version
kubectl get nodes
# Detailed node information
kubectl describe node <node-name>
# Cluster events
kubectl get events --sort-by=.metadata.creationTimestamp
Resource Management
# Get resources
kubectl get pods
kubectl get pods -o wide
kubectl get pods --all-namespaces
kubectl get pods -l app=nginx
# Describe resources
kubectl describe pod <pod-name>
kubectl describe service <service-name>
# Logs
kubectl logs <pod-name>
kubectl logs -f <pod-name> # Follow logs
kubectl logs <pod-name> -c <container-name> # Multi-container pod
# Execute commands
kubectl exec -it <pod-name> -- /bin/bash
kubectl exec -it <pod-name> -c <container-name> -- /bin/sh
Resource Creation and Updates
# Create resources
kubectl create -f deployment.yaml
kubectl apply -f deployment.yaml
# Update resources
kubectl edit deployment <deployment-name>
kubectl patch deployment <deployment-name> -p '{"spec":{"replicas":5}}'
# Delete resources
kubectl delete pod <pod-name>
kubectl delete -f deployment.yaml
kubectl delete deployment,service -l app=myapp
Chapter 4: Workloads
Deployments
Deployments manage ReplicaSets and provide declarative updates to Pods:
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-deployment
labels:
app: nginx
spec:
replicas: 3
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx:1.21
ports:
- containerPort: 80
resources:
requests:
memory: "64Mi"
cpu: "250m"
limits:
memory: "128Mi"
cpu: "500m"
livenessProbe:
httpGet:
path: /
port: 80
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /
port: 80
initialDelaySeconds: 5
periodSeconds: 5
Deployment Strategies
graph TB
subgraph "Rolling Update"
RU1[Old Pods: 3]
RU2[New Pod: 1]
RU3[Old Pods: 2, New Pods: 2]
RU4[Old Pods: 1, New Pods: 3]
RU5[New Pods: 3]
end
subgraph "Blue-Green"
BG1[Blue Environment: Active]
BG2[Green Environment: Standby]
BG3[Switch Traffic]
BG4[Green Environment: Active]
end
# Rolling update strategy
apiVersion: apps/v1
kind: Deployment
metadata:
name: rolling-deployment
spec:
replicas: 10
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 2
maxSurge: 2
# ... rest of spec
---
# Recreate strategy
apiVersion: apps/v1
kind: Deployment
metadata:
name: recreate-deployment
spec:
replicas: 3
strategy:
type: Recreate
# ... rest of spec
ReplicaSets
apiVersion: apps/v1
kind: ReplicaSet
metadata:
name: nginx-replicaset
spec:
replicas: 3
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx:1.21
StatefulSets
For stateful applications requiring stable network identities and persistent storage:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: mysql-statefulset
spec:
serviceName: mysql
replicas: 3
selector:
matchLabels:
app: mysql
template:
metadata:
labels:
app: mysql
spec:
containers:
- name: mysql
image: mysql:8.0
env:
- name: MYSQL_ROOT_PASSWORD
valueFrom:
secretKeyRef:
name: mysql-secret
key: password
ports:
- containerPort: 3306
volumeMounts:
- name: mysql-data
mountPath: /var/lib/mysql
volumeClaimTemplates:
- metadata:
name: mysql-data
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 10Gi
StatefulSet vs Deployment
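Before the comparison, one supporting piece: the StatefulSet above sets serviceName: mysql, which must point at a headless Service (clusterIP: None) so each replica gets a stable DNS name such as mysql-0.mysql. A minimal sketch of that companion Service:
# Headless Service backing the mysql StatefulSet above
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  clusterIP: None         # headless: DNS resolves to the individual pod addresses
  selector:
    app: mysql
  ports:
    - port: 3306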
graph LR
subgraph "StatefulSet"
SS1[mysql-0: stable identity]
SS2[mysql-1: stable identity]
SS3[mysql-2: stable identity]
PV1[PersistentVolume 1]
PV2[PersistentVolume 2]
PV3[PersistentVolume 3]
SS1 --- PV1
SS2 --- PV2
SS3 --- PV3
end
subgraph "Deployment"
D1[nginx-abc123: random identity]
D2[nginx-def456: random identity]
D3[nginx-ghi789: random identity]
end
DaemonSets
Ensures a pod runs on every (or selected) node:
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: fluentd-daemonset
labels:
app: fluentd
spec:
selector:
matchLabels:
app: fluentd
template:
metadata:
labels:
app: fluentd
spec:
nodeSelector:
kubernetes.io/os: linux
containers:
- name: fluentd
image: fluentd:latest
resources:
limits:
memory: 200Mi
requests:
cpu: 100m
memory: 200Mi
volumeMounts:
- name: varlog
mountPath: /var/log
- name: varlibdockercontainers
mountPath: /var/lib/docker/containers
readOnly: true
volumes:
- name: varlog
hostPath:
path: /var/log
- name: varlibdockercontainers
hostPath:
path: /var/lib/docker/containers
Jobs and CronJobs
Jobs
apiVersion: batch/v1
kind: Job
metadata:
name: data-migration-job
spec:
completions: 1
parallelism: 1
backoffLimit: 3
template:
spec:
restartPolicy: Never
containers:
- name: migrate
image: migrate/migrate
command: ["migrate"]
args: ["-path", "/migrations", "-database", "postgres://...", "up"]
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: db-secret
key: url
CronJobs
apiVersion: batch/v1
kind: CronJob
metadata:
name: backup-cronjob
spec:
schedule: "0 2 * * *" # Daily at 2 AM
jobTemplate:
spec:
template:
spec:
restartPolicy: OnFailure
containers:
- name: backup
image: postgres:13
command: ["pg_dump"]
args: ["-h", "postgres-service", "-U", "postgres", "mydb"]
env:
- name: PGPASSWORD
valueFrom:
secretKeyRef:
name: postgres-secret
key: password
volumeMounts:
- name: backup-storage
mountPath: /backup
volumes:
- name: backup-storage
persistentVolumeClaim:
claimName: backup-pvc
Chapter 5: Services and Networking
Service Types
graph TB
subgraph "ClusterIP"
CI[Internal Traffic Only]
CIP[Pod] --> CIS[ClusterIP Service]
end
subgraph "NodePort"
NP[External Traffic via Node Port]
NPP[Pod] --> NPS[NodePort Service]
EXT1[External Client] --> NPN[Node:30080]
NPN --> NPS
end
subgraph "LoadBalancer"
LB[Cloud Load Balancer]
LBP[Pod] --> LBS[LoadBalancer Service]
EXT2[External Client] --> CCLB[Cloud LB]
CCLB --> LBS
end
ClusterIP Service
apiVersion: v1
kind: Service
metadata:
name: backend-service
spec:
type: ClusterIP
selector:
app: backend
ports:
- port: 80
targetPort: 8080
protocol: TCP
NodePort Service
apiVersion: v1
kind: Service
metadata:
name: frontend-nodeport
spec:
type: NodePort
selector:
app: frontend
ports:
- port: 80
targetPort: 80
nodePort: 30080
LoadBalancer Service
apiVersion: v1
kind: Service
metadata:
name: web-loadbalancer
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
type: LoadBalancer
selector:
app: web
ports:
- port: 80
targetPort: 80
ExternalName Service
apiVersion: v1
kind: Service
metadata:
name: external-database
spec:
type: ExternalName
externalName: db.example.com
ports:
port: 5432
Ingress
graph LR
CLIENT[Client] --> IGW[Internet Gateway]
IGW --> ING[Ingress Controller]
ING --> SVC1[Service 1]
ING --> SVC2[Service 2]
SVC1 --> POD1[Pods]
SVC2 --> POD2[Pods]
Ingress Resource
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: web-ingress
annotations:
kubernetes.io/ingress.class: "nginx"
cert-manager.io/cluster-issuer: "letsencrypt-prod"
nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
tls:
- hosts:
- myapp.example.com
secretName: myapp-tls
rules:
- host: myapp.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: frontend-service
port:
number: 80
- path: /api
pathType: Prefix
backend:
service:
name: backend-service
port:
number: 80
NGINX Ingress Controller
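The Ingress above leans on two add-ons: an ingress controller (shown next) and cert-manager for the letsencrypt-prod issuer named in its annotations. A minimal ClusterIssuer sketch, assuming cert-manager is installed (the e-mail address is a placeholder):
# Illustrative cert-manager ClusterIssuer for the letsencrypt-prod reference above
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com              # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-prod-account-key  # Secret that stores the ACME account key
    solvers:
      - http01:
          ingress:
            class: nginx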
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-ingress-controller
namespace: ingress-nginx
spec:
replicas: 2
selector:
matchLabels:
app: nginx-ingress
template:
metadata:
labels:
app: nginx-ingress
spec:
containers:
- name: nginx-ingress-controller
image: k8s.gcr.io/ingress-nginx/controller:v1.1.1
args:
- /nginx-ingress-controller
- --configmap=$(POD_NAMESPACE)/nginx-configuration
- --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
- --udp-services-configmap=$(POD_NAMESPACE)/udp-services
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
ports:
- name: http
containerPort: 80
- name: https
containerPort: 443
Network Policies
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: deny-all
spec:
podSelector: {}
policyTypes:
- Ingress
- Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-frontend-to-backend
spec:
podSelector:
matchLabels:
app: backend
policyTypes:
- Ingress
ingress:
- from:
- podSelector:
matchLabels:
app: frontend
ports:
- protocol: TCP
port: 8080
Chapter 6: Storage
Persistent Volumes and Claims
graph LR
PV[PersistentVolume] --> PVC[PersistentVolumeClaim]
PVC --> POD[Pod]
subgraph "Storage Classes"
SC1[fast-ssd]
SC2[slow-hdd]
SC3[network-storage]
end
SC1 --> PV
PersistentVolume
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-database
spec:
capacity:
storage: 100Gi
volumeMode: Filesystem
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
storageClassName: fast-ssd
csi:
driver: ebs.csi.aws.com
volumeHandle: vol-0abcd1234efgh5678
fsType: ext4
PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: database-pvc
spec:
accessModes:
- ReadWriteOnce
volumeMode: Filesystem
resources:
requests:
storage: 50Gi
storageClassName: fast-ssd
StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
type: gp3
iops: "3000"
throughput: "125"
encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Delete
Volume Types
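The StorageClass and claim above come together when a workload references the claim by name. A quick sketch before the individual volume types (the pod name and mount path are arbitrary):
# Sketch: mounting the database-pvc claim defined earlier into a pod
apiVersion: v1
kind: Pod
metadata:
  name: db-client
spec:
  containers:
    - name: postgres
      image: postgres:13
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: database-pvc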
EmptyDir
apiVersion: v1
kind: Pod
metadata:
name: test-pod
spec:
containers:
- name: app
image: nginx
volumeMounts:
- name: cache-volume
mountPath: /cache
- name: sidecar
image: busybox
volumeMounts:
- name: cache-volume
mountPath: /shared
volumes:
- name: cache-volume
emptyDir:
sizeLimit: 1Gi
HostPath
apiVersion: v1
kind: Pod
metadata:
name: hostpath-pod
spec:
containers:
- name: app
image: nginx
volumeMounts:
- name: host-volume
mountPath: /host-data
volumes:
- name: host-volume
hostPath:
path: /var/log
type: Directory
ConfigMap Volume
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config
data:
app.properties: |
database.host=db.example.com
database.port=5432
cache.enabled=true
---
apiVersion: v1
kind: Pod
metadata:
name: config-pod
spec:
containers:
- name: app
image: myapp:latest
volumeMounts:
- name: config-volume
mountPath: /etc/config
volumes:
- name: config-volume
configMap:
name: app-config
Secret Volume
apiVersion: v1
kind: Secret
metadata:
name: db-secret
type: Opaque
data:
username: bXl1c2Vy # base64 encoded
password: bXlwYXNz # base64 encoded
---
apiVersion: v1
kind: Pod
metadata:
name: secret-pod
spec:
containers:
- name: app
image: myapp:latest
volumeMounts:
- name: secret-volume
mountPath: /etc/secrets
readOnly: true
volumes:
- name: secret-volume
secret:
secretName: db-secret
Chapter 7: Configuration Management
ConfigMaps
apiVersion: v1
kind: ConfigMap
metadata:
name: redis-config
data:
redis.conf: |
bind 0.0.0.0
port 6379
timeout 0
save 900 1
save 300 10
save 60 10000
max-memory: "2gb"
max-connections: "1000"
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: redis
spec:
replicas: 1
selector:
matchLabels:
app: redis
template:
metadata:
labels:
app: redis
spec:
containers:
- name: redis
image: redis:6.2
env:
- name: MAX_MEMORY
valueFrom:
configMapKeyRef:
name: redis-config
key: max-memory
volumeMounts:
- name: config
mountPath: /usr/local/etc/redis
volumes:
- name: config
configMap:
name: redis-config
items:
- key: redis.conf
path: redis.conf
Secrets
# Create secret from command line
# kubectl create secret generic db-secret \
# --from-literal=username=dbuser \
# --from-literal=password=secretpassword
apiVersion: v1
kind: Secret
metadata:
name: db-secret
type: Opaque
stringData: # No need to base64 encode
username: dbuser
password: secretpassword
connection-string: "postgresql://dbuser:secretpassword@db:5432/mydb"
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: web-app
spec:
replicas: 3
selector:
matchLabels:
app: web-app
template:
metadata:
labels:
app: web-app
spec:
containers:
- name: web
image: myapp:latest
env:
- name: DB_USER
valueFrom:
secretKeyRef:
name: db-secret
key: username
- name: DB_PASSWORD
valueFrom:
secretKeyRef:
name: db-secret
key: password
envFrom:
- secretRef:
name: db-secret
Managing Environment Variables
apiVersion: v1
kind: ConfigMap
metadata:
name: app-env
data:
ENVIRONMENT: "production"
LOG_LEVEL: "info"
FEATURE_FLAGS: "new-ui,advanced-search"
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: multi-env-app
spec:
replicas: 3
selector:
matchLabels:
app: multi-env-app
template:
metadata:
labels:
app: multi-env-app
spec:
containers:
- name: app
image: myapp:latest
env:
# Direct environment variable
- name: APP_VERSION
value: "1.2.3"
# From field reference
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
# From ConfigMap
- name: LOG_LEVEL
valueFrom:
configMapKeyRef:
name: app-env
key: LOG_LEVEL
# From Secret
- name: API_KEY
valueFrom:
secretKeyRef:
name: api-secret
key: key
# All from ConfigMap
envFrom:
- configMapRef:
name: app-env
# All from Secret
- secretRef:
name: app-secret
Chapter 8: Security and RBAC
Role-Based Access Control (RBAC)
graph LR
USER[User/ServiceAccount] --> RB[RoleBinding]
RB --> ROLE[Role/ClusterRole]
ROLE --> PERM[Permissions]
subgraph "Scope"
NS[Namespace Level]
CL[Cluster Level]
end
ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
name: pod-reader
namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
namespace: default
name: pod-reader-role
rules:
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: read-pods
namespace: default
subjects:
- kind: ServiceAccount
name: pod-reader
namespace: default
roleRef:
kind: Role
name: pod-reader-role
apiGroup: rbac.authorization.k8s.io
ClusterRole and ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: cluster-admin-role
rules:
- apiGroups: [""]
resources: ["*"]
verbs: ["*"]
- apiGroups: ["apps"]
resources: ["*"]
verbs: ["*"]
- apiGroups: ["networking.k8s.io"]
resources: ["*"]
verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: cluster-admin-binding
subjects:
- kind: User
name: admin@company.com
apiGroup: rbac.authorization.k8s.io
roleRef:
kind: ClusterRole
name: cluster-admin-role
apiGroup: rbac.authorization.k8s.io
Pod Security Standards
apiVersion: v1
kind: Namespace
metadata:
name: secure-namespace
labels:
pod-security.kubernetes.io/enforce: restricted
pod-security.kubernetes.io/audit: restricted
pod-security.kubernetes.io/warn: restricted
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: secure-app
namespace: secure-namespace
spec:
replicas: 3
selector:
matchLabels:
app: secure-app
template:
metadata:
labels:
app: secure-app
spec:
serviceAccountName: secure-sa
securityContext:
runAsNonRoot: true
runAsUser: 1000
runAsGroup: 3000
fsGroup: 2000
seccompProfile:
type: RuntimeDefault
containers:
- name: app
image: myapp:latest
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
capabilities:
drop:
- ALL
volumeMounts:
- name: tmp
mountPath: /tmp
- name: cache
mountPath: /app/cache
volumes:
- name: tmp
emptyDir: {}
- name: cache
emptyDir: {}
Network Policies
# Deny all traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny-all
namespace: production
spec:
podSelector: {}
policyTypes:
- Ingress
- Egress
---
# Allow frontend to backend
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-frontend-to-backend
namespace: production
spec:
podSelector:
matchLabels:
tier: backend
policyTypes:
- Ingress
ingress:
- from:
- podSelector:
matchLabels:
tier: frontend
- namespaceSelector:
matchLabels:
name: frontend-namespace
ports:
- protocol: TCP
port: 8080
---
# Allow egress to database
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-backend-to-db
namespace: production
spec:
podSelector:
matchLabels:
tier: backend
policyTypes:
- Egress
egress:
- to:
- podSelector:
matchLabels:
tier: database
ports:
- protocol: TCP
port: 5432
- to: [] # Allow DNS
ports:
- protocol: UDP
port: 53
Resource Quotas and Limits
apiVersion: v1
kind: ResourceQuota
metadata:
name: compute-quota
namespace: production
spec:
hard:
requests.cpu: "10"
requests.memory: 20Gi
limits.cpu: "20"
limits.memory: 40Gi
persistentvolumeclaims: "10"
pods: "20"
services: "5"
secrets: "10"
configmaps: "10"
---
apiVersion: v1
kind: LimitRange
metadata:
name: limit-range
namespace: production
spec:
limits:
- default:
cpu: "1"
memory: "1Gi"
defaultRequest:
cpu: "100m"
memory: "128Mi"
max:
cpu: "2"
memory: "4Gi"
min:
cpu: "50m"
memory: "64Mi"
type: Container
Chapter 9: Monitoring and Logging
Prometheus and Grafana Stack
graph TB
subgraph "Monitoring Stack"
PROM[Prometheus Server]
GRAF[Grafana]
AM[AlertManager]
subgraph "Exporters"
NE[Node Exporter]
CE[cAdvisor]
KSM[Kube State Metrics]
end
subgraph "Applications"
APP1[App 1]
APP2[App 2]
end
end
NE --> PROM
CE --> PROM
KSM --> PROM
APP1 --> PROM
APP2 --> PROM
PROM --> GRAF
PROM --> AM
Prometheus Configuration
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
data:
prometheus.yml: |
global:
scrape_interval: 15s
scrape_configs:
- job_name: 'kubernetes-apiservers'
kubernetes_sd_configs:
- role: endpoints
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: default;kubernetes;https
- job_name: 'kubernetes-nodes'
kubernetes_sd_configs:
- role: node
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: prometheus
spec:
replicas: 1
selector:
matchLabels:
app: prometheus
template:
metadata:
labels:
app: prometheus
spec:
serviceAccountName: prometheus
containers:
- name: prometheus
image: prom/prometheus:latest
args:
- '--config.file=/etc/prometheus/prometheus.yml'
- '--storage.tsdb.path=/prometheus/'
- '--web.console.libraries=/etc/prometheus/console_libraries'
- '--web.console.templates=/etc/prometheus/consoles'
- '--storage.tsdb.retention.time=200h'
- '--web.enable-lifecycle'
ports:
- containerPort: 9090
volumeMounts:
- name: prometheus-config-volume
mountPath: /etc/prometheus/
- name: prometheus-storage-volume
mountPath: /prometheus/
volumes:
- name: prometheus-config-volume
configMap:
name: prometheus-config
- name: prometheus-storage-volume
emptyDir: {}
Grafana Dashboard
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-datasources
data:
prometheus.yaml: |
apiVersion: 1
datasources:
- name: Prometheus
type: prometheus
url: http://prometheus-service:9090
access: proxy
isDefault: true
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: grafana
spec:
replicas: 1
selector:
matchLabels:
app: grafana
template:
metadata:
labels:
app: grafana
spec:
containers:
- name: grafana
image: grafana/grafana:latest
ports:
- containerPort: 3000
env:
- name: GF_SECURITY_ADMIN_PASSWORD
valueFrom:
secretKeyRef:
name: grafana-secret
key: admin-password
volumeMounts:
- name: grafana-datasources
mountPath: /etc/grafana/provisioning/datasources
volumes:
- name: grafana-datasources
configMap:
name: grafana-datasources
Application Metrics
apiVersion: apps/v1
kind: Deployment
metadata:
name: metrics-app
spec:
replicas: 3
selector:
matchLabels:
app: metrics-app
template:
metadata:
labels:
app: metrics-app
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
prometheus.io/path: "/metrics"
spec:
containers:
- name: app
image: myapp:latest
ports:
- containerPort: 8080
- containerPort: 9090 # Metrics port
env:
- name: METRICS_ENABLED
value: "true"
EFK Stack (Elasticsearch, Fluentd, Kibana)
graph LR
APPS[Applications] --> FD[Fluentd DaemonSet]
FD --> ES[Elasticsearch]
ES --> KB[Kibana]
KB --> USER[Users]
Fluentd Configuration
apiVersion: v1
kind: ConfigMap
metadata:
name: fluentd-config
data:
fluent.conf: |
<source>
@type tail
path /var/log/containers/*.log
pos_file /var/log/fluentd-containers.log.pos
time_format %Y-%m-%dT%H:%M:%S.%NZ
tag kubernetes.*
read_from_head true
<parse>
@type json
time_key time
time_format %Y-%m-%dT%H:%M:%S.%NZ
</parse>
</source>
<filter kubernetes.**>
@type kubernetes_metadata
</filter>
<match **>
@type elasticsearch
host elasticsearch-service
port 9200
logstash_format true
logstash_prefix kubernetes
<buffer>
@type file
path /var/log/fluentd-buffers/kubernetes.system.buffer
flush_mode interval
retry_type exponential_backoff
flush_thread_count 2
flush_interval 5s
retry_forever
retry_max_interval 30
chunk_limit_size 2M
queue_limit_length 8
overflow_action block
</buffer>
</match>
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: fluentd
spec:
selector:
matchLabels:
name: fluentd
template:
metadata:
labels:
name: fluentd
spec:
serviceAccountName: fluentd
containers:
- name: fluentd
image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
env:
- name: FLUENTD_SYSTEMD_CONF
value: "disable"
resources:
limits:
memory: 200Mi
requests:
cpu: 100m
memory: 200Mi
volumeMounts:
- name: varlog
mountPath: /var/log
- name: varlibdockercontainers
mountPath: /var/lib/docker/containers
readOnly: true
- name: fluentd-config
mountPath: /fluentd/etc
volumes:
- name: varlog
hostPath:
path: /var/log
- name: varlibdockercontainers
hostPath:
path: /var/lib/docker/containers
- name: fluentd-config
configMap:
name: fluentd-config
Chapter 10: Troubleshooting
Common Issues and Solutions
Pod Troubleshooting
# Check pod status
kubectl get pods
kubectl describe pod <pod-name>
# Check logs
kubectl logs <pod-name>
kubectl logs <pod-name> --previous # Previous container instance
kubectl logs <pod-name> -c <container-name> # Multi-container pod
# Execute into pod
kubectl exec -it <pod-name> -- /bin/bash
# Port forwarding for testing
kubectl port-forward <pod-name> 8080:80
# Check events
kubectl get events --sort-by=.metadata.creationTimestamp
Common Pod States and Solutions
graph TD
PENDING[Pending] --> |Check Resources| SCHEDULED[Scheduled]
PENDING --> |Check Node Selector| NODE_ISSUE[Node Selection Issue]
RUNNING[Running] --> |Check Logs| APP_ERROR[Application Error]
FAILED[Failed] --> |Check Exit Code| RESTART[Restart Policy]
CRASHLOOPBACKOFF[CrashLoopBackOff] --> |Fix App Logic| RUNNING
Debugging Examples
# Debug pod for troubleshooting
apiVersion: v1
kind: Pod
metadata:
name: debug-pod
spec:
containers:
- name: debug
image: nicolaka/netshoot
command: ["/bin/bash"]
args: ["-c", "while true; do sleep 30; done;"]
securityContext:
capabilities:
add: ["NET_ADMIN"]
hostNetwork: true
hostPID: true
Network Troubleshooting
# Test DNS resolution
kubectl exec -it debug-pod -- nslookup kubernetes.default.svc.cluster.local
# Test connectivity
kubectl exec -it debug-pod -- curl -v http://service-name:port
# Check network policies
kubectl get networkpolicies
kubectl describe networkpolicy <policy-name>
# Test pod-to-pod communication
kubectl exec -it pod1 -- ping <pod2-ip>
Resource Issues
# Check node resources
kubectl top nodes
kubectl describe nodes
# Check pod resource usage
kubectl top pods
kubectl top pods --containers
# Check resource quotas
kubectl get resourcequota
kubectl describe resourcequota
# Check limit ranges
kubectl get limitrange
kubectl describe limitrange
Monitoring Cluster Health
apiVersion: v1
kind: Pod
metadata:
name: cluster-health-check
spec:
containers:
- name: health-check
image: curlimages/curl
command: ["/bin/sh"]
args:
- -c
- |
while true; do
echo "=== Cluster Health Check ==="
# Check API server
if curl -k https://kubernetes.default.svc/healthz; then
echo "API Server: OK"
else
echo "API Server: FAILED"
fi
# Check CoreDNS
if nslookup kubernetes.default.svc.cluster.local; then
echo "DNS: OK"
else
echo "DNS: FAILED"
fi
sleep 60
done
Chapter 11: Web Applications
Deploying a Complete Web Application
graph TB
subgraph "Frontend Tier"
FE[React Frontend]
ING[Ingress]
end
subgraph "Backend Tier"
BE[Node.js API]
SVC[Service]
end
subgraph "Database Tier"
DB[PostgreSQL]
PVC[Persistent Volume]
end
FE --> ING
ING --> SVC
SVC --> BE
BE --> DB
DB --> PVC
Database Layer
apiVersion: v1
kind: Secret
metadata:
name: postgres-secret
type: Opaque
stringData:
POSTGRES_DB: webapp
POSTGRES_USER: appuser
POSTGRES_PASSWORD: secretpassword
database-url: "postgresql://appuser:secretpassword@postgres-service:5432/webapp" # consumed by the backend Deployment below
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgres
spec:
serviceName: postgres
replicas: 1
selector:
matchLabels:
app: postgres
template:
metadata:
labels:
app: postgres
spec:
containers:
- name: postgres
image: postgres:13
ports:
- containerPort: 5432
envFrom:
- secretRef:
name: postgres-secret
volumeMounts:
- name: postgres-data
mountPath: /var/lib/postgresql/data
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
livenessProbe:
exec:
command:
- pg_isready
- -U
- appuser
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
exec:
command:
- pg_isready
- -U
- appuser
initialDelaySeconds: 5
periodSeconds: 5
volumeClaimTemplates:
- metadata:
name: postgres-data
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 10Gi
---
apiVersion: v1
kind: Service
metadata:
name: postgres-service
spec:
selector:
app: postgres
ports:
- port: 5432
targetPort: 5432
Backend API
apiVersion: v1
kind: ConfigMap
metadata:
name: backend-config
data:
NODE_ENV: "production"
PORT: "3000"
LOG_LEVEL: "info"
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: backend-api
spec:
replicas: 3
selector:
matchLabels:
app: backend-api
template:
metadata:
labels:
app: backend-api
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "3000"
prometheus.io/path: "/metrics"
spec:
containers:
- name: api
image: mycompany/backend-api:v1.2.3
ports:
- containerPort: 3000
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: postgres-secret
key: database-url
envFrom:
- configMapRef:
name: backend-config
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "256Mi"
cpu: "500m"
livenessProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 3000
initialDelaySeconds: 5
periodSeconds: 5
volumeMounts:
- name: logs
mountPath: /app/logs
volumes:
- name: logs
emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
name: backend-service
spec:
selector:
app: backend-api
ports:
- port: 80
targetPort: 3000
type: ClusterIP
Frontend Application
apiVersion: v1
kind: ConfigMap
metadata:
name: frontend-config
data:
nginx.conf: |
server {
listen 80;
server_name localhost;
root /usr/share/nginx/html;
index index.html;
# Gzip compression
gzip on;
gzip_types text/plain text/css application/json application/javascript text/xml application/xml;
# Security headers
add_header X-Content-Type-Options nosniff;
add_header X-Frame-Options DENY;
add_header X-XSS-Protection "1; mode=block";
# API proxy
location /api {
proxy_pass http://backend-service;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
# React Router support
location / {
try_files $uri $uri/ /index.html;
}
}
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: frontend
spec:
replicas: 2
selector:
matchLabels:
app: frontend
template:
metadata:
labels:
app: frontend
spec:
containers:
- name: nginx
image: mycompany/frontend:v1.2.3
ports:
- containerPort: 80
volumeMounts:
- name: nginx-config
mountPath: /etc/nginx/conf.d
resources:
requests:
memory: "64Mi"
cpu: "50m"
limits:
memory: "128Mi"
cpu: "100m"
livenessProbe:
httpGet:
path: /
port: 80
initialDelaySeconds: 10
periodSeconds: 10
readinessProbe:
httpGet:
path: /
port: 80
initialDelaySeconds: 5
periodSeconds: 5
volumes:
- name: nginx-config
configMap:
name: frontend-config
---
apiVersion: v1
kind: Service
metadata:
name: frontend-service
spec:
selector:
app: frontend
ports:
- port: 80
targetPort: 80
Ingress Configuration
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: webapp-ingress
annotations:
kubernetes.io/ingress.class: "nginx"
cert-manager.io/cluster-issuer: "letsencrypt-prod"
nginx.ingress.kubernetes.io/ssl-redirect: "true"
nginx.ingress.kubernetes.io/rate-limit: "100"
nginx.ingress.kubernetes.io/cors-allow-origin: "https://myapp.com"
spec:
tls:
- hosts:
- myapp.com
- api.myapp.com
secretName: webapp-tls
rules:
- host: myapp.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: frontend-service
port:
number: 80
- host: api.myapp.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: backend-service
port:
number: 80
WordPress Example
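The example below mounts two PersistentVolumeClaims, mysql-pvc and wordpress-pvc, which are assumed to already exist. A minimal sketch of them (sizes are illustrative):
# Claims assumed by the WordPress example below
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wordpress-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi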
apiVersion: v1
kind: Secret
metadata:
name: mysql-secret
type: Opaque
stringData:
MYSQL_ROOT_PASSWORD: rootpassword
MYSQL_DATABASE: wordpress
MYSQL_USER: wpuser
MYSQL_PASSWORD: wppassword
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: mysql
spec:
replicas: 1
selector:
matchLabels:
app: mysql
template:
metadata:
labels:
app: mysql
spec:
containers:
- name: mysql
image: mysql:8.0
envFrom:
- secretRef:
name: mysql-secret
ports:
- containerPort: 3306
volumeMounts:
- name: mysql-data
mountPath: /var/lib/mysql
volumes:
- name: mysql-data
persistentVolumeClaim:
claimName: mysql-pvc
---
apiVersion: v1
kind: Service
metadata:
name: mysql-service
spec:
selector:
app: mysql
ports:
- port: 3306
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: wordpress
spec:
replicas: 2
selector:
matchLabels:
app: wordpress
template:
metadata:
labels:
app: wordpress
spec:
containers:
- name: wordpress
image: wordpress:latest
env:
- name: WORDPRESS_DB_HOST
value: mysql-service
- name: WORDPRESS_DB_NAME
valueFrom:
secretKeyRef:
name: mysql-secret
key: MYSQL_DATABASE
- name: WORDPRESS_DB_USER
valueFrom:
secretKeyRef:
name: mysql-secret
key: MYSQL_USER
- name: WORDPRESS_DB_PASSWORD
valueFrom:
secretKeyRef:
name: mysql-secret
key: MYSQL_PASSWORD
ports:
- containerPort: 80
volumeMounts:
- name: wordpress-data
mountPath: /var/www/html
volumes:
- name: wordpress-data
persistentVolumeClaim:
claimName: wordpress-pvc
Chapter 12: Advanced Topics
Horizontal Pod Autoscaling (HPA)
graph LR
HPA[HPA Controller] --> METRICS[Metrics Server]
METRICS --> PODS[Pod Metrics]
HPA --> DEPLOY[Deployment]
DEPLOY --> SCALE[Scale Pods]
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: webapp-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: webapp
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
- type: Pods
pods:
metric:
name: custom_metric
target:
type: AverageValue
averageValue: "100"
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 10
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 0
policies:
- type: Percent
value: 100
periodSeconds: 15
- type: Pods
value: 4
periodSeconds: 15
selectPolicy: Max
Vertical Pod Autoscaling (VPA)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: webapp-vpa
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: webapp
updatePolicy:
updateMode: "Auto"
resourcePolicy:
containerPolicies:
- containerName: app
minAllowed:
cpu: 100m
memory: 128Mi
maxAllowed:
cpu: 2
memory: 4Gi
controlledResources: ["cpu", "memory"]
Cluster Autoscaling
apiVersion: apps/v1
kind: Deployment
metadata:
name: cluster-autoscaler
namespace: kube-system
spec:
replicas: 1
selector:
matchLabels:
app: cluster-autoscaler
template:
metadata:
labels:
app: cluster-autoscaler
spec:
serviceAccountName: cluster-autoscaler
containers:
- image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.21.0
name: cluster-autoscaler
resources:
limits:
cpu: 100m
memory: 300Mi
requests:
cpu: 100m
memory: 300Mi
command:
- ./cluster-autoscaler
- --v=4
- --stderrthreshold=info
- --cloud-provider=aws
- --skip-nodes-with-local-storage=false
- --expander=least-waste
- --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
- --balance-similar-node-groups
- --skip-nodes-with-system-pods=false
env:
- name: AWS_REGION
value: us-west-2
Custom Resources and Operators
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
name: webapps.example.com
spec:
group: example.com
versions:
- name: v1
served: true
storage: true
schema:
openAPIV3Schema:
type: object
properties:
spec:
type: object
properties:
image:
type: string
replicas:
type: integer
minimum: 1
maximum: 10
port:
type: integer
status:
type: object
properties:
availableReplicas:
type: integer
scope: Namespaced
names:
plural: webapps
singular: webapp
kind: WebApp
shortNames:
- wa
---
apiVersion: example.com/v1
kind: WebApp
metadata:
name: my-webapp
spec:
image: nginx:1.21
replicas: 3
port: 80
---
# Simple Operator example
apiVersion: apps/v1
kind: Deployment
metadata:
name: webapp-operator
spec:
replicas: 1
selector:
matchLabels:
name: webapp-operator
template:
metadata:
labels:
name: webapp-operator
spec:
containers:
- name: webapp-operator
image: mycompany/webapp-operator:latest
env:
- name: WATCH_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: OPERATOR_NAME
value: "webapp-operator"
GitOps with ArgoCD
graph LR
DEV[Developer] --> GIT[Git Repository]
GIT --> ARGO[ArgoCD]
ARGO --> K8S[Kubernetes Cluster]
ARGO --> |Sync Status| GIT
ARGO --> |Deploy| K8S
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: webapp-production
namespace: argocd
finalizers:
- resources-finalizer.argocd.argoproj.io
spec:
project: default
source:
repoURL: https://github.com/mycompany/k8s-manifests
targetRevision: HEAD
path: production/webapp
destination:
server: https://kubernetes.default.svc
namespace: production
syncPolicy:
automated:
prune: true
selfHeal: true
allowEmpty: false
syncOptions:
- CreateNamespace=true
- PrunePropagationPolicy=foreground
- PruneLast=true
retry:
limit: 5
backoff:
duration: 5s
factor: 2
maxDuration: 3m
Service Mesh with Istio
graph TB
subgraph "Service Mesh"
subgraph "Data Plane"
POD1[Pod + Sidecar Proxy]
POD2[Pod + Sidecar Proxy]
POD3[Pod + Sidecar Proxy]
end
subgraph "Control Plane"
PILOT[Pilot]
CITADEL[Citadel]
GALLEY[Galley]
end
end
PILOT --> POD1
PILOT --> POD2
PILOT --> POD3
CITADEL --> POD1
CITADEL --> POD2
CITADEL --> POD3
Istio Gateway and VirtualService
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
name: webapp-gateway
spec:
selector:
istio: ingressgateway
servers:
- port:
number: 80
name: http
protocol: HTTP
hosts:
- myapp.example.com
- port:
number: 443
name: https
protocol: HTTPS
tls:
mode: SIMPLE
credentialName: webapp-tls
hosts:
- myapp.example.com
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: webapp-vs
spec:
hosts:
- myapp.example.com
gateways:
- webapp-gateway
http:
- match:
- uri:
prefix: "/api/v1"
route:
- destination:
host: backend-service
port:
number: 80
weight: 90
- destination:
host: backend-service-canary
port:
number: 80
weight: 10
- match:
- uri:
prefix: "/"
route:
- destination:
host: frontend-service
port:
number: 80
Traffic Splitting and Canary Deployments
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
name: backend-destination
spec:
host: backend-service
subsets:
- name: v1
labels:
version: v1
- name: v2
labels:
version: v2
trafficPolicy:
loadBalancer:
simple: LEAST_CONN
connectionPool:
tcp:
maxConnections: 100
http:
http1MaxPendingRequests: 50
maxRequestsPerConnection: 5
outlierDetection:
consecutiveErrors: 3
interval: 30s
baseEjectionTime: 30s
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: backend-canary
spec:
hosts:
- backend-service
http:
- match:
- headers:
canary:
exact: "true"
route:
- destination:
host: backend-service
subset: v2
- route:
- destination:
host: backend-service
subset: v1
weight: 95
- destination:
host: backend-service
subset: v2
weight: 5
Multi-Cluster Management
graph TB
subgraph "Management Cluster"
MC[Control Plane]
ARGO[ArgoCD]
FLUX[Flux]
end
subgraph "Production Cluster"
PC[Workloads]
end
subgraph "Staging Cluster"
SC[Workloads]
end
subgraph "Development Cluster"
DC[Workloads]
end
MC --> PC
MC --> SC
MC --> DC
ARGO --> PC
ARGO --> SC
FLUX --> DC
Cluster API Example
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
name: production-cluster
namespace: default
spec:
clusterNetwork:
pods:
cidrBlocks: ["192.168.0.0/16"]
infrastructureRef:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSCluster
name: production-cluster
controlPlaneRef:
kind: KubeadmControlPlane
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
name: production-cluster-control-plane
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSCluster
metadata:
name: production-cluster
spec:
region: us-west-2
sshKeyName: my-ssh-key
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
name: production-cluster-control-plane
spec:
replicas: 3
machineTemplate:
infrastructureRef:
kind: AWSMachineTemplate
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
name: production-cluster-control-plane
kubeadmConfigSpec:
initConfiguration:
nodeRegistration:
kubeletExtraArgs:
cloud-provider: aws
clusterConfiguration:
apiServer:
extraArgs:
cloud-provider: aws
controllerManager:
extraArgs:
cloud-provider: aws
Disaster Recovery and Backup
# Velero backup configuration
apiVersion: velero.io/v1
kind: Backup
metadata:
name: daily-backup
namespace: velero
spec:
includedNamespaces:
- production
- staging
excludedResources:
- events
- events.events.k8s.io
storageLocation: aws-s3
ttl: 720h0m0s # 30 days
snapshotVolumes: true
---
apiVersion: velero.io/v1
kind: Schedule
metadata:
name: daily-backup-schedule
namespace: velero
spec:
schedule: "0 2 * * *" # Daily at 2 AM
template:
includedNamespaces:
- production
- staging
excludedResources:
- events
- events.events.k8s.io
storageLocation: aws-s3
ttl: 720h0m0s
snapshotVolumes: true
---
# Restore example
apiVersion: velero.io/v1
kind: Restore
metadata:
name: production-restore
namespace: velero
spec:
backupName: daily-backup-20250820
includedNamespaces:
- production
restorePVs: true
Performance Optimization
Pod Disruption Budget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: webapp-pdb
spec:
minAvailable: 2
selector:
matchLabels:
app: webapp
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: critical-app-pdb
spec:
maxUnavailable: 1
selector:
matchLabels:
tier: critical
Priority Classes
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: high-priority
value: 1000
globalDefault: false
description: "High priority class for critical applications"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: low-priority
value: 100
globalDefault: false
description: "Low priority class for batch jobs"
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: critical-app
spec:
replicas: 3
selector:
matchLabels:
app: critical-app
template:
metadata:
labels:
app: critical-app
spec:
priorityClassName: high-priority
containers:
- name: app
image: critical-app:latest
resources:
requests:
memory: "256Mi"
cpu: "500m"
limits:
memory: "512Mi"
cpu: "1000m"
Node Affinity and Anti-Affinity
apiVersion: apps/v1
kind: Deployment
metadata:
name: database-deployment
spec:
replicas: 3
selector:
matchLabels:
app: database
template:
metadata:
labels:
app: database
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: node-type
operator: In
values:
- high-memory
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
preference:
matchExpressions:
- key: zone
operator: In
values:
- us-west-2a
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- database
topologyKey: kubernetes.io/hostname
containers:
- name: database
image: postgres:13
resources:
requests:
memory: "2Gi"
cpu: "1000m"
limits:
memory: "4Gi"
cpu: "2000m"
Cost Optimization
Resource Recommendations
# Install kube-resource-recommender
kubectl apply -f https://github.com/robusta-dev/kubernetes-resource-recommender/releases/latest/download/install.yaml
# Get recommendations
kubectl get resourcerecommendations
# Example recommendation output
kubectl describe resourcerecommendation webapp-deployment
Cluster Cost Analysis
apiVersion: v1
kind: ConfigMap
metadata:
name: kubecost-config
data:
kubecostProductConfigs.json: |
{
"currencyCode": "USD",
"discount": "30",
"negotiatedDiscount": "10",
"defaultIdle": "false",
"serviceKeyName": "service",
"departmentKeyName": "department",
"teamKeyName": "team",
"envKeyName": "env"
}
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: kubecost
spec:
replicas: 1
selector:
matchLabels:
app: kubecost
template:
metadata:
labels:
app: kubecost
spec:
containers:
- name: cost-analyzer
image: gcr.io/kubecost1/cost-analyzer:latest
ports:
- containerPort: 9090
env:
- name: PROMETHEUS_SERVER_ENDPOINT
value: "http://prometheus-service:9090"
volumeMounts:
- name: config
mountPath: /var/configs
volumes:
- name: config
configMap:
name: kubecost-config
Chapter 13: Production Best Practices
CI/CD Pipeline Integration
graph LR
DEV[Developer] --> GIT[Git Repository]
GIT --> CI[CI Pipeline]
CI --> BUILD[Build Image]
BUILD --> TEST[Run Tests]
TEST --> SCAN[Security Scan]
SCAN --> PUSH[Push to Registry]
PUSH --> CD[CD Pipeline]
CD --> DEPLOY[Deploy to K8s]
GitHub Actions Example
# .github/workflows/deploy.yml
name: Deploy to Kubernetes
on:
push:
branches: [main]
pull_request:
branches: [main]
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}
jobs:
build-and-deploy:
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
steps:
- name: Checkout repository
uses: actions/checkout@v3
- name: Setup Docker Buildx
uses: docker/setup-buildx-action@v2
- name: Log in to Container Registry
uses: docker/login-action@v2
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Extract metadata
id: meta
uses: docker/metadata-action@v4
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
tags: |
type=ref,event=branch
type=ref,event=pr
type=sha,prefix={{branch}}-
- name: Build and push Docker image
uses: docker/build-push-action@v4
with:
context: .
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha
cache-to: type=gha,mode=max
- name: Configure kubectl
uses: azure/k8s-set-context@v3
with:
method: kubeconfig
kubeconfig: ${{ secrets.KUBE_CONFIG }}
- name: Deploy to Kubernetes
run: |
# Update image tag in deployment
sed -i "s|IMAGE_TAG|${{ steps.meta.outputs.tags }}|g" k8s/deployment.yaml
# Apply manifests
kubectl apply -f k8s/
# Wait for rollout
kubectl rollout status deployment/webapp
# Verify deployment
kubectl get pods -l app=webapp
Security Hardening
Pod Security Context
apiVersion: apps/v1
kind: Deployment
metadata:
name: secure-webapp
spec:
replicas: 3
selector:
matchLabels:
app: secure-webapp
template:
metadata:
labels:
app: secure-webapp
spec:
serviceAccountName: webapp-sa
securityContext:
runAsNonRoot: true
runAsUser: 10001
runAsGroup: 10001
fsGroup: 10001
seccompProfile:
type: RuntimeDefault
supplementalGroups: [10001]
containers:
- name: webapp
image: myapp:latest
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 10001
capabilities:
drop:
- ALL
add:
- NET_BIND_SERVICE
ports:
- containerPort: 8080
volumeMounts:
- name: tmp
mountPath: /tmp
- name: cache
mountPath: /app/cache
- name: logs
mountPath: /app/logs
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "256Mi"
cpu: "200m"
volumes:
- name: tmp
emptyDir: {}
- name: cache
emptyDir: {}
- name: logs
emptyDir: {}
Image Security Scanning
# Falco rules for runtime security
apiVersion: v1
kind: ConfigMap
metadata:
name: falco-rules
data:
application_rules.yaml: |
- rule: Detect shell in container
desc: Notice shell activity within a container
condition: >
spawned_process and container and
shell_procs and proc.tty != 0 and
container_entrypoint
output: >
Shell spawned in container (user=%user.name %container.info
shell=%proc.name parent=%proc.pname cmdline=%proc.cmdline)
priority: WARNING
- rule: File below a known binary directory opened for writing
desc: >
The package management process modifies binaries in these directories.
This rule is meant to detect other processes modifying binary files.
condition: >
bin_dir and evt.is_open_write
and not package_mgmt_procs
and not exe_running_docker_save
and not python_running_get_pip
and not python_running_ms_oms
output: >
File below a known binary directory opened for writing (user=%user.name
command=%proc.cmdline file=%fd.name %container.info)
priority: WARNING
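The Falco rules above cover runtime detection. For scanning images themselves, as this subsection's title suggests, one lightweight pattern is a Job that runs a scanner such as Trivy against a candidate image and fails when serious vulnerabilities are found. A sketch, assuming the public aquasec/trivy image and nginx:1.21 as the scan target:
# Sketch: one-off image scan as a Kubernetes Job (scanner image and target are examples)
apiVersion: batch/v1
kind: Job
metadata:
  name: trivy-scan-nginx
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: trivy
          image: aquasec/trivy:latest
          args:
            - image
            - --severity
            - HIGH,CRITICAL
            - --exit-code
            - "1"                 # fail the Job when findings are present
            - nginx:1.21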
Observability Stack
# Complete observability stack with Prometheus Operator
apiVersion: v1
kind: Namespace
metadata:
name: monitoring
---
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
name: prometheus
namespace: monitoring
spec:
serviceAccountName: prometheus
serviceMonitorSelector:
matchLabels:
team: frontend
ruleSelector:
matchLabels:
team: frontend
prometheus: prometheus
resources:
requests:
memory: 400Mi
storage:
volumeClaimTemplate:
spec:
storageClassName: fast-ssd
resources:
requests:
storage: 50Gi
alerting:
alertmanagers:
- namespace: monitoring
name: alertmanager-main
port: web
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: webapp-monitor
namespace: monitoring
labels:
team: frontend
spec:
selector:
matchLabels:
app: webapp
endpoints:
- port: metrics
interval: 30s
path: /metrics
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: webapp-rules
namespace: monitoring
labels:
team: frontend
prometheus: prometheus
spec:
groups:
- name: webapp.rules
rules:
- alert: WebAppDown
expr: up{job="webapp"} == 0
for: 5m
labels:
severity: critical
annotations:
summary: "WebApp instance is down"
description: "WebApp instance {{ $labels.instance }} has been down for more than 5 minutes."
- alert: WebAppHighErrorRate
expr: rate(http_requests_total{job="webapp",status=~"5.."}[5m]) > 0.1
for: 2m
labels:
severity: warning
annotations:
summary: "High error rate detected"
description: "Error rate is {{ $value }} errors per second for {{ $labels.instance }}"
- alert: WebAppHighLatency
expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{job="webapp"}[5m])) > 0.5
for: 5m
labels:
severity: warning
annotations:
summary: "High latency detected"
description: "95th percentile latency is {{ $value }}s for {{ $labels.instance }}"
Documentation and Runbooks
# ConfigMap containing runbooks
apiVersion: v1
kind: ConfigMap
metadata:
name: operational-runbooks
data:
incident-response.md: |
# Incident Response Runbook
## Severity Levels
### P0 - Critical
- Complete service outage
- Data loss or corruption
- Security breach
**Response Time**: Immediate (< 15 minutes)
### P1 - High
- Significant feature degradation
- Performance issues affecting users
**Response Time**: 1 hour
### P2 - Medium
- Minor feature issues
- Non-critical bugs
**Response Time**: 24 hours
## Common Issues
### Pod CrashLoopBackOff
```bash
# Check pod logs
kubectl logs <pod-name> --previous
# Check pod events
kubectl describe pod <pod-name>
# Check resource usage
kubectl top pod <pod-name>
```
### Service Unavailable
```bash
# Check service endpoints
kubectl get endpoints <service-name>
# Check pod readiness
kubectl get pods -l app=<app-name>
# Check ingress
kubectl describe ingress <ingress-name>
```
### High Memory Usage
```bash
# Check pod resource usage
kubectl top pods --sort-by=memory
# Check node resource usage
kubectl top nodes
# Restart high memory pods
kubectl rollout restart deployment/<deployment-name>
```
troubleshooting.md: |
# Troubleshooting Guide
## Quick Diagnostic Commands
### Cluster Health
```bash
# Check cluster components
kubectl get componentstatuses
# Check node status
kubectl get nodes -o wide
# Check system pods
kubectl get pods -n kube-system
```
### Application Health
```bash
# Check all resources in namespace
kubectl get all -n <namespace>
# Check recent events
kubectl get events --sort-by=.metadata.creationTimestamp -n <namespace>
# Check resource usage
kubectl top pods -n <namespace> --sort-by=cpu
```
### Network Issues
```bash
# Test DNS resolution
kubectl run test-pod --image=busybox -it --rm -- nslookup kubernetes.default
# Test service connectivity
kubectl run test-pod --image=curlimages/curl -it --rm -- curl -v http://service-name:port/health
# Check network policies
kubectl get networkpolicies -A
```
### Storage Issues
```bash
# Check PV status
kubectl get pv
# Check PVC status
kubectl get pvc -A
# Check storage classes
kubectl get storageclass
```
Advanced Namespace Configuration
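Once the Namespace, ResourceQuota, and NetworkPolicy below are applied, it is worth confirming what the quota actually permits and how much of it is already consumed. A quick check, assuming the resource names used in the manifests:
```bash
# Hard limits vs. current usage for the quota
kubectl describe resourcequota production-quota -n production

# Quotas and limit ranges attached to the namespace
kubectl describe namespace production

# Confirm the isolation policy is in place
kubectl get networkpolicy production-isolation -n production -o yaml
```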
apiVersion: v1
kind: Namespace
metadata:
name: production
labels:
environment: prod
team: backend
cost-center: engineering
annotations:
description: "Production environment for backend services"
contact: "backend-team@company.com"
created-by: "platform-team"
spec:
finalizers:
- kubernetes
---
# Namespace with resource quotas and limits
apiVersion: v1
kind: ResourceQuota
metadata:
name: production-quota
namespace: production
spec:
hard:
requests.cpu: "100"
requests.memory: 200Gi
limits.cpu: "200"
limits.memory: 400Gi
persistentvolumeclaims: "50"
pods: "100"
services: "20"
secrets: "50"
configmaps: "50"
count/deployments.apps: "30"
count/statefulsets.apps: "10"
count/jobs.batch: "20"
---
# Network policy for namespace isolation
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: production-isolation
namespace: production
spec:
podSelector: {}
policyTypes:
- Ingress
- Egress
ingress:
- from:
- namespaceSelector:
matchLabels:
name: production
- namespaceSelector:
matchLabels:
name: monitoring
egress:
- to:
- namespaceSelector:
matchLabels:
name: production
- to: [] # Allow DNS
ports:
- protocol: UDP
port: 53
Chapter 14: Performance and Optimization
Resource Management Strategies
Comprehensive Resource Planning
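The Deployment below bakes scheduling, security, probes, and resource settings into one manifest. Requests and limits are only as good as the data behind them, so before copying these numbers it helps to compare what pods actually consume against what they request; a short sketch using metrics-server data (the app label and production namespace are assumptions):
```bash
# Actual CPU/memory consumption per pod (requires metrics-server)
kubectl top pods -l app=optimized-app -n production --containers

# Requested vs. allocatable resources on each node
kubectl describe nodes | grep -A 8 "Allocated resources"

# Requests and limits currently set on the Deployment
kubectl get deployment optimized-app -n production \
  -o jsonpath='{.spec.template.spec.containers[0].resources}'
```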
# Resource-optimized deployment with multiple strategies
apiVersion: apps/v1
kind: Deployment
metadata:
name: optimized-app
labels:
app: optimized-app
version: v1.0.0
spec:
replicas: 3
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
maxSurge: 1
selector:
matchLabels:
app: optimized-app
template:
metadata:
labels:
app: optimized-app
version: v1.0.0
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
prometheus.io/path: "/metrics"
spec:
# Advanced scheduling
priorityClassName: high-priority
terminationGracePeriodSeconds: 30
# Node selection and affinity
nodeSelector:
node-type: compute-optimized
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
preference:
matchExpressions:
- key: zone
operator: In
values: ["us-west-2a", "us-west-2b"]
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- optimized-app
topologyKey: kubernetes.io/hostname
# Security context
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 2000
seccompProfile:
type: RuntimeDefault
containers:
- name: app
image: myapp:v1.0.0
imagePullPolicy: IfNotPresent
# Resource management
resources:
requests:
memory: "256Mi"
cpu: "200m"
ephemeral-storage: "1Gi"
limits:
memory: "512Mi"
cpu: "500m"
ephemeral-storage: "2Gi"
# Security
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
capabilities:
drop: ["ALL"]
add: ["NET_BIND_SERVICE"]
# Health checks
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
successThreshold: 1
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 3
successThreshold: 1
startupProbe:
httpGet:
path: /startup
port: 8080
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 3
failureThreshold: 30
successThreshold: 1
# Environment and volumes
env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
volumeMounts:
- name: tmp
mountPath: /tmp
- name: cache
mountPath: /app/cache
- name: config
mountPath: /app/config
readOnly: true
ports:
- containerPort: 8080
name: http
protocol: TCP
- containerPort: 9090
name: metrics
protocol: TCP
volumes:
- name: tmp
emptyDir:
sizeLimit: 1Gi
- name: cache
emptyDir:
sizeLimit: 2Gi
- name: config
configMap:
name: app-config
defaultMode: 0644
# DNS configuration
dnsPolicy: ClusterFirst
dnsConfig:
options:
- name: ndots
value: "2"
- name: edns0
Advanced Autoscaling Configurations
Multi-Metric HPA with Custom Metrics
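When several metrics are configured, the HPA computes a desired replica count per metric as desiredReplicas = ceil(currentReplicas × currentValue / targetValue) and then scales to the largest result, bounded by the behavior policies. Once the manifest below is applied, the controller's view of each metric can be inspected from the CLI (a sketch, assuming the HPA name used here):
```bash
# Current vs. target values for every configured metric
kubectl get hpa advanced-hpa -o wide

# Per-metric status and recent scaling events
kubectl describe hpa advanced-hpa

# Example: 3 replicas at 98% CPU against a 70% target scale to ceil(3 * 98/70) = ceil(4.2) = 5
```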
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: advanced-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: optimized-app
minReplicas: 2
maxReplicas: 50
metrics:
# CPU utilization
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
# Memory utilization
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
# Custom metric: requests per second
- type: Pods
pods:
metric:
name: http_requests_per_second
target:
type: AverageValue
averageValue: "100"
# External metric: SQS queue length
- type: External
external:
metric:
name: sqs_queue_length
selector:
matchLabels:
queue: "processing-queue"
target:
type: Value
value: "10"
# Scaling behavior
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 10
periodSeconds: 60
- type: Pods
value: 2
periodSeconds: 60
selectPolicy: Min
scaleUp:
stabilizationWindowSeconds: 60
policies:
- type: Percent
value: 100
periodSeconds: 30
- type: Pods
value: 5
periodSeconds: 30
selectPolicy: Max
---
# Vertical Pod Autoscaler with advanced configuration
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: advanced-vpa
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: optimized-app
updatePolicy:
updateMode: "Auto"
minReplicas: 2
resourcePolicy:
containerPolicies:
- containerName: app
minAllowed:
cpu: 100m
memory: 128Mi
maxAllowed:
cpu: 2
memory: 4Gi
controlledResources: ["cpu", "memory"]
controlledValues: RequestsAndLimits
Performance Monitoring and Alerting
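Alert expressions are easy to get subtly wrong, so it pays to validate them before the operator loads the PrometheusRule below. A minimal sketch, assuming promtool is installed locally, the rule group is saved as a plain Prometheus rules file named performance-rules.yaml, and the resources are applied to the monitoring namespace:
```bash
# Syntax-check the rule group before committing it
promtool check rules performance-rules.yaml

# Confirm Prometheus picked up the ServiceMonitor and rule after applying
kubectl get servicemonitor app-performance-monitor -n monitoring
kubectl get prometheusrule performance-alerts -n monitoring
```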
# Comprehensive monitoring stack
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: app-performance-monitor
labels:
app: optimized-app
spec:
selector:
matchLabels:
app: optimized-app
endpoints:
- port: metrics
interval: 15s
path: /metrics
scrapeTimeout: 10s
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: performance-alerts
spec:
groups:
- name: performance.rules
rules:
# High CPU usage
- alert: HighCPUUsage
expr: rate(container_cpu_usage_seconds_total{pod=~"optimized-app-.*"}[5m]) * 100 > 80
for: 5m
labels:
severity: warning
annotations:
summary: "High CPU usage detected"
description: "Pod {{ $labels.pod }} CPU usage is {{ $value }}%"
# High memory usage
- alert: HighMemoryUsage
expr: container_memory_usage_bytes{pod=~"optimized-app-.*"} / container_spec_memory_limit_bytes * 100 > 85
for: 5m
labels:
severity: warning
annotations:
summary: "High memory usage detected"
description: "Pod {{ $labels.pod }} memory usage is {{ $value }}%"
# High response time
- alert: HighResponseTime
expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{job="optimized-app"}[5m])) > 1
for: 2m
labels:
severity: warning
annotations:
summary: "High response time detected"
description: "95th percentile response time is {{ $value }}s"
# Low throughput
- alert: LowThroughput
expr: rate(http_requests_total{job="optimized-app"}[5m]) < 10
for: 5m
labels:
severity: warning
annotations:
summary: "Low throughput detected"
description: "Request rate is {{ $value }} requests/second"YAMLChapter 15: Multi-Cloud and Hybrid
Multi-Cloud Architecture
graph TB
subgraph "Management Layer"
MC[Multi-Cloud Controller]
ARGO[ArgoCD]
TERRAFORM[Terraform]
end
subgraph "AWS"
EKS[EKS Cluster]
RDS[RDS Database]
S3[S3 Storage]
end
subgraph "GCP"
GKE[GKE Cluster]
CLOUD_SQL[Cloud SQL]
GCS[Cloud Storage]
end
subgraph "Azure"
AKS[AKS Cluster]
COSMOS[Cosmos DB]
BLOB[Blob Storage]
end
subgraph "On-Premises"
K8S[Kubernetes]
DB[Database]
NFS[NFS Storage]
end
MC --> EKS
MC --> GKE
MC --> AKS
MC --> K8S
ARGO --> EKS
ARGO --> GKE
ARGO --> AKS
ARGO --> K8S
style MC fill:#f9f,stroke:#333,stroke-width:2px
Cluster API Multi-Cloud Setup
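The Cluster objects below assume a management cluster that already has Cluster API and the matching infrastructure providers installed. A rough bootstrap sketch with clusterctl follows; provider names are real, but credential handling varies by cloud, so treat the exact flow as illustrative:
```bash
# Install core Cluster API components plus the AWS provider on the management cluster
export AWS_B64ENCODED_CREDENTIALS=$(clusterawsadm bootstrap credentials encode-as-profile)
clusterctl init --infrastructure aws

# The GCP provider is installed the same way (its credential setup differs)
clusterctl init --infrastructure gcp

# Watch the clusters defined below come up
kubectl get clusters -n clusters
clusterctl describe cluster aws-production -n clusters
```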
# AWS Cluster
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
name: aws-production
namespace: clusters
labels:
cloud: aws
environment: production
spec:
clusterNetwork:
pods:
cidrBlocks: ["192.168.0.0/16"]
services:
cidrBlocks: ["10.96.0.0/12"]
infrastructureRef:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSCluster
name: aws-production
controlPlaneRef:
kind: KubeadmControlPlane
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
name: aws-production-control-plane
---
# GCP Cluster
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
name: gcp-production
namespace: clusters
labels:
cloud: gcp
environment: production
spec:
clusterNetwork:
pods:
cidrBlocks: ["192.168.0.0/16"]
services:
cidrBlocks: ["10.96.0.0/12"]
infrastructureRef:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: GCPCluster
name: gcp-production
controlPlaneRef:
kind: KubeadmControlPlane
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
name: gcp-production-control-plane
---
# Multi-cloud application deployment
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
name: multi-cloud-app
spec:
generators:
- clusters:
selector:
matchLabels:
environment: production
template:
metadata:
name: '{{name}}-app'
spec:
project: default
source:
repoURL: https://github.com/company/k8s-manifests
targetRevision: HEAD
path: 'environments/{{metadata.labels.cloud}}'
destination:
server: '{{server}}'
namespace: applications
syncPolicy:
automated:
prune: true
selfHeal: true
Cross-Cluster Service Mesh
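The Gateway and DestinationRule below only cover traffic policy; in a multi-primary Istio setup the clusters also need a shared root of trust and a remote-secret exchange so each control plane can discover endpoints in the other cluster. A rough sketch of the endpoint-discovery step (the cluster1/cluster2 kube context names are placeholders, and on older Istio releases the command lives under istioctl x):
```bash
# Allow each control plane to discover endpoints in the other cluster
istioctl create-remote-secret --context=cluster2 --name=cluster2 | \
  kubectl apply -f - --context=cluster1

istioctl create-remote-secret --context=cluster1 --name=cluster1 | \
  kubectl apply -f - --context=cluster2
```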
# Istio multi-cluster setup
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
name: cross-cluster-gateway
spec:
selector:
istio: eastwestgateway
servers:
- port:
number: 15443
name: tls
protocol: TLS
tls:
mode: ISTIO_MUTUAL
hosts:
- cross-network-primary.local
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
name: cross-cluster-service
spec:
host: remote-service.remote-cluster.local
trafficPolicy:
tls:
mode: ISTIO_MUTUAL
portLevelSettings:
- port:
number: 80
loadBalancer:
simple: LEAST_CONN
Chapter 16: DevOps Integration
Advanced CI/CD Pipeline
# GitLab CI/CD with Kubernetes integration
stages:
- test
- build
- security-scan
- deploy-staging
- integration-test
- deploy-production
- post-deploy
variables:
DOCKER_DRIVER: overlay2
DOCKER_TLS_CERTDIR: "/certs"
KUBERNETES_NAMESPACE: "production"
HELM_CHART_PATH: "./helm/myapp"
# Test stage
test:
stage: test
image: node:16-alpine
script:
- npm ci
- npm run test:unit
- npm run test:integration
coverage: '/Coverage: \d+\.\d+%/'
artifacts:
reports:
coverage_report:
coverage_format: cobertura
path: coverage/cobertura-coverage.xml
paths:
- coverage/
expire_in: 1 week
# Build and push image
build:
stage: build
image: docker:latest
services:
- docker:dind
before_script:
- echo $CI_REGISTRY_PASSWORD | docker login -u $CI_REGISTRY_USER --password-stdin $CI_REGISTRY
script:
- docker build --build-arg BUILD_DATE=$(date -u +'%Y-%m-%dT%H:%M:%SZ')
--build-arg VCS_REF=$CI_COMMIT_SHA
--build-arg VERSION=$CI_COMMIT_TAG
-t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
-t $CI_REGISTRY_IMAGE:latest .
- docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
- docker push $CI_REGISTRY_IMAGE:latest
# Security scanning
security-scan:
stage: security-scan
image: aquasecurity/trivy:latest
script:
- trivy image --exit-code 0 --no-progress --format template --template "@contrib/sarif.tpl" -o trivy-results.sarif $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
- trivy image --exit-code 1 --severity HIGH,CRITICAL --no-progress $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
artifacts:
reports:
sast: trivy-results.sarif
# Deploy to staging
deploy-staging:
stage: deploy-staging
image: alpine/helm:latest
before_script:
- kubectl config use-context staging
script:
- helm upgrade --install myapp-staging $HELM_CHART_PATH
--namespace staging
--set image.repository=$CI_REGISTRY_IMAGE
--set image.tag=$CI_COMMIT_SHA
--set environment=staging
--wait --timeout=300s
environment:
name: staging
url: https://staging.myapp.com
only:
- develop
# Integration tests
integration-test:
stage: integration-test
image: postman/newman:alpine
script:
- newman run tests/integration/api-tests.json
--environment tests/integration/staging-env.json
--reporters cli,junit --reporter-junit-export integration-results.xml
artifacts:
reports:
junit: integration-results.xml
dependencies:
- deploy-staging
# Production deployment
deploy-production:
stage: deploy-production
image: alpine/helm:latest
before_script:
- kubectl config use-context production
script:
- helm upgrade --install myapp $HELM_CHART_PATH
--namespace production
--set image.repository=$CI_REGISTRY_IMAGE
--set image.tag=$CI_COMMIT_SHA
--set environment=production
--set replicaCount=5
--wait --timeout=600s
environment:
name: production
url: https://myapp.com
when: manual
only:
- main
# Post-deployment verification
post-deploy:
stage: post-deploy
image: curlimages/curl:latest
script:
- sleep 30 # Wait for deployment to stabilize
- curl -f https://myapp.com/health || exit 1
- curl -f https://myapp.com/metrics || exit 1
dependencies:
- deploy-production
Advanced Helm Chart Structure
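Because Chart.yaml below declares postgresql and redis as conditional dependencies, the subcharts have to be fetched before the chart can be rendered or packaged. A short sketch, using the ./helm/myapp path assumed by the pipeline above:
```bash
# Pull the declared subcharts into charts/ and refresh Chart.lock
helm dependency update ./helm/myapp

# Render locally to confirm values.yaml produces the expected manifests
helm template myapp ./helm/myapp --set image.tag=test | less

# Package for distribution once the chart lints cleanly
helm lint ./helm/myapp && helm package ./helm/myapp
```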
# Chart.yaml
apiVersion: v2
name: myapp
description: A production-ready application Helm chart
type: application
version: 1.0.0
appVersion: "1.0.0"
keywords:
- web
- api
- microservice
home: https://github.com/company/myapp
sources:
- https://github.com/company/myapp
maintainers:
- name: Platform Team
email: platform@company.com
dependencies:
- name: postgresql
version: 11.9.13
repository: https://charts.bitnami.com/bitnami
condition: postgresql.enabled
- name: redis
version: 17.3.7
repository: https://charts.bitnami.com/bitnami
condition: redis.enabled
---
# values.yaml with comprehensive configuration
replicaCount: 3
image:
repository: mycompany/myapp
pullPolicy: IfNotPresent
tag: "latest"
imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""
serviceAccount:
create: true
annotations: {}
name: ""
podAnnotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
prometheus.io/path: "/metrics"
podSecurityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 2000
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
capabilities:
drop:
- ALL
service:
type: ClusterIP
port: 80
targetPort: 8080
ingress:
enabled: true
className: "nginx"
annotations:
cert-manager.io/cluster-issuer: "letsencrypt-prod"
nginx.ingress.kubernetes.io/ssl-redirect: "true"
nginx.ingress.kubernetes.io/rate-limit: "100"
hosts:
- host: myapp.example.com
paths:
- path: /
pathType: Prefix
tls:
- secretName: myapp-tls
hosts:
- myapp.example.com
resources:
limits:
cpu: 500m
memory: 512Mi
requests:
cpu: 200m
memory: 256Mi
autoscaling:
enabled: true
minReplicas: 2
maxReplicas: 10
targetCPUUtilizationPercentage: 70
targetMemoryUtilizationPercentage: 80
nodeSelector: {}
tolerations: []
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app.kubernetes.io/name
operator: In
values:
- myapp
topologyKey: kubernetes.io/hostname
# Database configuration
postgresql:
enabled: true
auth:
existingSecret: "myapp-db-secret"
primary:
persistence:
enabled: true
size: 20Gi
storageClass: "fast-ssd"
redis:
enabled: true
auth:
enabled: true
existingSecret: "myapp-redis-secret"
master:
persistence:
enabled: true
size: 8Gi
# Application-specific configuration
config:
environment: production
logLevel: info
features:
newUI: true
advancedSearch: true
analytics: true
# Monitoring
monitoring:
enabled: true
serviceMonitor:
enabled: true
interval: 30s
# Backup configuration
backup:
enabled: true
schedule: "0 2 * * *"
retention: "30d"YAMLProgressive Delivery with Flagger
# Canary deployment configuration
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
name: myapp
namespace: production
spec:
# Deployment reference
targetRef:
apiVersion: apps/v1
kind: Deployment
name: myapp
# HPA reference (optional)
autoscalerRef:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
name: myapp
# Service configuration
service:
port: 80
targetPort: 8080
gateways:
- myapp-gateway
hosts:
- myapp.example.com
trafficPolicy:
tls:
mode: DISABLE
# Canary analysis
analysis:
# Schedule interval
interval: 1m
# Max number of failed metric checks before rollback
threshold: 5
# Max traffic percentage routed to canary
maxWeight: 50
# Canary increment step
stepWeight: 5
# Prometheus checks
metrics:
- name: request-success-rate
thresholdRange:
min: 99
interval: 1m
- name: request-duration
thresholdRange:
max: 500
interval: 30s
# Load testing
webhooks:
- name: load-test
url: http://flagger-loadtester.test/
timeout: 5s
metadata:
cmd: "hey -z 1m -q 10 -c 2 http://myapp-canary.production:80/"
# Alert manager configuration
alerting:
providers:
- name: "on-call"
type: slack
channel: alerts
username: flagger
Complete Monitoring Stack
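kubectl ships with kustomize built in, so the Kustomization below can be rendered, diffed, and applied directly; a sketch assuming the files live in a local ./monitoring directory:
```bash
# Render the full stack to stdout without applying anything
kubectl kustomize ./monitoring

# Diff against the cluster, then apply
kubectl diff -k ./monitoring
kubectl apply -k ./monitoring

# Confirm the generated ConfigMaps/Secrets picked up their content hashes
kubectl get configmaps,secrets -n monitoring | grep -E 'grafana-dashboards|alertmanager-config'
```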
# Comprehensive monitoring with Kustomization
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: monitoring
resources:
- prometheus-operator.yaml
- prometheus.yaml
- alertmanager.yaml
- grafana.yaml
- servicemonitors.yaml
- rules.yaml
configMapGenerator:
- name: grafana-dashboards
files:
- dashboards/kubernetes-cluster.json
- dashboards/kubernetes-pods.json
- dashboards/application-metrics.json
secretGenerator:
- name: alertmanager-config
files:
- alertmanager.yml
patches:
- path: prometheus-patch.yaml
- path: grafana-patch.yaml
images:
- name: prom/prometheus
newTag: v2.40.0
- name: grafana/grafana
newTag: 9.2.0
- name: prom/alertmanager
newTag: v0.25.0
replicas:
- name: prometheus
count: 2
- name: grafana
count: 2
---
# Advanced Prometheus configuration
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
name: prometheus
spec:
replicas: 2
retention: 30d
retentionSize: 50GB
# Storage configuration
storage:
volumeClaimTemplate:
spec:
storageClassName: fast-ssd
resources:
requests:
storage: 100Gi
# Resource management
resources:
requests:
memory: 2Gi
cpu: 1000m
limits:
memory: 4Gi
cpu: 2000m
# Security
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 2000
# Service discovery
serviceMonitorSelector:
matchLabels:
monitoring: enabled
podMonitorSelector:
matchLabels:
monitoring: enabled
ruleSelector:
matchLabels:
monitoring: enabled
# Additional scrape configs
additionalScrapeConfigs:
name: additional-scrape-configs
key: prometheus-additional.yaml
# Alerting
alerting:
alertmanagers:
- namespace: monitoring
name: alertmanager-operated
port: web
# External labels
externalLabels:
cluster: production
region: us-west-2
# Remote write configuration for long-term storage
remoteWrite:
- url: "https://prometheus-us-central1.grafana.net/api/prom/push"
writeRelabelConfigs:
- sourceLabels: [__name__]
regex: 'kubernetes_.*'
action: drop
basicAuth:
username:
name: grafana-cloud-credentials
key: username
password:
name: grafana-cloud-credentials
key: password
Disaster Recovery Automation
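The recovery script and weekly DR test below assume backups are already being produced by a schedule named daily-backup. A minimal sketch for creating that schedule with the Velero CLI (the namespaces and retention period are illustrative):
```bash
# Nightly backup of the critical namespaces, retained for 30 days
velero schedule create daily-backup \
  --schedule="0 1 * * *" \
  --include-namespaces production,staging \
  --ttl 720h

# List backups produced by the schedule and confirm completion
velero backup get
velero backup describe daily-backup-<timestamp> --details
```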
# Automated disaster recovery with Velero
apiVersion: v1
kind: ConfigMap
metadata:
name: velero-disaster-recovery
data:
recovery-script.sh: |
#!/bin/bash
set -e
echo "Starting disaster recovery process..."
# Validate backup exists
BACKUP_NAME=${1:-"latest"}
if ! velero backup get $BACKUP_NAME; then
echo "Backup $BACKUP_NAME not found!"
exit 1
fi
# Create restore
RESTORE_NAME="dr-restore-$(date +%Y%m%d-%H%M%S)"
velero restore create $RESTORE_NAME \
--from-backup $BACKUP_NAME \
--wait
# Verify restore
echo "Verifying restore..."
kubectl get pods --all-namespaces
# Run health checks
echo "Running health checks..."
for ns in production staging; do
kubectl wait --for=condition=ready pod \
--all -n $ns --timeout=300s
done
echo "Disaster recovery completed successfully!"
---
# CronJob for regular DR testing
apiVersion: batch/v1
kind: CronJob
metadata:
name: dr-test
spec:
schedule: "0 2 * * 0" # Weekly on Sunday at 2 AM
jobTemplate:
spec:
template:
spec:
containers:
- name: dr-test
image: velero/velero:latest
command: ["/bin/bash"]
args:
- -c
- |
# Test backup integrity
velero backup describe daily-backup-$(date -d "yesterday" +%Y%m%d) \
--details || exit 1
# Test restore to test namespace
velero restore create test-restore-$(date +%Y%m%d) \
--from-backup daily-backup-$(date -d "yesterday" +%Y%m%d) \
--namespace-mappings production:dr-test \
--wait
# Verify test restore
kubectl wait --for=condition=ready pod \
--all -n dr-test --timeout=300s
# Cleanup test namespace
kubectl delete namespace dr-test --ignore-not-found
echo "DR test completed successfully"
restartPolicy: OnFailure
Enhanced Quick Reference
kubectl Power Commands
# Advanced resource queries
kubectl get pods -o custom-columns="NAME:.metadata.name,STATUS:.status.phase,NODE:.spec.nodeName,IP:.status.podIP"
kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="ExternalIP")].address}'
# Resource usage monitoring
kubectl top pods --all-namespaces --sort-by=memory
kubectl top nodes --sort-by=cpu
# Debugging and troubleshooting
kubectl debug pod/my-pod -it --image=nicolaka/netshoot
kubectl logs -f deployment/my-app --all-containers=true
kubectl describe pod my-pod | grep -A 10 Events
# Bulk operations
kubectl delete pods --field-selector=status.phase==Failed
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\n"}{end}' | grep my-app
# Security and RBAC
kubectl auth can-i create pods --as=system:serviceaccount:default:my-sa
kubectl get rolebindings,clusterrolebindings --all-namespaces -o wide
# Resource management
kubectl patch deployment my-app -p '{"spec":{"template":{"spec":{"containers":[{"name":"app","resources":{"requests":{"memory":"256Mi"}}}]}}}}'
kubectl scale deployment my-app --replicas=5 --timeout=300s
Helm Advanced Commands
# Chart development and testing
helm create my-chart
helm template my-app ./my-chart --debug
helm lint ./my-chart
helm test my-app
# Release management
helm upgrade my-app ./my-chart --reuse-values --wait --timeout=300s
helm rollback my-app 1 --wait
helm history my-app
# Repository management
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
helm search repo bitnami/postgresql --versions
# Values and configuration
helm get values my-app
helm show values bitnami/postgresql
helm upgrade my-app ./my-chart --set image.tag=v2.0.0 --set replicaCount=3
Conclusion and Future Trends
This comprehensive Kubernetes guide represents current best practices and production-ready patterns. As the ecosystem evolves, keep an eye on these emerging trends:
Emerging Technologies
- WebAssembly (WASM): Running WASM workloads in Kubernetes
- Edge Computing: K3s, MicroK8s for edge deployments
- Serverless: Knative, OpenFaaS for serverless workloads
- AI/ML Operations: Kubeflow, MLflow integration
- eBPF: Advanced networking and security with Cilium
- GitOps Evolution: FluxCD v2, ArgoCD ApplicationSets
Best Practices Summary
- Security-First Approach: Always implement security from the ground up
- Observability: Comprehensive monitoring, logging, and tracing
- Automation: GitOps, CI/CD, and Infrastructure as Code
- Resource Optimization: Right-sizing, autoscaling, and cost management
- Disaster Recovery: Regular backups, testing, and documented procedures
- Documentation: Maintain runbooks, troubleshooting guides, and architectural decisions
Continuous Learning Resources
- Certification Paths: CKA, CKAD, CKS
- Community: CNCF, Kubernetes Slack, local meetups
- Training Platforms: A Cloud Guru, Pluralsight, Linux Academy
- Hands-on Practice: Killercoda (Katacoda's successor), Play with Kubernetes
- Conference Content: KubeCon, DockerCon, CloudNativeCon
Remember: Kubernetes mastery is a journey, not a destination. Stay curious, keep experimenting, and always prioritize reliability and security in your deployments.
This production-grade guide serves as your comprehensive reference for Kubernetes deployment and operations. Continue evolving your practices with the rapidly advancing cloud-native ecosystem.