1. Introduction
This guide provides a comprehensive approach to deploying and managing production-grade Kubernetes clusters using Ansible automation. We’ll cover everything from basic setup to advanced configurations including high availability, monitoring, logging, GitOps, and security hardening.
2. System Architecture
graph TD
subgraph "Control Plane"
A[API Server] --> B[etcd]
A --> C[Controller Manager]
A --> D[Scheduler]
end
subgraph "Worker Nodes"
E[kubelet] --> F[Container Runtime]
G[kube-proxy] --> H[Pod Network]
end
subgraph "Infrastructure Services"
I[Ingress Controller]
J[Monitoring Stack]
K[Logging Stack]
L[GitOps Controller]
M[Backup System]
N[Secrets Management]
end
subgraph "Ansible Control"
O[Ansible Controller] --> P[Inventory]
O --> Q[Playbooks]
O --> R[Roles]
O --> S[Variables]
end
O -->|"Control Plane"| A
O -->|"Worker Nodes"| E
O -->|"Infrastructure Services"| I3. Prerequisites
Hardware Requirements
- Control Plane Nodes: 2+ CPUs, 4GB+ RAM, 50GB+ storage
- Worker Nodes: 4+ CPUs, 8GB+ RAM, 100GB+ storage
- Ansible Controller: 2+ CPUs, 4GB+ RAM
OS Requirements
- Ubuntu 22.04 LTS or CentOS/RHEL 8+ (all nodes)
- Python 3.8+ (Ansible controller)
Network Requirements
- All nodes must have unique hostnames, MAC addresses, and product_uuids
- Disable swap on all Kubernetes nodes (see the pre-flight sketch after this list)
- Ensure connectivity between all nodes (TCP ports 6443, 2379-2380, 10250-10252)
- Unique subnet for pod networking (e.g., 10.244.0.0/16)
- Load balancer for API server (for HA setup)
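These requirements can be checked up front. A minimal pre-flight sketch (a hypothetical playbook, not part of the roles below) that asserts swap is off and that product_uuids are unique across the inventory:
---
- name: Pre-flight checks
  hosts: kubernetes
  become: yes
  tasks:
    - name: Read product_uuid
      ansible.builtin.command: cat /sys/class/dmi/id/product_uuid
      register: product_uuid
      changed_when: false

    - name: Assert product_uuid is unique across the inventory
      ansible.builtin.assert:
        that: >-
          groups['kubernetes'] | map('extract', hostvars, ['product_uuid', 'stdout'])
          | select('equalto', product_uuid.stdout) | list | length == 1
        fail_msg: "Duplicate product_uuid detected for {{ inventory_hostname }}"

    - name: Assert swap is disabled
      ansible.builtin.assert:
        that: ansible_swaptotal_mb | default(0) | int == 0
        fail_msg: "Swap must be disabled on {{ inventory_hostname }}"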
Package Requirements
# On Ansible controller
sudo apt update
sudo apt install -y python3-pip
pip3 install ansible ansible-core netaddr jmespath
4. Ansible Project Structure
graph TD
A[ansible-kubernetes] --> B[inventories]
A --> C[roles]
A --> D[playbooks]
A --> E[group_vars]
A --> F[host_vars]
A --> G[files]
B --> B1[production]
B --> B2[staging]
C --> C1[common]
C --> C2[kubernetes]
C --> C3[containerd]
C --> C4[networking]
C --> C5[monitoring]
C --> C6[logging]
C --> C7[ingress]
C --> C8[gitops]
C --> C9[backup]
C --> C10[secrets]
D --> D1[site.yml]
D --> D2[kubernetes.yml]
D --> D3[addons.yml]
Create Base Structure
mkdir -p ansible-kubernetes/{inventories/{production,staging},roles,playbooks,group_vars,host_vars,files}
cd ansible-kubernetes
5. Setting Up Inventory and Variables
Inventory Structure
---
all:
children:
kubernetes:
children:
control_plane:
hosts:
master01:
ansible_host: 192.168.1.101
master02:
ansible_host: 192.168.1.102
master03:
ansible_host: 192.168.1.103
workers:
hosts:
worker01:
ansible_host: 192.168.1.111
worker02:
ansible_host: 192.168.1.112
worker03:
ansible_host: 192.168.1.113
lb:
hosts:
lb01:
ansible_host: 192.168.1.100
virtual_ip: 192.168.1.200
etcd:
children:
control_plane:
Group Variables
---
# Kubernetes configuration
kubernetes_version: "1.26.0"
pod_network_cidr: "10.244.0.0/16"
service_network_cidr: "10.96.0.0/12"
kubernetes_api_server_port: 6443
kubernetes_dns_domain: "cluster.local"
# Container runtime configuration
container_runtime: "containerd"
containerd_version: "1.6.8"
# CNI configuration
cni_plugin: "calico"
calico_version: "v3.24.5"
# Control plane configuration
control_plane_endpoint: "{{ hostvars.lb01.virtual_ip }}:{{ kubernetes_api_server_port }}"
control_plane_endpoint_noport: "{{ hostvars.lb01.virtual_ip }}"
# Add-ons configuration
enable_dashboard: true
enable_metrics_server: true
enable_ingress: true
ingress_controller: "nginx"
nginx_ingress_version: "v1.6.4"
# Monitoring configuration
enable_monitoring: true
prometheus_operator_version: "v0.63.0"
grafana_admin_password: "change-me-in-production"
# Logging configuration
enable_logging: true
logging_stack: "efk" # Options: efk, loki
# GitOps configuration
enable_gitops: true
gitops_tool: "argocd" # Options: argocd, fluxcd
argocd_version: "v2.6.3"
# Backup configuration
enable_backup: true
backup_tool: "velero"
velero_version: "v1.10.1"
backup_bucket: "k8s-backups"
backup_provider: "aws" # Options: aws, gcp, azure, minio
# Secrets management
secrets_management: "sealed-secrets" # Options: sealed-secrets, vault
sealed_secrets_version: "v0.20.5"
# Security configuration
apiserver_cert_extra_sans:
- "kubernetes"
- "kubernetes.default"
- "{{ control_plane_endpoint_noport }}"
# HA configuration
ha_enabled: true
keepalived_virtual_ip: "{{ hostvars.lb01.virtual_ip }}"
keepalived_interface: "eth0"
haproxy_connect_timeout: "10s"
haproxy_client_timeout: "30s"
haproxy_server_timeout: "30s"
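Several values referenced later (grafana_admin_password, jenkins_admin_password, alertmanager_slack_webhook, and the aws_access_key_id / aws_secret_access_key consumed by Velero) should not live in plain text. A minimal sketch, assuming you keep them in an Ansible Vault-encrypted file such as group_vars/all/vault.yml (encrypted with ansible-vault encrypt and unlocked at run time with --ask-vault-pass); the file path and variable names are illustrative:
---
# group_vars/all/vault.yml (encrypted with ansible-vault; values are placeholders)
vault_grafana_admin_password: "example-grafana-password"
vault_jenkins_admin_password: "example-jenkins-password"
vault_aws_access_key_id: "EXAMPLE-KEY-ID"
vault_aws_secret_access_key: "example-secret-key"
vault_alertmanager_slack_webhook: "https://hooks.slack.com/services/EXAMPLE"
The plain group variables then reference the vaulted values, for example grafana_admin_password: "{{ vault_grafana_admin_password }}", so that nothing sensitive is committed in clear text.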
6. Core Kubernetes Installation
Common Role Tasks
---
- name: Update apt cache
apt:
update_cache: yes
cache_valid_time: 3600
when: ansible_os_family == "Debian"
- name: Install required packages
package:
name:
- apt-transport-https
- ca-certificates
- curl
- gnupg
- lsb-release
- python3-pip
- python3-setuptools
- ntp
- iptables
- software-properties-common
state: present
- name: Configure hostnames
hostname:
name: "{{ inventory_hostname }}"
- name: Update /etc/hosts
lineinfile:
path: /etc/hosts
line: "{{ hostvars[item].ansible_host }} {{ item }}"
state: present
loop: "{{ groups['kubernetes'] }}"
- name: Disable swap
shell: |
swapoff -a
sed -i '/swap/d' /etc/fstab
args:
executable: /bin/bash
- name: Configure kernel modules for Kubernetes
copy:
dest: "/etc/modules-load.d/k8s.conf"
content: |
overlay
br_netfilter
- name: Load kernel modules
command: "modprobe {{ item }}"
loop:
- overlay
- br_netfilter
changed_when: false
- name: Configure sysctl parameters for Kubernetes
copy:
dest: "/etc/sysctl.d/k8s.conf"
content: |
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
- name: Apply sysctl parameters
command: sysctl --system
changed_when: false
Container Runtime Installation (Containerd)
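The tasks below notify a restart containerd handler that is not shown in this excerpt. A minimal sketch of what the role's handlers/main.yml could contain (the handler name is assumed to match the notify):
---
# roles/containerd/handlers/main.yml (sketch)
- name: restart containerd
  ansible.builtin.systemd:
    name: containerd
    state: restarted
    daemon_reload: yes
The restart haproxy and restart keepalived handlers referenced later in the HA role follow the same pattern.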
---
- name: Install containerd dependencies
package:
name:
- containerd.io
state: present
register: pkg_result
until: pkg_result is success
retries: 3
delay: 5
- name: Create containerd configuration directory
file:
path: /etc/containerd
state: directory
mode: '0755'
- name: Generate default containerd configuration
shell: containerd config default > /etc/containerd/config.toml
args:
creates: /etc/containerd/config.toml
- name: Configure containerd to use systemd cgroup driver
replace:
path: /etc/containerd/config.toml
regexp: 'SystemdCgroup = false'
replace: 'SystemdCgroup = true'
notify: restart containerd
- name: Enable and start containerd service
systemd:
name: containerd
state: started
enabled: yes
daemon_reload: yes
Kubernetes Components Installation
---
- name: Add Kubernetes apt key
apt_key:
url: https://packages.cloud.google.com/apt/doc/apt-key.gpg
state: present
when: ansible_os_family == "Debian"
- name: Add Kubernetes repository
apt_repository:
repo: deb https://apt.kubernetes.io/ kubernetes-xenial main
state: present
filename: kubernetes
when: ansible_os_family == "Debian"
- name: Install Kubernetes components
package:
name:
- kubelet={{ kubernetes_version }}-00
- kubeadm={{ kubernetes_version }}-00
- kubectl={{ kubernetes_version }}-00
state: present
register: pkg_result
until: pkg_result is success
retries: 3
delay: 5
- name: Hold Kubernetes components
dpkg_selections:
name: "{{ item }}"
selection: hold
loop:
- kubelet
- kubeadm
- kubectl
when: ansible_os_family == "Debian"
- name: Enable and start kubelet service
systemd:
name: kubelet
state: started
enabled: yes
daemon_reload: yes
Kubernetes Cluster Initialization
---
- name: Initialize Kubernetes cluster with kubeadm
command: >
kubeadm init
--control-plane-endpoint "{{ control_plane_endpoint }}"
--upload-certs
--pod-network-cidr={{ pod_network_cidr }}
--service-cidr={{ service_network_cidr }}
--kubernetes-version {{ kubernetes_version }}
args:
creates: /etc/kubernetes/admin.conf
register: kubeadm_init
when: inventory_hostname == groups['control_plane'][0]
- name: Extract join command for control plane
shell: kubeadm token create --print-join-command --certificate-key $(kubeadm init phase upload-certs --upload-certs | tail -1)
register: control_plane_join_command
when: inventory_hostname == groups['control_plane'][0] and groups['control_plane'] | length > 1
- name: Extract join command for workers
command: kubeadm token create --print-join-command
register: worker_join_command
when: inventory_hostname == groups['control_plane'][0]
- name: Configure kubectl for root user
shell: |
mkdir -p /root/.kube
cp -i /etc/kubernetes/admin.conf /root/.kube/config
chown root:root /root/.kube/config
args:
creates: /root/.kube/config
when: inventory_hostname in groups['control_plane']
- name: Join other control plane nodes
command: "{{ hostvars[groups['control_plane'][0]].control_plane_join_command.stdout }} --control-plane"
args:
creates: /etc/kubernetes/kubelet.conf
when: inventory_hostname in groups['control_plane'] and inventory_hostname != groups['control_plane'][0]
- name: Join worker nodes to the cluster
command: "{{ hostvars[groups['control_plane'][0]].worker_join_command.stdout }}"
args:
creates: /etc/kubernetes/kubelet.conf
when: inventory_hostname in groups['workers']
7. Networking Setup
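The kubernetes.core.k8s and kubernetes.core.helm modules used from this section onward need the Python kubernetes client, a valid kubeconfig, and (for the Helm modules) a helm binary on the host they run against. A hedged sketch of supporting tasks for the first control-plane node (the Helm version and install path are assumptions):
- name: Install Python client used by kubernetes.core modules
  ansible.builtin.pip:
    name:
      - kubernetes
      - PyYAML
    state: present

- name: Download Helm (version shown is an example)
  ansible.builtin.unarchive:
    src: "https://get.helm.sh/helm-v3.11.3-linux-amd64.tar.gz"
    dest: /tmp
    remote_src: yes

- name: Install the helm binary
  ansible.builtin.copy:
    src: /tmp/linux-amd64/helm
    dest: /usr/local/bin/helm
    mode: '0755'
    remote_src: yes
With kubectl already configured for root (the admin.conf copy above), the kubernetes.core modules pick up ~/.kube/config by default when the plays run with become.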
---
- name: Deploy Calico CNI
kubernetes.core.k8s:
state: present
src: "https://docs.projectcalico.org/{{ calico_version }}/manifests/calico.yaml"
when: cni_plugin == "calico" and inventory_hostname == groups['control_plane'][0]
- name: Wait for all nodes to be ready
kubernetes.core.k8s_info:
kind: Node
register: nodes
until: nodes.resources | map(attribute='status.conditions') | flatten | selectattr('type', 'equalto', 'Ready') | map(attribute='status') | list | unique == ['True']
retries: 30
delay: 10
when: inventory_hostname == groups['control_plane'][0]
8. Ingress Controller Configuration
---
- name: Deploy NGINX Ingress Controller
kubernetes.core.k8s:
state: present
src: "https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-{{ nginx_ingress_version }}/deploy/static/provider/baremetal/deploy.yaml"
when: ingress_controller == "nginx" and inventory_hostname == groups['control_plane'][0]
- name: Configure NodePort to LoadBalancer services for ingress
kubernetes.core.k8s:
state: present
definition:
apiVersion: v1
kind: Service
metadata:
name: ingress-nginx-controller
namespace: ingress-nginx
spec:
type: LoadBalancer
ports:
- name: http
port: 80
targetPort: 80
protocol: TCP
- name: https
port: 443
targetPort: 443
protocol: TCP
selector:
app.kubernetes.io/component: controller
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/name: ingress-nginx
when: ingress_controller == "nginx" and inventory_hostname == groups['control_plane'][0]
9. Dashboard, Monitoring, and Logging
Kubernetes Dashboard
---
- name: Deploy Kubernetes Dashboard
kubernetes.core.k8s:
state: present
src: https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml
when: enable_dashboard and inventory_hostname == groups['control_plane'][0]
- name: Create Dashboard Admin User
kubernetes.core.k8s:
state: present
definition:
apiVersion: v1
kind: ServiceAccount
metadata:
name: admin-user
namespace: kubernetes-dashboard
when: enable_dashboard and inventory_hostname == groups['control_plane'][0]
- name: Create ClusterRoleBinding for Dashboard Admin
kubernetes.core.k8s:
state: present
definition:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: admin-user
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cluster-admin
subjects:
- kind: ServiceAccount
name: admin-user
namespace: kubernetes-dashboard
when: enable_dashboard and inventory_hostname == groups['control_plane'][0]
Monitoring with Prometheus and Grafana
---
- name: Add Prometheus Helm repository
kubernetes.core.helm_repository:
name: prometheus-community
repo_url: https://prometheus-community.github.io/helm-charts
when: enable_monitoring and inventory_hostname == groups['control_plane'][0]
- name: Create monitoring namespace
kubernetes.core.k8s:
state: present
definition:
apiVersion: v1
kind: Namespace
metadata:
name: monitoring
when: enable_monitoring and inventory_hostname == groups['control_plane'][0]
- name: Deploy Prometheus stack with Grafana
kubernetes.core.helm:
name: kube-prometheus-stack
chart_ref: prometheus-community/kube-prometheus-stack
release_namespace: monitoring
create_namespace: false
values:
grafana:
adminPassword: "{{ grafana_admin_password }}"
service:
type: LoadBalancer
ingress:
enabled: true
hosts:
- grafana.{{ kubernetes_dns_domain }}
prometheus:
prometheusSpec:
serviceMonitorSelectorNilUsesHelmValues: false
serviceMonitorSelector: {}
retention: 7d
resources:
requests:
cpu: 200m
memory: 512Mi
service:
type: ClusterIP
ingress:
enabled: true
hosts:
- prometheus.{{ kubernetes_dns_domain }}
alertmanager:
alertmanagerSpec:
resources:
requests:
cpu: 100m
memory: 128Mi
service:
type: ClusterIP
ingress:
enabled: true
hosts:
- alertmanager.{{ kubernetes_dns_domain }}
when: enable_monitoring and inventory_hostname == groups['control_plane'][0]
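Because serviceMonitorSelectorNilUsesHelmValues is disabled above, Prometheus picks up any ServiceMonitor in the cluster. As an illustration only (the application name, namespace, and port are hypothetical), such a ServiceMonitor can be applied through the same module:
- name: Example ServiceMonitor for a hypothetical application
  kubernetes.core.k8s:
    state: present
    definition:
      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        name: example-app
        namespace: monitoring
      spec:
        selector:
          matchLabels:
            app: example-app
        namespaceSelector:
          matchNames:
            - default
        endpoints:
          - port: metrics
            interval: 30s
  when: enable_monitoring and inventory_hostname == groups['control_plane'][0]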
Logging with EFK Stack
---
- name: Add Elastic Helm repository
kubernetes.core.helm_repository:
name: elastic
repo_url: https://helm.elastic.co
when: enable_logging and logging_stack == "efk" and inventory_hostname == groups['control_plane'][0]
- name: Create logging namespace
kubernetes.core.k8s:
state: present
definition:
apiVersion: v1
kind: Namespace
metadata:
name: logging
when: enable_logging and inventory_hostname == groups['control_plane'][0]
- name: Deploy Elasticsearch
kubernetes.core.helm:
name: elasticsearch
chart_ref: elastic/elasticsearch
release_namespace: logging
create_namespace: false
values:
replicas: 3
minimumMasterNodes: 2
esJavaOpts: "-Xmx512m -Xms512m"
resources:
requests:
cpu: "500m"
memory: "1Gi"
limits:
cpu: "1000m"
memory: "2Gi"
when: enable_logging and logging_stack == "efk" and inventory_hostname == groups['control_plane'][0]
- name: Deploy Kibana
kubernetes.core.helm:
name: kibana
chart_ref: elastic/kibana
release_namespace: logging
create_namespace: false
values:
ingress:
enabled: true
hosts:
- kibana.{{ kubernetes_dns_domain }}
when: enable_logging and logging_stack == "efk" and inventory_hostname == groups['control_plane'][0]
- name: Deploy Fluentd
kubernetes.core.helm:
name: fluentd
chart_ref: bitnami/fluentd
release_namespace: logging
create_namespace: false
values:
elasticsearch:
host: elasticsearch-master.logging.svc.cluster.local
port: 9200
when: enable_logging and logging_stack == "efk" and inventory_hostname == groups['control_plane'][0]
10. GitOps Implementation
---
- name: Create GitOps namespace
kubernetes.core.k8s:
state: present
definition:
apiVersion: v1
kind: Namespace
metadata:
name: argocd
when: enable_gitops and gitops_tool == "argocd" and inventory_hostname == groups['control_plane'][0]
- name: Deploy ArgoCD
kubernetes.core.k8s:
state: present
src: "https://raw.githubusercontent.com/argoproj/argo-cd/{{ argocd_version }}/manifests/install.yaml"
when: enable_gitops and gitops_tool == "argocd" and inventory_hostname == groups['control_plane'][0]
- name: Configure ArgoCD Ingress
kubernetes.core.k8s:
state: present
definition:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: argocd-server-ingress
namespace: argocd
annotations:
kubernetes.io/ingress.class: nginx
nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
nginx.ingress.kubernetes.io/ssl-passthrough: "true"
nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
spec:
rules:
- host: argocd.{{ kubernetes_dns_domain }}
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: argocd-server
port:
number: 443
when: enable_gitops and gitops_tool == "argocd" and inventory_hostname == groups['control_plane'][0]
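Once ArgoCD is running, workloads are described as Application resources that point at a Git repository. A sketch with a hypothetical repository URL and path (adjust to your own GitOps repo):
- name: Example ArgoCD Application (hypothetical repository)
  kubernetes.core.k8s:
    state: present
    definition:
      apiVersion: argoproj.io/v1alpha1
      kind: Application
      metadata:
        name: example-app
        namespace: argocd
      spec:
        project: default
        source:
          repoURL: https://github.com/yourusername/k8s-manifests.git
          targetRevision: main
          path: apps/example-app
        destination:
          server: https://kubernetes.default.svc
          namespace: default
        syncPolicy:
          automated:
            prune: true
            selfHeal: true
  when: enable_gitops and gitops_tool == "argocd" and inventory_hostname == groups['control_plane'][0]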
11. Backup and Disaster Recovery
---
- name: Create Backup namespace
kubernetes.core.k8s:
state: present
definition:
apiVersion: v1
kind: Namespace
metadata:
name: velero
when: enable_backup and backup_tool == "velero" and inventory_hostname == groups['control_plane'][0]
- name: Deploy Velero credentials
kubernetes.core.k8s:
state: present
definition:
apiVersion: v1
kind: Secret
metadata:
name: cloud-credentials
namespace: velero
type: Opaque
stringData:
cloud: |
[default]
aws_access_key_id={{ aws_access_key_id }}
aws_secret_access_key={{ aws_secret_access_key }}
when: enable_backup and backup_tool == "velero" and backup_provider == "aws" and inventory_hostname == groups['control_plane'][0]
- name: Add Velero Helm repository
kubernetes.core.helm_repository:
name: vmware-tanzu
repo_url: https://vmware-tanzu.github.io/helm-charts
when: enable_backup and backup_tool == "velero" and inventory_hostname == groups['control_plane'][0]
- name: Deploy Velero
kubernetes.core.helm:
name: velero
chart_ref: vmware-tanzu/velero
release_namespace: velero
create_namespace: false
values:
configuration:
provider: aws
backupStorageLocation:
name: default
bucket: "{{ backup_bucket }}"
config:
region: us-east-1
volumeSnapshotLocation:
name: default
config:
region: us-east-1
credentials:
existingSecret: cloud-credentials
initContainers:
- name: velero-plugin-for-aws
image: velero/velero-plugin-for-aws:{{ velero_version }}
volumeMounts:
- mountPath: /target
name: plugins
schedules:
daily-backup:
schedule: "0 1 * * *"
template:
ttl: 720h # 30 days
when: enable_backup and backup_tool == "velero" and backup_provider == "aws" and inventory_hostname == groups['control_plane'][0]
12. CI/CD Integration
---
- name: Create CI/CD namespace
kubernetes.core.k8s:
state: present
definition:
apiVersion: v1
kind: Namespace
metadata:
name: ci
when: inventory_hostname == groups['control_plane'][0]
- name: Add Jenkins Helm repository
kubernetes.core.helm_repository:
name: jenkins
repo_url: https://charts.jenkins.io
when: inventory_hostname == groups['control_plane'][0]
- name: Deploy Jenkins
kubernetes.core.helm:
name: jenkins
chart_ref: jenkins/jenkins
release_namespace: ci
create_namespace: false
values:
controller:
adminUser: admin
adminPassword: "{{ jenkins_admin_password }}"
ingress:
enabled: true
apiVersion: networking.k8s.io/v1
hosts:
- jenkins.{{ kubernetes_dns_domain }}
resources:
requests:
cpu: "500m"
memory: "1Gi"
limits:
cpu: "1000m"
memory: "2Gi"
agent:
resources:
requests:
cpu: "500m"
memory: "1Gi"
limits:
cpu: "1000m"
memory: "2Gi"
when: inventory_hostname == groups['control_plane'][0]
13. Secrets Management
---
- name: Create Secrets Management namespace
kubernetes.core.k8s:
state: present
definition:
apiVersion: v1
kind: Namespace
metadata:
name: sealed-secrets
when: secrets_management == "sealed-secrets" and inventory_hostname == groups['control_plane'][0]
- name: Deploy Sealed Secrets Controller
kubernetes.core.k8s:
state: present
src: "https://github.com/bitnami-labs/sealed-secrets/releases/download/{{ sealed_secrets_version }}/controller.yaml"
when: secrets_management == "sealed-secrets" and inventory_hostname == groups['control_plane'][0]
- name: Wait for the Sealed Secrets controller to be ready
kubernetes.core.k8s_info:
kind: Deployment
name: sealed-secrets-controller
namespace: sealed-secrets
register: sealed_secrets_deployment
until: sealed_secrets_deployment.resources[0].status.availableReplicas is defined and sealed_secrets_deployment.resources[0].status.availableReplicas == 1
retries: 30
delay: 10
when: secrets_management == "sealed-secrets" and inventory_hostname == groups['control_plane'][0]
14. High Availability Setup
---
- name: Install HAProxy and Keepalived
package:
name:
- haproxy
- keepalived
state: present
when: inventory_hostname in groups['lb']
- name: Configure HAProxy
template:
src: haproxy.cfg.j2
dest: /etc/haproxy/haproxy.cfg
owner: root
group: root
mode: '0644'
notify: restart haproxy
when: inventory_hostname in groups['lb']
- name: Configure Keepalived master
template:
src: keepalived_master.conf.j2
dest: /etc/keepalived/keepalived.conf
owner: root
group: root
mode: '0644'
notify: restart keepalived
when: inventory_hostname == groups['lb'][0]
- name: Configure Keepalived backup
template:
src: keepalived_backup.conf.j2
dest: /etc/keepalived/keepalived.conf
owner: root
group: root
mode: '0644'
notify: restart keepalived
when: inventory_hostname in groups['lb'] and inventory_hostname != groups['lb'][0]
HAProxy template:
global
log /dev/log local0
log /dev/log local1 notice
chroot /var/lib/haproxy
stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
stats timeout 30s
user haproxy
group haproxy
daemon
defaults
log global
mode tcp
option tcplog
option dontlognull
timeout connect {{ haproxy_connect_timeout }}
timeout client {{ haproxy_client_timeout }}
timeout server {{ haproxy_server_timeout }}
frontend kubernetes-apiserver
bind *:{{ kubernetes_api_server_port }}
mode tcp
option tcplog
default_backend kubernetes-apiserver
backend kubernetes-apiserver
mode tcp
option tcp-check
balance roundrobin
{% for host in groups['control_plane'] %}
server {{ host }} {{ hostvars[host].ansible_host }}:{{ kubernetes_api_server_port }} check fall 3 rise 2
{% endfor %}
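The Keepalived tasks above reference keepalived_master.conf.j2 and keepalived_backup.conf.j2, which are not shown. A minimal sketch of the master template, using the keepalived_interface and keepalived_virtual_ip variables defined earlier (the backup variant would use state BACKUP and a lower priority; the router ID and auth password are placeholders):
vrrp_instance K8S_VIP {
    state MASTER
    interface {{ keepalived_interface }}
    virtual_router_id 51
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass k8s-vip-pass
    }
    virtual_ipaddress {
        {{ keepalived_virtual_ip }}
    }
}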
15. SRE Principles Integration
---
- name: Create SLO namespace
kubernetes.core.k8s:
state: present
definition:
apiVersion: v1
kind: Namespace
metadata:
name: slo
when: inventory_hostname == groups['control_plane'][0]
- name: Deploy SLO Operator
kubernetes.core.k8s:
state: present
src: https://github.com/slok/sloth/releases/latest/download/sloth.yaml
when: inventory_hostname == groups['control_plane'][0]
- name: Create SLO for API Server
kubernetes.core.k8s:
state: present
definition:
apiVersion: sloth.slok.dev/v1
kind: PrometheusServiceLevel
metadata:
name: kubernetes-api-sli
namespace: slo
spec:
service: "kubernetes-api"
labels:
app: "kubernetes"
component: "apiserver"
slos:
- name: "availability"
objective: 99.9
description: "API server availability SLO"
sli:
events:
errorQuery: sum(rate(apiserver_request_total{code=~"5.."}[{{ "{{" }}.window{{ "}}" }}]))
totalQuery: sum(rate(apiserver_request_total[{{ "{{" }}.window{{ "}}" }}]))
alerting:
pageAlert:
disable: false
labels:
severity: critical
team: platform
annotations:
summary: "High error rate on API server"
ticketAlert:
disable: false
labels:
severity: warning
team: platform
annotations:
summary: "Elevated error rate on API server"
when: inventory_hostname == groups['control_plane'][0]
- name: Deploy Alert Manager Config
kubernetes.core.k8s:
state: present
definition:
apiVersion: v1
kind: ConfigMap
metadata:
name: alertmanager-config
namespace: monitoring
data:
alertmanager.yml: |
global:
resolve_timeout: 5m
slack_api_url: "{{ alertmanager_slack_webhook }}"
route:
group_by: ['job', 'alertname', 'severity']
group_wait: 30s
group_interval: 5m
repeat_interval: 3h
receiver: 'slack-notifications'
routes:
- match:
severity: critical
receiver: 'pagerduty-critical'
receivers:
- name: 'slack-notifications'
slack_configs:
- channel: '#alerts'
send_resolved: true
title: '[{{ "{{" }} .Status {{ "}}" }}] {{ "{{" }} .CommonLabels.alertname {{ "}}" }}'
text: >-
{{ "{{" }} range .Alerts {{ "}}" }}
*Alert:* {{ "{{" }} .Annotations.summary {{ "}}" }}
*Description:* {{ "{{" }} .Annotations.description {{ "}}" }}
*Severity:* {{ "{{" }} .Labels.severity {{ "}}" }}
{{ "{{" }} end {{ "}}" }}
- name: 'pagerduty-critical'
pagerduty_configs:
- service_key: "{{ alertmanager_pagerduty_key }}"
description: '{{ "{{" }} .CommonLabels.alertname {{ "}}" }}'
details:
firing: '{{ "{{" }} .Alerts.Firing {{ "}}" }}'
when: enable_monitoring and inventory_hostname == groups['control_plane'][0]
16. Main Playbook
---
- name: Prepare all hosts
hosts: all
become: yes
roles:
- common
- name: Configure Load Balancer
hosts: lb
become: yes
roles:
- ha
tags:
- lb
- ha
- name: Install Container Runtime
hosts: kubernetes
become: yes
roles:
- containerd
tags:
- runtime
- name: Install Kubernetes Components
hosts: kubernetes
become: yes
roles:
- kubernetes
tags:
- kubernetes
- name: Initialize Kubernetes Control Plane
hosts: control_plane
become: yes
tasks:
- include_role:
name: kubernetes
tasks_from: init_control_plane
tags:
- init
- name: Configure Kubernetes Networking
hosts: control_plane[0]
become: yes
roles:
- networking
tags:
- networking
- name: Deploy Kubernetes Addons
hosts: control_plane[0]
become: yes
roles:
- ingress
- dashboard
- monitoring
- logging
- gitops
- backup
- secrets
- cicd
- sre
tags:
- addons
17. Using the Playbook
# Clone the repository (assuming you've set it up in Git)
git clone https://github.com/yourusername/ansible-kubernetes.git
cd ansible-kubernetes
# Update inventory with your actual server details
# Edit inventories/production/hosts.yml and group_vars/all.yml as needed
# Install required collections
ansible-galaxy collection install kubernetes.core community.general
# Run playbook for a complete setup
ansible-playbook -i inventories/production/hosts.yml playbooks/site.yml
# Alternatively, run specific tags
ansible-playbook -i inventories/production/hosts.yml playbooks/site.yml --tags "kubernetes,networking"
18. Security Hardening Best Practices
- Use secure TLS settings for all components
- Implement Network Policies to restrict pod-to-pod traffic (see the sketch after this list)
- Enable Pod Security Standards (Restricted profile)
- Use least-privilege RBAC configuration
- Regularly scan images for vulnerabilities
- Implement seccomp and AppArmor profiles
- Encrypt etcd data at rest
- Regular certificate rotation
- Limit API server access to trusted networks
- Use admission controllers for policy enforcement
- Implement audit logging and regular review
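For example, the Network Policies item above can be enforced per namespace with a default-deny policy applied through the same kubernetes.core.k8s module (the namespace shown is an example):
- name: Default-deny ingress NetworkPolicy (example namespace)
  kubernetes.core.k8s:
    state: present
    definition:
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      metadata:
        name: default-deny-ingress
        namespace: default
      spec:
        podSelector: {}
        policyTypes:
          - Ingress
  when: inventory_hostname == groups['control_plane'][0]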
19. Troubleshooting
Common issues and solutions:
- Node NotReady: Check kubelet logs with journalctl -u kubelet
- Pod scheduling: Check node resources with kubectl describe node
- Network issues: Verify CNI pods with kubectl get pods -n kube-system
- API server unavailable: Check control plane components with kubectl get pods -n kube-system
- Certificate issues: Check expiration with kubeadm certs check-expiration
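These same checks can be scripted through Ansible when a node misbehaves; a small diagnostic sketch (the playbook name and output handling are illustrative):
---
- name: Collect basic cluster diagnostics
  hosts: control_plane[0]
  become: yes
  environment:
    KUBECONFIG: /etc/kubernetes/admin.conf
  tasks:
    - name: Node status
      ansible.builtin.command: kubectl get nodes -o wide
      register: node_status
      changed_when: false

    - name: Certificate expiration
      ansible.builtin.command: kubeadm certs check-expiration
      register: cert_status
      changed_when: false

    - name: Show diagnostics
      ansible.builtin.debug:
        msg: "{{ node_status.stdout_lines + cert_status.stdout_lines }}"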
20. Integrating Terraform with Ansible for Multi-Cloud Kubernetes Deployments
In addition to the Ansible-based deployment approach, we can enhance our Kubernetes deployment by using Terraform to provision the underlying infrastructure across different cloud providers. This section demonstrates how to integrate Terraform with our Ansible playbooks for a complete Infrastructure as Code (IaC) solution.
21. Infrastructure as Code with Terraform
Workflow Architecture
graph TD
A[Terraform] -->|Provisions Infrastructure| B[AWS/Azure/GCP VMs]
A -->|Generates| C[Dynamic Inventory]
C -->|Used by| D[Ansible]
D -->|Configures| E[Kubernetes Cluster]
D -->|Deploys| F[Add-ons & Monitoring]
Project Structure with Terraform
ansible-kubernetes/
├── terraform/
│ ├── aws/
│ ├── azure/
│ ├── gcp/
│ └── outputs.tf
├── inventories/
├── roles/
├── playbooks/
└── ...
22. Terraform for AWS Infrastructure
AWS Provider Configuration
provider "aws" {
region = var.region
}
resource "aws_vpc" "k8s_vpc" {
cidr_block = var.vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "kubernetes-vpc"
}
}
resource "aws_subnet" "k8s_subnet" {
vpc_id = aws_vpc.k8s_vpc.id
cidr_block = var.subnet_cidr
map_public_ip_on_launch = true
availability_zone = "${var.region}a"
tags = {
Name = "kubernetes-subnet"
}
}
resource "aws_security_group" "k8s_sg" {
name = "kubernetes-sg"
description = "Allow Kubernetes traffic"
vpc_id = aws_vpc.k8s_vpc.id
# SSH
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# Kubernetes API
ingress {
from_port = 6443
to_port = 6443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# Allow all internal traffic
ingress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = [var.vpc_cidr]
}
# Allow all outbound traffic
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
# Load Balancer for HA Setup
resource "aws_lb" "k8s_api_lb" {
name = "kubernetes-api-lb"
internal = false
load_balancer_type = "network"
subnets = [aws_subnet.k8s_subnet.id]
tags = {
Name = "kubernetes-api-lb"
}
}
resource "aws_lb_target_group" "k8s_api_tg" {
name = "kubernetes-api-tg"
port = 6443
protocol = "TCP"
vpc_id = aws_vpc.k8s_vpc.id
}
resource "aws_lb_listener" "k8s_api" {
load_balancer_arn = aws_lb.k8s_api_lb.arn
port = 6443
protocol = "TCP"
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.k8s_api_tg.arn
}
}
# Master Nodes
resource "aws_instance" "k8s_master" {
count = var.master_count
ami = var.ubuntu_ami
instance_type = var.master_instance_type
key_name = var.ssh_key_name
subnet_id = aws_subnet.k8s_subnet.id
vpc_security_group_ids = [aws_security_group.k8s_sg.id]
associate_public_ip_address = true
root_block_device {
volume_size = 50
volume_type = "gp2"
}
tags = {
Name = "k8s-master-${count.index + 1}"
Role = "master"
}
}
# Worker Nodes
resource "aws_instance" "k8s_worker" {
count = var.worker_count
ami = var.ubuntu_ami
instance_type = var.worker_instance_type
key_name = var.ssh_key_name
subnet_id = aws_subnet.k8s_subnet.id
vpc_security_group_ids = [aws_security_group.k8s_sg.id]
associate_public_ip_address = true
root_block_device {
volume_size = 100
volume_type = "gp2"
}
tags = {
Name = "k8s-worker-${count.index + 1}"
Role = "worker"
}
}
# Register masters with load balancer
resource "aws_lb_target_group_attachment" "k8s_master_lb_attachment" {
count = var.master_count
target_group_arn = aws_lb_target_group.k8s_api_tg.arn
target_id = aws_instance.k8s_master[count.index].id
port = 6443
}
AWS Variables
variable "region" {
description = "AWS region"
default = "us-east-1"
}
variable "vpc_cidr" {
description = "CIDR for the VPC"
default = "10.0.0.0/16"
}
variable "subnet_cidr" {
description = "CIDR for the subnet"
default = "10.0.1.0/24"
}
variable "ubuntu_ami" {
description = "Ubuntu 22.04 AMI"
default = "ami-0557a15b87f6559cf" # Update with appropriate AMI
}
variable "master_count" {
description = "Number of master nodes"
default = 3
}
variable "worker_count" {
description = "Number of worker nodes"
default = 3
}
variable "master_instance_type" {
description = "Instance type for master nodes"
default = "t3.medium"
}
variable "worker_instance_type" {
description = "Instance type for worker nodes"
default = "t3.large"
}
variable "ssh_key_name" {
description = "SSH key name"
default = "kubernetes-key"
}
AWS Outputs for Ansible Integration
output "vpc_id" {
value = aws_vpc.k8s_vpc.id
}
output "api_lb_dns_name" {
value = aws_lb.k8s_api_lb.dns_name
}
output "master_ips" {
value = aws_instance.k8s_master[*].public_ip
}
output "master_private_ips" {
value = aws_instance.k8s_master[*].private_ip
}
output "worker_ips" {
value = aws_instance.k8s_worker[*].public_ip
}
output "worker_private_ips" {
value = aws_instance.k8s_worker[*].private_ip
}
23. Terraform for Azure Infrastructure
provider "azurerm" {
features {}
}
resource "azurerm_resource_group" "k8s_rg" {
name = var.resource_group_name
location = var.location
}
resource "azurerm_virtual_network" "k8s_vnet" {
name = "kubernetes-vnet"
address_space = ["10.0.0.0/16"]
location = azurerm_resource_group.k8s_rg.location
resource_group_name = azurerm_resource_group.k8s_rg.name
}
resource "azurerm_subnet" "k8s_subnet" {
name = "kubernetes-subnet"
resource_group_name = azurerm_resource_group.k8s_rg.name
virtual_network_name = azurerm_virtual_network.k8s_vnet.name
address_prefixes = ["10.0.1.0/24"]
}
resource "azurerm_network_security_group" "k8s_nsg" {
name = "kubernetes-nsg"
location = azurerm_resource_group.k8s_rg.location
resource_group_name = azurerm_resource_group.k8s_rg.name
security_rule {
name = "SSH"
priority = 1001
direction = "Inbound"
access = "Allow"
protocol = "Tcp"
source_port_range = "*"
destination_port_range = "22"
source_address_prefix = "*"
destination_address_prefix = "*"
}
security_rule {
name = "KubernetesAPI"
priority = 1002
direction = "Inbound"
access = "Allow"
protocol = "Tcp"
source_port_range = "*"
destination_port_range = "6443"
source_address_prefix = "*"
destination_address_prefix = "*"
}
}
# Load Balancer for HA Setup
resource "azurerm_public_ip" "k8s_lb_ip" {
name = "kubernetes-lb-ip"
location = azurerm_resource_group.k8s_rg.location
resource_group_name = azurerm_resource_group.k8s_rg.name
allocation_method = "Static"
sku = "Standard"
}
resource "azurerm_lb" "k8s_lb" {
name = "kubernetes-lb"
location = azurerm_resource_group.k8s_rg.location
resource_group_name = azurerm_resource_group.k8s_rg.name
sku = "Standard"
frontend_ip_configuration {
name = "PublicIPAddress"
public_ip_address_id = azurerm_public_ip.k8s_lb_ip.id
}
}
resource "azurerm_lb_backend_address_pool" "k8s_backend_pool" {
loadbalancer_id = azurerm_lb.k8s_lb.id
name = "kubernetes-backend-pool"
}
resource "azurerm_lb_rule" "k8s_lb_rule" {
loadbalancer_id = azurerm_lb.k8s_lb.id
name = "kubernetes-api"
protocol = "Tcp"
frontend_port = 6443
backend_port = 6443
frontend_ip_configuration_name = "PublicIPAddress"
backend_address_pool_ids = [azurerm_lb_backend_address_pool.k8s_backend_pool.id]
probe_id = azurerm_lb_probe.k8s_lb_probe.id
}
resource "azurerm_lb_probe" "k8s_lb_probe" {
loadbalancer_id = azurerm_lb.k8s_lb.id
name = "kubernetes-api-probe"
port = 6443
protocol = "Tcp"
}
# Master Nodes
resource "azurerm_network_interface" "k8s_master_nic" {
count = var.master_count
name = "k8s-master-nic-${count.index + 1}"
location = azurerm_resource_group.k8s_rg.location
resource_group_name = azurerm_resource_group.k8s_rg.name
ip_configuration {
name = "internal"
subnet_id = azurerm_subnet.k8s_subnet.id
private_ip_address_allocation = "Dynamic"
public_ip_address_id = azurerm_public_ip.k8s_master_pip[count.index].id
}
}
resource "azurerm_network_interface_security_group_association" "k8s_master_nsg_assoc" {
count = var.master_count
network_interface_id = azurerm_network_interface.k8s_master_nic[count.index].id
network_security_group_id = azurerm_network_security_group.k8s_nsg.id
}
resource "azurerm_public_ip" "k8s_master_pip" {
count = var.master_count
name = "k8s-master-ip-${count.index + 1}"
location = azurerm_resource_group.k8s_rg.location
resource_group_name = azurerm_resource_group.k8s_rg.name
allocation_method = "Static"
sku = "Standard"
}
resource "azurerm_linux_virtual_machine" "k8s_master" {
count = var.master_count
name = "k8s-master-${count.index + 1}"
resource_group_name = azurerm_resource_group.k8s_rg.name
location = azurerm_resource_group.k8s_rg.location
size = var.master_vm_size
admin_username = "adminuser"
network_interface_ids = [
azurerm_network_interface.k8s_master_nic[count.index].id,
]
admin_ssh_key {
username = "adminuser"
public_key = file(var.ssh_public_key_path)
}
os_disk {
caching = "ReadWrite"
storage_account_type = "Premium_LRS"
disk_size_gb = 50
}
source_image_reference {
publisher = "Canonical"
offer = "0001-com-ubuntu-server-jammy"
sku = "22_04-lts"
version = "latest"
}
tags = {
Role = "master"
}
}
# Worker Nodes
resource "azurerm_network_interface" "k8s_worker_nic" {
count = var.worker_count
name = "k8s-worker-nic-${count.index + 1}"
location = azurerm_resource_group.k8s_rg.location
resource_group_name = azurerm_resource_group.k8s_rg.name
ip_configuration {
name = "internal"
subnet_id = azurerm_subnet.k8s_subnet.id
private_ip_address_allocation = "Dynamic"
public_ip_address_id = azurerm_public_ip.k8s_worker_pip[count.index].id
}
}
resource "azurerm_network_interface_security_group_association" "k8s_worker_nsg_assoc" {
count = var.worker_count
network_interface_id = azurerm_network_interface.k8s_worker_nic[count.index].id
network_security_group_id = azurerm_network_security_group.k8s_nsg.id
}
resource "azurerm_public_ip" "k8s_worker_pip" {
count = var.worker_count
name = "k8s-worker-ip-${count.index + 1}"
location = azurerm_resource_group.k8s_rg.location
resource_group_name = azurerm_resource_group.k8s_rg.name
allocation_method = "Static"
sku = "Standard"
}
resource "azurerm_linux_virtual_machine" "k8s_worker" {
count = var.worker_count
name = "k8s-worker-${count.index + 1}"
resource_group_name = azurerm_resource_group.k8s_rg.name
location = azurerm_resource_group.k8s_rg.location
size = var.worker_vm_size
admin_username = "adminuser"
network_interface_ids = [
azurerm_network_interface.k8s_worker_nic[count.index].id,
]
admin_ssh_key {
username = "adminuser"
public_key = file(var.ssh_public_key_path)
}
os_disk {
caching = "ReadWrite"
storage_account_type = "Premium_LRS"
disk_size_gb = 100
}
source_image_reference {
publisher = "Canonical"
offer = "0001-com-ubuntu-server-jammy"
sku = "22_04-lts"
version = "latest"
}
tags = {
Role = "worker"
}
}
Azure Variables and Outputs
variable "resource_group_name" {
description = "Name of the resource group"
default = "kubernetes-rg"
}
variable "location" {
description = "Azure region to deploy resources"
default = "eastus"
}
variable "master_count" {
description = "Number of master nodes"
default = 3
}
variable "worker_count" {
description = "Number of worker nodes"
default = 3
}
variable "master_vm_size" {
description = "Size of master VMs"
default = "Standard_D2s_v3"
}
variable "worker_vm_size" {
description = "Size of worker VMs"
default = "Standard_D4s_v3"
}
variable "ssh_public_key_path" {
description = "Path to the SSH public key"
default = "~/.ssh/id_rsa.pub"
}
output "resource_group_name" {
value = azurerm_resource_group.k8s_rg.name
}
output "kubernetes_api_ip" {
value = azurerm_public_ip.k8s_lb_ip.ip_address
}
output "master_public_ips" {
value = azurerm_public_ip.k8s_master_pip[*].ip_address
}
output "master_private_ips" {
value = azurerm_linux_virtual_machine.k8s_master[*].private_ip_address
}
output "worker_public_ips" {
value = azurerm_public_ip.k8s_worker_pip[*].ip_address
}
output "worker_private_ips" {
value = azurerm_linux_virtual_machine.k8s_worker[*].private_ip_address
}
24. Terraform for GCP Infrastructure
provider "google" {
project = var.project_id
region = var.region
zone = var.zone
}
resource "google_compute_network" "k8s_network" {
name = "kubernetes-network"
auto_create_subnetworks = false
}
resource "google_compute_subnetwork" "k8s_subnet" {
name = "kubernetes-subnet"
ip_cidr_range = "10.0.0.0/24"
region = var.region
network = google_compute_network.k8s_network.id
}
resource "google_compute_firewall" "k8s_firewall" {
name = "kubernetes-firewall"
network = google_compute_network.k8s_network.name
allow {
protocol = "icmp"
}
allow {
protocol = "tcp"
ports = ["22", "6443", "2379-2380", "10250-10252"]
}
source_ranges = ["0.0.0.0/0"]
}
# Load Balancer for API server
resource "google_compute_address" "k8s_lb_ip" {
name = "kubernetes-lb-ip"
region = var.region
}
resource "google_compute_http_health_check" "k8s_health_check" {
name = "kubernetes-health-check"
port = 6443
request_path = "/healthz"
check_interval_sec = 5
timeout_sec = 5
}
resource "google_compute_target_pool" "k8s_target_pool" {
name = "kubernetes-target-pool"
instances = [for vm in google_compute_instance.k8s_master : "${var.zone}/${vm.name}"]
health_checks = [google_compute_http_health_check.k8s_health_check.name]
session_affinity = "CLIENT_IP"
}
resource "google_compute_forwarding_rule" "k8s_forwarding_rule" {
name = "kubernetes-forwarding-rule"
target = google_compute_target_pool.k8s_target_pool.id
port_range = "6443"
ip_address = google_compute_address.k8s_lb_ip.address
region = var.region
}
# Master Nodes
resource "google_compute_instance" "k8s_master" {
count = var.master_count
name = "k8s-master-${count.index + 1}"
machine_type = var.master_machine_type
zone = var.zone
boot_disk {
initialize_params {
image = "ubuntu-os-cloud/ubuntu-2204-lts"
size = 50
type = "pd-ssd"
}
}
network_interface {
network = google_compute_network.k8s_network.name
subnetwork = google_compute_subnetwork.k8s_subnet.name
access_config {
// Ephemeral IP
}
}
metadata = {
ssh-keys = "${var.ssh_username}:${file(var.ssh_public_key_path)}"
}
tags = ["kubernetes", "master"]
}
# Worker Nodes
resource "google_compute_instance" "k8s_worker" {
count = var.worker_count
name = "k8s-worker-${count.index + 1}"
machine_type = var.worker_machine_type
zone = var.zone
boot_disk {
initialize_params {
image = "ubuntu-os-cloud/ubuntu-2204-lts"
size = 100
type = "pd-ssd"
}
}
network_interface {
network = google_compute_network.k8s_network.name
subnetwork = google_compute_subnetwork.k8s_subnet.name
access_config {
// Ephemeral IP
}
}
metadata = {
ssh-keys = "${var.ssh_username}:${file(var.ssh_public_key_path)}"
}
tags = ["kubernetes", "worker"]
}
GCP Variables and Outputs
variable "project_id" {
description = "GCP Project ID"
default = "your-project-id"
}
variable "region" {
description = "GCP region"
default = "us-central1"
}
variable "zone" {
description = "GCP zone"
default = "us-central1-a"
}
variable "master_count" {
description = "Number of master nodes"
default = 3
}
variable "worker_count" {
description = "Number of worker nodes"
default = 3
}
variable "master_machine_type" {
description = "Machine type for master nodes"
default = "e2-standard-2"
}
variable "worker_machine_type" {
description = "Machine type for worker nodes"
default = "e2-standard-4"
}
variable "ssh_username" {
description = "SSH username"
default = "ubuntu"
}
variable "ssh_public_key_path" {
description = "Path to the SSH public key"
default = "~/.ssh/id_rsa.pub"
}
output "lb_ip_address" {
value = google_compute_address.k8s_lb_ip.address
}
output "master_public_ips" {
value = google_compute_instance.k8s_master[*].network_interface.0.access_config.0.nat_ip
}
output "master_private_ips" {
value = google_compute_instance.k8s_master[*].network_interface.0.network_ip
}
output "worker_public_ips" {
value = google_compute_instance.k8s_worker[*].network_interface.0.access_config.0.nat_ip
}
output "worker_private_ips" {
value = google_compute_instance.k8s_worker[*].network_interface.0.network_ip
}
25. Generating Ansible Inventory from Terraform
resource "local_file" "ansible_inventory" {
content = templatefile("${path.module}/templates/inventory.tmpl",
{
master_nodes = zipmap(
[for i in range(var.master_count) : "master${i + 1}"],
[for i in range(var.master_count) : {
ansible_host = aws_instance.k8s_master[i].public_ip
private_ip = aws_instance.k8s_master[i].private_ip
}]
)
worker_nodes = zipmap(
[for i in range(var.worker_count) : "worker${i + 1}"],
[for i in range(var.worker_count) : {
ansible_host = aws_instance.k8s_worker[i].public_ip
private_ip = aws_instance.k8s_worker[i].private_ip
}]
)
lb_endpoint = aws_lb.k8s_api_lb.dns_name
kubernetes_api_port = "6443"
}
)
filename = "${path.module}/../../inventories/aws/hosts.yml"
}
Inventory template for AWS:
all:
children:
kubernetes:
children:
control_plane:
hosts:
%{ for name, node in master_nodes ~}
${name}:
ansible_host: ${node.ansible_host}
private_ip: ${node.private_ip}
%{ endfor ~}
workers:
hosts:
%{ for name, node in worker_nodes ~}
${name}:
ansible_host: ${node.ansible_host}
private_ip: ${node.private_ip}
%{ endfor ~}
etcd:
children:
control_plane:
vars:
ansible_user: ubuntu
ansible_ssh_private_key_file: ~/.ssh/id_rsa
ansible_ssh_common_args: '-o StrictHostKeyChecking=no'
control_plane_endpoint: "${lb_endpoint}:${kubernetes_api_port}"
control_plane_endpoint_noport: "${lb_endpoint}"
26. Integrating Terraform with Ansible Workflow
Create a bash script to automate the workflow:
#!/bin/bash
set -e
CLOUD_PROVIDER=${1:-aws}
ACTION=${2:-apply}
echo "Deploying Kubernetes infrastructure on $CLOUD_PROVIDER"
# Initialize and apply Terraform
cd terraform/$CLOUD_PROVIDER
terraform init
terraform $ACTION -auto-approve
# If we're destroying, we're done
if [ "$ACTION" == "destroy" ]; then
echo "Infrastructure destroyed successfully"
exit 0
fi
# Wait for SSH to be available on all hosts
echo "Waiting for SSH to be available..."
sleep 30
# Run Ansible playbook
cd ../..
ansible-playbook -i inventories/$CLOUD_PROVIDER/hosts.yml playbooks/site.yml
echo "Kubernetes cluster deployed successfully on $CLOUD_PROVIDER!"
echo "Run 'kubectl --kubeconfig kubeconfig get nodes' to check your cluster"BashMake the script executable:
Make the script executable:
chmod +x deploy-k8s.sh
27. Using the Multi-Cloud Deployment
# Deploy on AWS
./deploy-k8s.sh aws apply
# Deploy on Azure
./deploy-k8s.sh azure apply
# Deploy on GCP
./deploy-k8s.sh gcp apply
# Destroy infrastructure when done
./deploy-k8s.sh aws destroy
28. Best Practices for Multi-Cloud Kubernetes Deployments
- State Management: Store Terraform state files in a remote backend (S3, Azure Blob, GCS)
- Variable Encapsulation: Use Terraform modules for reusable components
- Secrets Management: Never store credentials in your Terraform code; use environment variables or a vault
- Network Consistency: Maintain consistent CIDR ranges across cloud providers
- Version Pinning: Pin provider versions to avoid unexpected changes
- Tagging Strategy: Implement a consistent tagging strategy for all resources
- Cost Monitoring: Set up cost monitoring and alerts for each cloud provider
- Backup Strategy: Ensure your backup strategy works across all cloud environments
- Disaster Recovery: Test disaster recovery procedures regularly
By combining Terraform for infrastructure provisioning with Ansible for configuration management, you have a powerful, flexible approach to deploying Kubernetes across multiple cloud providers while maintaining consistency and reliability.