MongoDB Replication

    From Beginner to Expert

    Table of Contents

    1. Introduction to MongoDB Replication
    2. Understanding Replica Sets
    3. Setting Up Your First Replica Set
    4. Configuration and Management
    5. Read and Write Operations
    6. Elections and Failover
    7. Monitoring and Maintenance
    8. Advanced Topics
    9. Troubleshooting
    10. Best Practices

    1. Introduction to MongoDB Replication

    What is MongoDB Replication?

    MongoDB replication keeps copies of the same data synchronized across multiple servers. The extra copies provide redundancy and increase availability: if a single server is lost, the data is still available on the others.

    Why Use Replication?

    • High Availability: Automatic failover when the primary goes down
    • Data Redundancy: Multiple copies of your data
    • Read Scaling: Distribute read operations across secondaries
    • Data Recovery: Protection against data loss
    • Geographic Distribution: Deploy across multiple data centers

    Replication vs Sharding

    graph TB
        subgraph "Replication"
            A[Application] --> B[Primary]
            B --> C[Secondary 1]
            B --> D[Secondary 2]
            B --> E[Secondary 3]
        end
    
        subgraph "Sharding"
            F[Application] --> G[mongos Router]
            G --> H[Shard 1]
            G --> I[Shard 2]
            G --> J[Shard 3]
        end

    2. Understanding Replica Sets

    What is a Replica Set?

    A replica set is a group of MongoDB processes that maintain the same data set. Replica sets provide redundancy and high availability.

    Replica Set Architecture

    graph TB
        subgraph "Replica Set"
            P[Primary Node<br/>Accepts Writes]
            S1[Secondary Node<br/>Replicates Data]
            S2[Secondary Node<br/>Replicates Data]
            A[Arbiter<br/>Voting Only]
    
            P -->|Oplog| S1
            P -->|Oplog| S2
            P -.->|Heartbeat| S1
            P -.->|Heartbeat| S2
            P -.->|Heartbeat| A
            S1 -.->|Heartbeat| S2
            S1 -.->|Heartbeat| A
            S2 -.->|Heartbeat| A
        end
    
        C[Client] --> P
        C -.->|Read Preference| S1
        C -.->|Read Preference| S2

    Node Types

    Primary Node

    • Receives all write operations
    • Records changes in oplog (operations log)
    • Only one primary per replica set

    Secondary Nodes

    • Maintain copies of primary’s data
    • Apply operations from oplog
    • Can serve read operations (with read preference)
    • Can become primary during elections

    Arbiter Nodes

    • Participate in elections only
    • Do not hold data
    • Lightweight way to bring the number of voting members up to an odd number
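
    Why the number of voting members matters can be shown with a little arithmetic. The sketch below is illustrative only (the function names are made up for this guide): a primary can be elected only while a strict majority of voting members is reachable.

```javascript
// Majority needed to elect a primary for a given number of voting members.
function majorityOf(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can fail while a majority is still reachable.
function faultTolerance(votingMembers) {
  return votingMembers - majorityOf(votingMembers);
}

for (const n of [2, 3, 4, 5, 6, 7]) {
  console.log(
    `${n} voting members: majority ${majorityOf(n)}, tolerates ${faultTolerance(n)} failure(s)`
  );
}
```

    Going from 3 to 4 voting members adds no fault tolerance, which is why an arbiter is sometimes added to a set with an even number of data-bearing members.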

    Oplog (Operations Log)

    sequenceDiagram
        participant Client
        participant Primary
        participant Secondary1
        participant Secondary2
    
        Client->>Primary: Insert Document
        Primary->>Primary: Write to Collection
        Primary->>Primary: Write to Oplog
        Primary-->>Secondary1: Replicate Oplog Entry
        Primary-->>Secondary2: Replicate Oplog Entry
        Secondary1->>Secondary1: Apply Operation
        Secondary2->>Secondary2: Apply Operation
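
    The flow in the diagram can be imitated with a toy model. This is a simplified sketch, not how MongoDB is implemented: the class names are invented, timestamps are plain counters, and an update entry simply carries the full new document.

```javascript
// Toy model only: real oplog entries carry more fields and are applied in batches.
class ToyNode {
  constructor(name) {
    this.name = name;
    this.docs = new Map(); // _id -> document
    this.lastApplied = 0;  // logical timestamp of the last applied entry
  }

  apply(entry) {
    if (entry.ts <= this.lastApplied) return; // already applied, skip
    if (entry.op === "i" || entry.op === "u") this.docs.set(entry.o._id, entry.o);
    if (entry.op === "d") this.docs.delete(entry.o._id);
    this.lastApplied = entry.ts;
  }
}

class ToyPrimary extends ToyNode {
  constructor(name) {
    super(name);
    this.oplog = [];
  }

  write(op, doc) {
    const entry = { ts: this.oplog.length + 1, op, o: doc };
    this.oplog.push(entry); // record the change in the oplog
    this.apply(entry);      // and apply it to the primary's own data
  }
}

// A secondary pulls every entry newer than what it has already applied.
function replicate(primary, secondary) {
  primary.oplog
    .filter((e) => e.ts > secondary.lastApplied)
    .forEach((e) => secondary.apply(e));
}

const primary = new ToyPrimary("primary");
const secondary1 = new ToyNode("secondary1");
primary.write("i", { _id: 1, name: "John" });
primary.write("u", { _id: 1, name: "John", age: 30 });
replicate(primary, secondary1);
replicate(primary, secondary1); // repeating is harmless: nothing new to apply
console.log(secondary1.docs.get(1));
```

    Tracking the last applied timestamp is what lets a secondary that was offline resume from where it stopped, as long as the entries it needs are still in the primary's oplog.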

    3. Setting Up Your First Replica Set

    Prerequisites

    • MongoDB installed on multiple servers
    • Network connectivity between servers
    • Proper firewall configuration (port 27017)

    Step-by-Step Setup

    Step 1: Start MongoDB Instances

    On each server, start MongoDB with replica set configuration:

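    # The primary is chosen by election after rs.initiate(), not by start order.
    # mongod binds to localhost only by default, so also bind the address the other members use.

    # Server 1
    mongod --replSet myReplicaSet --port 27017 --dbpath /data/db1 --bind_ip localhost,server1
    
    # Server 2
    mongod --replSet myReplicaSet --port 27017 --dbpath /data/db2 --bind_ip localhost,server2
    
    # Server 3
    mongod --replSet myReplicaSet --port 27017 --dbpath /data/db3 --bind_ip localhost,server3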
    Bash

    Step 2: Initialize Replica Set

    Connect to one of the MongoDB instances:

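    // Open a shell on one member first (run this from a terminal, not inside the shell):
    //   mongosh --host server1:27017     (use the legacy "mongo" shell on releases before 5.0)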
    
    // Initialize replica set
    rs.initiate({
      _id: "myReplicaSet",
      members: [
        { _id: 0, host: "server1:27017" },
        { _id: 1, host: "server2:27017" },
        { _id: 2, host: "server3:27017" }
      ]
    })
    JavaScript

    Step 3: Verify Configuration

    // Check replica set status
    rs.status()
    
    // Check configuration
    rs.conf()
    
    // Check if current node is primary
    db.hello().isWritablePrimary   // on older releases: rs.isMaster().ismaster
    JavaScript

    Configuration File Approach

    Create a configuration file for each node:

    # mongod.conf
    systemLog:
      destination: file
      logAppend: true
      path: /var/log/mongodb/mongod.log
    
    storage:
      dbPath: /var/lib/mongo
      journal:
        enabled: true   # omit this option on MongoDB 6.1+, where journaling is always on
    
    processManagement:
      fork: true
      pidFilePath: /var/run/mongodb/mongod.pid
    
    net:
      port: 27017
      bindIp: 0.0.0.0   # listens on all interfaces; restrict to specific addresses in production
    
    replication:
      replSetName: myReplicaSet
    YAML

    4. Configuration and Management

    Replica Set Configuration Document

    {
      "_id": "myReplicaSet",
      "version": 1,
      "members": [
        {
          "_id": 0,
          "host": "server1:27017",
          "priority": 2,
          "votes": 1
        },
        {
          "_id": 1,
          "host": "server2:27017",
          "priority": 1,
          "votes": 1
        },
        {
          "_id": 2,
          "host": "server3:27017",
          "priority": 1,
          "votes": 1,
          "hidden": false,
          "slaveDelay": 0
        }
      ]
    }
    JSON

    Adding Members

    // Add a new member
    rs.add("server4:27017")
    
    // Add with specific configuration
    rs.add({
      "_id": 3,
      "host": "server4:27017",
      "priority": 0.5,
      "votes": 1
    })
    JavaScript

    Removing Members

    // Remove a member
    rs.remove("server4:27017")
    JavaScript

    Member Configuration Options

    Priority

    • Determines likelihood of becoming primary (0-1000)
    • Priority 0 = never becomes primary
    • Higher priority = more likely to be elected

    Votes

    • Determines voting in elections (0 or 1)
    • Maximum 7 voting members per replica set
    • Non-voting members can hold data

    Hidden Members

    • Not visible to application
    • Cannot become primary
    • Good for backups and reporting

    // Configure hidden member
    cfg = rs.conf()
    cfg.members[2].hidden = true
    cfg.members[2].priority = 0
    rs.reconfig(cfg)
    JavaScript

    Delayed Members

    • Maintain historical snapshot of data
    • Useful for protection against human error

    // Configure delayed member (1 hour delay)
    cfg = rs.conf()
    cfg.members[2].secondaryDelaySecs = 3600  // this field was called slaveDelay before MongoDB 5.0
    cfg.members[2].priority = 0
    cfg.members[2].hidden = true
    rs.reconfig(cfg)
    JavaScript
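
    The member rules above can be checked before calling rs.reconfig(). The following is a minimal sketch under stated assumptions: validateMembers is a hypothetical helper, not part of MongoDB, and it only covers the rules listed in this section plus the requirement that non-voting members have priority 0.

```javascript
// Checks a replica set config object against the member rules described above.
// Returns a list of human-readable problems; an empty list means no rule was violated.
function validateMembers(cfg) {
  const problems = [];
  const voters = cfg.members.filter((m) => (m.votes ?? 1) === 1);
  if (voters.length > 7) {
    problems.push(`too many voting members: ${voters.length} (max 7)`);
  }

  for (const m of cfg.members) {
    const priority = m.priority ?? 1; // server default
    const votes = m.votes ?? 1;       // server default
    const delay = m.secondaryDelaySecs ?? m.slaveDelay ?? 0;
    if (priority < 0 || priority > 1000) problems.push(`${m.host}: priority must be 0-1000`);
    if (votes !== 0 && votes !== 1) problems.push(`${m.host}: votes must be 0 or 1`);
    if (m.hidden && priority !== 0) problems.push(`${m.host}: hidden members need priority 0`);
    if (delay > 0 && priority !== 0) problems.push(`${m.host}: delayed members need priority 0`);
    if (votes === 0 && priority !== 0) problems.push(`${m.host}: non-voting members need priority 0`);
  }
  return problems;
}

const sampleCfg = {
  _id: "myReplicaSet",
  members: [
    { _id: 0, host: "server1:27017", priority: 2 },
    { _id: 1, host: "server2:27017" },
    { _id: 2, host: "server3:27017", hidden: true, priority: 1 }, // invalid on purpose
  ],
};
console.log(validateMembers(sampleCfg));
```

    In the shell, the same function could be fed the result of rs.conf() after editing it and before passing it to rs.reconfig().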

    Topology Examples

    Standard Three-Node Setup

    graph TB
        subgraph "Standard Replica Set"
            P[Primary<br/>Priority: 1<br/>Votes: 1]
            S1[Secondary<br/>Priority: 1<br/>Votes: 1]
            S2[Secondary<br/>Priority: 1<br/>Votes: 1]
    
            P --> S1
            P --> S2
        end

    Five-Node with Arbiter

    graph TB
        subgraph "Extended Replica Set"
            P[Primary<br/>Priority: 2<br/>Votes: 1]
            S1[Secondary<br/>Priority: 1<br/>Votes: 1]
            S2[Secondary<br/>Priority: 1<br/>Votes: 1]
            S3[Secondary<br/>Priority: 0<br/>Votes: 1<br/>Hidden]
            A[Arbiter<br/>Votes: 1]
    
            P --> S1
            P --> S2
            P --> S3
            P -.-> A
        end

    5. Read and Write Operations

    Write Operations

    All write operations go to the primary:

    // Write operations always go to primary
    db.users.insertOne({name: "John", email: "john@example.com"})
    db.users.updateOne({name: "John"}, {$set: {age: 30}})
    db.users.deleteOne({name: "John"})
    JavaScript

    Write Concerns

    Control acknowledgment of write operations:

    graph LR
        subgraph "Write Concern Levels"
            A[w: 1<br/>Primary Only]
            B[w: 2<br/>Primary + 1 Secondary]
            C[w: majority<br/>Majority of Nodes]
            D[w: 0<br/>No Acknowledgment]
        end

    // Write concern examples
    db.users.insertOne(
      {name: "Alice"}, 
      {writeConcern: {w: "majority", wtimeout: 5000}}
    )
    
    db.users.insertOne(
      {name: "Bob"}, 
      {writeConcern: {w: 2, j: true, wtimeout: 3000}}
    )
    JavaScript
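
    To make the levels concrete, here is a small sketch of how many acknowledgments each value of w waits for. It is a simplification with hypothetical function names: the server's real calculation of "majority" has more detail (for example, arbiters vote but cannot acknowledge writes).

```javascript
// Number of acknowledgments a write concern waits for.
// votingMembers is only used to resolve "majority".
function requiredAcks(w, votingMembers) {
  if (w === "majority") return Math.floor(votingMembers / 2) + 1;
  return w; // numeric w: 0, 1, 2, ...
}

// True once enough members have confirmed the write.
function isAcknowledged(w, acks, votingMembers) {
  return acks >= requiredAcks(w, votingMembers);
}

// Three-member set, write applied on the primary only so far (acks = 1).
console.log(isAcknowledged(1, 1, 3));          // w: 1
console.log(isAcknowledged(2, 1, 3));          // w: 2
console.log(isAcknowledged("majority", 1, 3)); // w: "majority"
```

    With wtimeout set, the client stops waiting after the given number of milliseconds; the write itself is not undone when the timeout fires.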

    Read Operations and Read Preferences

    Read Preference Modes

    graph TB
        subgraph "Read Preferences"
            A[primary<br/>Default - Primary Only]
            B[primaryPreferred<br/>Primary, then Secondary]
            C[secondary<br/>Secondary Only]
            D[secondaryPreferred<br/>Secondary, then Primary]
            E[nearest<br/>Lowest Network Latency]
        end

    // Set read preference
    db.users.find().readPref("secondary")
    db.users.find().readPref("primaryPreferred")
    db.users.find().readPref("nearest", [{datacenter: "east"}])
    
    // With MongoDB driver
    const collection = db.collection('users');
    const result = await collection.find({}, {
      readPreference: 'secondaryPreferred'
    }).toArray();
    JavaScript
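
    The five modes boil down to a simple selection rule. The sketch below is an illustration, not driver code: selectMember and the member objects are invented for this example, and real drivers additionally consider tag sets, staleness limits, and a latency window rather than always picking the single fastest member.

```javascript
// members: [{ name, state: "PRIMARY" | "SECONDARY", latencyMs }]
function selectMember(mode, members) {
  const primary = members.find((m) => m.state === "PRIMARY") ?? null;
  const secondaries = members.filter((m) => m.state === "SECONDARY");
  const dataBearing = primary ? [primary, ...secondaries] : secondaries;
  // Lowest-latency member of a list, or null for an empty list.
  const fastest = (list) =>
    list.length ? list.reduce((a, b) => (b.latencyMs < a.latencyMs ? b : a)) : null;

  switch (mode) {
    case "primary": return primary;
    case "primaryPreferred": return primary ?? fastest(secondaries);
    case "secondary": return fastest(secondaries);
    case "secondaryPreferred": return fastest(secondaries) ?? primary;
    case "nearest": return fastest(dataBearing);
    default: throw new Error(`unknown read preference mode: ${mode}`);
  }
}

const sampleMembers = [
  { name: "server1:27017", state: "PRIMARY", latencyMs: 12 },
  { name: "server2:27017", state: "SECONDARY", latencyMs: 3 },
  { name: "server3:27017", state: "SECONDARY", latencyMs: 25 },
];
for (const mode of ["primary", "primaryPreferred", "secondary", "secondaryPreferred", "nearest"]) {
  console.log(mode, "->", selectMember(mode, sampleMembers).name);
}
```

    Note that every mode except primary can return data that lags behind the primary, so it suits reads that tolerate slightly stale results.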

    Read Concern

    Control consistency and isolation properties:

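    // Read concern levels
    db.users.find().readConcern("local")        // Default: latest data on this node, may be rolled back
    db.users.find().readConcern("available")    // Like "local", but may return orphaned documents on sharded clusters
    db.users.find().readConcern("majority")     // Only data acknowledged by a majority of members
    db.users.find().readConcern("linearizable") // Reflects all majority-acknowledged writes; primary only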
    JavaScript

    Connection Strings

    // Connection string with read preference
    mongodb://server1:27017,server2:27017,server3:27017/mydb?replicaSet=myReplicaSet&readPreference=secondaryPreferred
    
    // With write and read concerns
    mongodb://server1:27017,server2:27017,server3:27017/mydb?replicaSet=myReplicaSet&w=majority&readConcernLevel=majority
    JavaScript
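
    Long connection strings are easy to mistype, so it can help to assemble them from parts. A minimal sketch, assuming a hypothetical buildMongoUri helper and option values that need no special escaping beyond what URLSearchParams provides:

```javascript
// Builds a replica set connection string from a host list and an options object.
function buildMongoUri({ hosts, database, options }) {
  const query = new URLSearchParams(options).toString();
  return `mongodb://${hosts.join(",")}/${database}${query ? "?" + query : ""}`;
}

const uri = buildMongoUri({
  hosts: ["server1:27017", "server2:27017", "server3:27017"],
  database: "mydb",
  options: { replicaSet: "myReplicaSet", w: "majority", readConcernLevel: "majority" },
});
console.log(uri);
```

    Listing several hosts matters: the driver only needs to reach one of them to discover the rest of the set, so the application can still connect when a single listed member is down.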

    6. Elections and Failover

    Election Process

    sequenceDiagram
        participant P as Primary
        participant S1 as Secondary1
        participant S2 as Secondary2
        participant S3 as Secondary3
    
        Note over P,S3: Normal Operation
        P->>S1: Heartbeat
        P->>S2: Heartbeat
        P->>S3: Heartbeat
    
        Note over P,S3: Primary Fails
        P--xS1: No Heartbeat
        P--xS2: No Heartbeat
        P--xS3: No Heartbeat
    
        Note over S1,S3: Election Starts
        S1->>S2: Vote Request
        S1->>S3: Vote Request
        S2->>S1: Vote Response
        S3->>S1: Vote Response
    
        Note over S1,S3: S1 Becomes Primary
        S1->>S2: I am Primary
        S1->>S3: I am Primary

    Election Triggers

    • Primary becomes unreachable
    • Primary steps down voluntarily
    • Network partition
    • Configuration changes

    Election Factors

    Priority

    • Higher priority nodes more likely to be elected
    • Priority 0 nodes cannot become primary

    Oplog Position

    • Nodes with more recent data preferred
    • Prevents data loss during election

    Connectivity

    • Node must be able to reach majority of voting members
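
    The three factors can be combined into a rough eligibility check. This is a toy model with invented names and plain numbers for oplog positions; the real election protocol also involves terms, vote requests, and timeouts.

```javascript
// members: [{ name, priority, votes, optime }]
// reachable: names of the members the candidate can currently contact.
function canStandForElection(candidateName, members, reachable) {
  const candidate = members.find((m) => m.name === candidateName);
  const voters = members.filter((m) => m.votes === 1);
  const majority = Math.floor(voters.length / 2) + 1;

  // Voters the candidate can see, counting itself if it votes.
  const visibleVoters = voters.filter(
    (m) => m.name === candidateName || reachable.includes(m.name)
  );
  // A voter only grants its vote to a candidate that is at least as up to date.
  const grantedVotes = visibleVoters.filter((m) => m.optime <= candidate.optime);

  return candidate.priority > 0 && grantedVotes.length >= majority;
}

const electionMembers = [
  { name: "primary", priority: 2, votes: 1, optime: 105 }, // has just failed
  { name: "secondary1", priority: 1, votes: 1, optime: 100 },
  { name: "secondary2", priority: 1, votes: 1, optime: 100 },
];
console.log(canStandForElection("secondary1", electionMembers, ["secondary2"]));
```

    In this example the failed primary held writes (up to position 105) that no surviving member has; writes like these are rolled back when the old primary rejoins, which is the case w: "majority" protects against.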

    Failover Timeline

    gantt
        title Replica Set Failover Timeline
        dateFormat YYYY-MM-DD
        axisFormat %s
    
        section Detection
        Heartbeat Timeout    :a1, 2023-01-01, 10s
    
        section Election
        Vote Request         :a2, after a1, 15s
        Vote Collection      :a3, after a2, 10s
        Primary Declaration  :a4, after a3, 5s
    
        section Recovery
        Catch-up Period      :a5, after a4, 15s
        Normal Operation     :a6, after a5, 15s
    

    Managing Elections

    // Force an election (step down primary)
    rs.stepDown(60) // Step down for 60 seconds
    
    // List the current state of every member
    rs.status().members.forEach(function(member) {
      print(member.name + ": " + member.stateStr)
    })
    
    // Freeze a node (run on a secondary to keep it from seeking election)
    rs.freeze(120) // Freeze for 120 seconds
    JavaScript

    Split-Brain Prevention

    MongoDB prevents split-brain scenarios through majority voting:

    graph TB
        subgraph "Network Partition Scenario"
            subgraph "Partition A"
                P[Primary]
                S1[Secondary]
            end
    
            subgraph "Partition B"
                S2[Secondary]
                S3[Secondary]
                A[Arbiter]
            end
    
            P -.->|Network Down| S2
        end
    
        subgraph "Result"
            P2[Primary Steps Down<br/>No Majority]
            S22[New Primary Elected<br/>Has Majority]
        end
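
    The outcome of the partition scenario follows directly from counting votes. A minimal sketch with a hypothetical partitionOutcome function, assuming every listed member has one vote:

```javascript
// Each partition is a list of voting members that can still reach one another.
function partitionOutcome(partitions) {
  const totalVoters = partitions.reduce((sum, p) => sum + p.length, 0);
  const majority = Math.floor(totalVoters / 2) + 1;
  return partitions.map((members) => ({
    members,
    canHavePrimary: members.length >= majority,
  }));
}

const outcome = partitionOutcome([
  ["primary", "secondary1"],               // Partition A
  ["secondary2", "secondary3", "arbiter"], // Partition B
]);
console.log(outcome);
```

    Because at most one partition can hold a strict majority, two primaries can never be elected at the same time. With an even split, no side has a majority and the set stays read-only until connectivity returns.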

    7. Monitoring and Maintenance

    Basic Monitoring Commands

    // Replica set status
    rs.status()
    
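    // Oplog size and time window
    db.printReplicationInfo()
    
    // Replication lag of each secondary
    rs.printSecondaryReplicationInfo()   // db.printSlaveReplicationInfo() before MongoDB 4.4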
    
    // Current operations
    db.currentOp()
    
    // Server status
    db.serverStatus().repl
    JavaScript

    Key Metrics to Monitor

    Replication Lag

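    // Check replication lag (each secondary's optime relative to the primary's)
    const members = db.adminCommand({replSetGetStatus: 1}).members
    const primary = members.find(function(m) { return m.state === 1 })
    members.forEach(function(member) {
      if (primary && member.state === 2) { // Secondary
        print(member.name + " lag: " +
              (primary.optimeDate - member.optimeDate) / 1000 + " seconds")
      }
    })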
    JavaScript
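
    Lag is the gap between the last operation applied on the primary and the last one applied on a secondary, not the time since the secondary's last write. The sketch below shows the calculation on a hand-written object shaped like the output of rs.status(); the function name and the sample values are made up for illustration.

```javascript
// status: object shaped like the output of rs.status() / replSetGetStatus.
// Returns { memberName: lagInSeconds } for every secondary, or null without a primary.
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.state === 1);
  if (!primary) return null; // no primary, so there is nothing to compare against
  const lag = {};
  for (const m of status.members) {
    if (m.state === 2) {
      lag[m.name] = (primary.optimeDate - m.optimeDate) / 1000;
    }
  }
  return lag;
}

const sampleStatus = {
  set: "myReplicaSet",
  members: [
    { name: "server1:27017", state: 1, optimeDate: new Date("2024-01-01T00:00:10Z") },
    { name: "server2:27017", state: 2, optimeDate: new Date("2024-01-01T00:00:08Z") },
    { name: "server3:27017", state: 2, optimeDate: new Date("2024-01-01T00:00:10Z") },
  ],
};
console.log(replicationLagSeconds(sampleStatus));
```

    Comparing against the primary's optime means an idle replica set reports zero lag instead of a number that keeps growing with the wall clock.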

    Oplog Size and Utilization

    // Check oplog stats (the oplog lives in the "local" database)
    const oplog = db.getSiblingDB("local").oplog.rs
    oplog.stats()
    
    // Oplog size in GB
    oplog.stats().maxSize / (1024*1024*1024)
    
    // Time range covered by oplog
    db.printReplicationInfo()
    JavaScript
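
    The oplog window (how far back in time the oplog reaches) depends on its size and on how fast writes fill it. A back-of-the-envelope sketch, assuming a steady write rate; both function names and the sample figures are hypothetical:

```javascript
// Hours of history an oplog of the given size holds at a steady write rate.
function estimateOplogWindowHours(oplogSizeGB, oplogGBPerHour) {
  if (oplogGBPerHour <= 0) return Infinity; // no writes: the oplog never wraps
  return oplogSizeGB / oplogGBPerHour;
}

// Oplog size needed to cover a target window at a steady write rate.
function oplogSizeForWindowGB(targetHours, oplogGBPerHour) {
  return targetHours * oplogGBPerHour;
}

console.log(estimateOplogWindowHours(10, 0.5)); // 10 GB oplog, 0.5 GB of oplog per hour
console.log(oplogSizeForWindowGB(48, 0.5));     // size needed for a 48 hour window
```

    A secondary that stays offline longer than the window can no longer catch up from the oplog and needs a full initial sync, so the window should comfortably exceed the longest planned maintenance.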

    Monitoring Dashboard Structure

    graph TB
        subgraph "MongoDB Monitoring Dashboard"
            A[Replica Set Health]
            B[Replication Lag]
            C[Oplog Utilization]
            D[Election Events]
            E[Connection Pool]
            F[Read/Write Distribution]
    
            A --> A1[Primary Status]
            A --> A2[Secondary Count]
            A --> A3[Heartbeat Status]
    
            B --> B1[Max Lag Time]
            B --> B2[Lag by Node]
            B --> B3[Lag Trends]
    
            C --> C1[Oplog Size]
            C --> C2[Oplog Window]
            C --> C3[Growth Rate]
        end

    Setting Up Monitoring

    Using the Database Profiler

    // Enable profiling for monitoring
    db.setProfilingLevel(1, {slowms: 100})
    
    // Monitor specific operations
    db.system.profile.find().limit(5).sort({ts: -1}).pretty()
    JavaScript

    Custom Monitoring Script

    // monitoring.js
    function checkReplicaSetHealth() {
      const status = rs.status()
      const health = {
        setName: status.set,
        primary: null,
        secondaries: [],
        arbiters: [],
        maxLag: 0
      }
    
      const primary = status.members.find(function(m) { return m.state === 1 })
    
      status.members.forEach(function(member) {
        if (member.state === 1) {
          health.primary = member.name
        } else if (member.state === 2) {
          // Lag is measured against the primary's last applied operation
          const lag = primary ? (primary.optimeDate - member.optimeDate) / 1000 : null
          health.secondaries.push({name: member.name, lag: lag})
          health.maxLag = Math.max(health.maxLag, lag || 0)
        } else if (member.state === 7) {
          health.arbiters.push(member.name)
        }
      })
    
      return health
    }
    
    // Run monitoring
    const health = checkReplicaSetHealth()
    print(JSON.stringify(health, null, 2))
    JavaScript

    Log Analysis

    Important Log Patterns

    # Election events
    grep "election" /var/log/mongodb/mongod.log
    
    # Replication lag warnings
    grep "replication lag" /var/log/mongodb/mongod.log
    
    # Connection issues
    grep "connection" /var/log/mongodb/mongod.log
    
    # Oplog issues
    grep "oplog" /var/log/mongodb/mongod.log
    Bash

    8. Advanced Topics

    Chained Replication

    graph TB
        subgraph "Chained Replication"
            P[Primary<br/>Data Center A]
            S1[Secondary 1<br/>Data Center A]
            S2[Secondary 2<br/>Data Center B]
            S3[Secondary 3<br/>Data Center C]
    
            P --> S1
            S1 --> S2
            S2 --> S3
    
            style P fill:#e1f5fe
            style S1 fill:#f3e5f5
            style S2 fill:#f3e5f5
            style S3 fill:#f3e5f5
        end

    Configuring Chained Replication

    // Allow chaining (default: true)
    cfg = rs.conf()
    cfg.settings = cfg.settings || {}
    cfg.settings.chainingAllowed = true
    rs.reconfig(cfg)
    JavaScript

    Multi-Data Center Deployment

    graph TB
        subgraph "Multi-DC Replica Set"
            subgraph "DC1 - Primary"
                P[Primary<br/>Priority: 2]
                S1[Secondary<br/>Priority: 1]
            end
    
            subgraph "DC2 - Secondary"
                S2[Secondary<br/>Priority: 1]
                S3[Secondary<br/>Priority: 1]
            end
    
            subgraph "DC3 - Arbiter"
                A[Arbiter<br/>Tie Breaker]
            end
    
            P --> S1
            P -.-> S2
            P -.-> S3
            P -.-> A
        end

    Configuration for Multi-DC

    rs.initiate({
      _id: "multiDCSet",
      members: [
        { _id: 0, host: "dc1-server1:27017", priority: 2 },
        { _id: 1, host: "dc1-server2:27017", priority: 1 },
        { _id: 2, host: "dc2-server1:27017", priority: 1 },
        { _id: 3, host: "dc2-server2:27017", priority: 1 },
        { _id: 4, host: "dc3-arbiter:27017", arbiterOnly: true }
      ]
    })
    JavaScript

    Tag-Based Read Preferences

    // Configure tags
    cfg = rs.conf()
    cfg.members[0].tags = {datacenter: "east", rack: "1"}
    cfg.members[1].tags = {datacenter: "east", rack: "2"}
    cfg.members[2].tags = {datacenter: "west", rack: "1"}
    rs.reconfig(cfg)
    
    // Use tagged read preference
    db.users.find().readPref("nearest", [{datacenter: "east"}])
    db.users.find().readPref("secondary", [{rack: "1"}])
    JavaScript
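
    Tag sets are matched in a fixed way: a member qualifies when it carries every key/value pair in the set, and a list of tag sets is tried in order until one matches. A small sketch of that rule, with hypothetical helper names and the tags configured above:

```javascript
// A member matches a tag set when it carries every key/value pair in the set.
function matchesTagSet(member, tagSet) {
  return Object.entries(tagSet).every(([key, value]) => member.tags?.[key] === value);
}

// Tag sets are tried in order; the first one that matches any member wins.
function filterByTagSets(members, tagSets) {
  for (const tagSet of tagSets) {
    const matched = members.filter((m) => matchesTagSet(m, tagSet));
    if (matched.length > 0) return matched;
  }
  return [];
}

const taggedMembers = [
  { host: "server1:27017", tags: { datacenter: "east", rack: "1" } },
  { host: "server2:27017", tags: { datacenter: "east", rack: "2" } },
  { host: "server3:27017", tags: { datacenter: "west", rack: "1" } },
];
console.log(filterByTagSets(taggedMembers, [{ datacenter: "east" }]).map((m) => m.host));
console.log(filterByTagSets(taggedMembers, [{ datacenter: "north" }, {}]).map((m) => m.host));
```

    The empty tag set in the second call matches every member, which makes it a convenient last entry when the read should still succeed if no preferred member is available.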

    Write Concern with Tags

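    // Configure tag-based write concern
    cfg = rs.conf()
    cfg.settings = cfg.settings || {}   // keep the existing settings instead of overwriting them
    cfg.settings.getLastErrorModes = {
      multiDC: {datacenter: 2},  // acknowledged in 2 distinct data centers
      allRacks: {rack: 2}        // acknowledged in both racks ("1" and "2")
    }
    rs.reconfig(cfg)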
    
    // Use tagged write concern
    db.users.insertOne(
      {name: "Critical Data"}, 
      {writeConcern: {w: "multiDC", wtimeout: 5000}}
    )
    JavaScript
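
    The number in a custom mode counts distinct tag values, not members. The following sketch shows that rule with an invented satisfiesMode function; it illustrates the counting only and is not how the server is implemented.

```javascript
// mode: e.g. { datacenter: 2 } means "acknowledged by members that together
// cover at least 2 distinct values of the datacenter tag".
function satisfiesMode(mode, ackedMembers) {
  return Object.entries(mode).every(([tag, needed]) => {
    const distinct = new Set(
      ackedMembers.map((m) => m.tags?.[tag]).filter((v) => v !== undefined)
    );
    return distinct.size >= needed;
  });
}

const eastOnly = [
  { host: "server1:27017", tags: { datacenter: "east", rack: "1" } },
  { host: "server2:27017", tags: { datacenter: "east", rack: "2" } },
];
const eastAndWest = [
  { host: "server1:27017", tags: { datacenter: "east", rack: "1" } },
  { host: "server3:27017", tags: { datacenter: "west", rack: "1" } },
];
console.log(satisfiesMode({ datacenter: 2 }, eastOnly));    // two members, one data center
console.log(satisfiesMode({ datacenter: 2 }, eastAndWest)); // two members, two data centers
```

    A mode that asks for more distinct values than the configured tags provide can never be satisfied, so the numbers in getLastErrorModes have to be chosen against the tags actually assigned to members.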

    Replica Set Maintenance

    Rolling Maintenance

    sequenceDiagram
        participant P as Primary
        participant S1 as Secondary1
        participant S2 as Secondary2
    
        Note over P,S2: Step 1: Maintain Secondary1
        P->>S1: Shutdown for maintenance
        S1-->>P: Offline
        P->>S2: Continue replication
    
        Note over P,S2: Step 2: S1 Back Online
        S1->>P: Reconnect and sync
        P->>S1: Catch up replication
    
        Note over P,S2: Step 3: Maintain Secondary2
        P->>S2: Shutdown for maintenance
        S2-->>P: Offline
        P->>S1: Continue replication
    
        Note over P,S2: Step 4: Step Down Primary
        P->>S1: Step down
        S1->>P: Become Primary
        P->>P: Maintenance mode

    Maintenance Procedures

    // 1. Perform maintenance on secondaries first
    rs.status() // Identify secondaries
    
    // 2. For each secondary:
    // - Stop MongoDB process
    // - Perform maintenance (OS updates, hardware, etc.)
    // - Restart MongoDB
    // - Wait for catch-up
    
    // 3. Step down primary
    rs.stepDown(60)
    
    // 4. Perform maintenance on former primary
    // 5. Restart and let it rejoin as secondary
    JavaScript
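
    The ordering rule (secondaries first, primary last) can be derived from the replica set status. A minimal sketch, assuming a hypothetical maintenanceOrder helper and a hand-written object shaped like the output of rs.status():

```javascript
// Returns member names in the order they should be taken down for maintenance:
// every secondary first, the current primary last (after rs.stepDown()).
function maintenanceOrder(status) {
  const secondaries = status.members.filter((m) => m.state === 2).map((m) => m.name);
  const primary = status.members.find((m) => m.state === 1);
  return primary ? [...secondaries, primary.name] : secondaries;
}

const rollingStatus = {
  members: [
    { name: "server1:27017", state: 1 }, // PRIMARY
    { name: "server2:27017", state: 2 }, // SECONDARY
    { name: "server3:27017", state: 2 }, // SECONDARY
  ],
};
console.log(maintenanceOrder(rollingStatus));
```

    Taking down only one member at a time keeps a majority online throughout, so the set can always elect or keep a primary while maintenance is in progress.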

    Backup Strategies

    Backup from Secondary

    # Create backup from secondary to avoid impacting primary
    mongodump --host secondary1:27017 --oplog --out /backup/mongodb/$(date +%Y%m%d)
    
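    # Point-in-time backup: dump the oplog entries up to a given timestamp.
    # --oplog cannot be combined with --query, so dump local.oplog.rs directly;
    # the oplog's timestamp field is named "ts".
    mongodump --host secondary1:27017 --db local --collection oplog.rs --query '{"ts": {"$lt": {"$timestamp": {"t": 1609459200, "i": 1}}}}'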
    Bash

    Delayed Member for Backup

    // Configure delayed member for backup protection
    cfg = rs.conf()
    cfg.members.push({
      _id: 3,
      host: "backup-server:27017",
      priority: 0,
      hidden: true,
      secondaryDelaySecs: 7200, // 2 hours delay (field was called slaveDelay before MongoDB 5.0)
      votes: 0
    })
    rs.reconfig(cfg)
    JavaScript

    9. Troubleshooting

    Common Issues and Solutions

    1. Replication Lag

    Symptoms:

    • High lag reported in rs.status()
    • Delayed data on secondaries

    Diagnosis:

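    // Check replication lag
    rs.printSecondaryReplicationInfo()   // db.printSlaveReplicationInfo() before MongoDB 4.4
    
    // Check oplog window
    db.printReplicationInfo()
    
    // Inspect the most recent oplog entry (the oplog lives in the "local" database)
    db.getSiblingDB("local").oplog.rs.find().sort({$natural: -1}).limit(1)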
    JavaScript

    Solutions:

    graph TB
        A[Replication Lag Detected] --> B{Check Network}
        B -->|Network OK| C{Check Secondary Load}
        B -->|Network Issues| D[Fix Network Connectivity]
        C -->|High Load| E[Scale Secondary Resources]
        C -->|Load OK| F{Check Oplog Size}
        F -->|Too Small| G[Increase Oplog Size]
        F -->|Size OK| H[Check Write Patterns]

    2. Election Issues

    Symptoms:

    • Frequent elections
    • No primary elected

    Diagnosis:

    // Check election stats
    rs.status().members.forEach(function(m) {
      if (m.electionTime) {
        print(m.name + " last election: " + m.electionTime)
      }
    })
    
    // Check voting configuration
    rs.conf().members.forEach(function(m) {
      print(m.host + " votes: " + m.votes + " priority: " + m.priority)
    })
    JavaScript

    Common Solutions:

    // Fix: Network partition
    // Ensure majority of nodes can communicate
    
    // Fix: Clock skew
    // Synchronize clocks across all nodes
    
    // Fix: Priority misconfiguration
    cfg = rs.conf()
    cfg.members[0].priority = 2  // Give preference to specific node
    rs.reconfig(cfg)
    JavaScript

    3. Oplog Issues

    Problem: Oplog Too Small

    // Check current oplog size (the oplog lives in the "local" database)
    db.getSiblingDB("local").oplog.rs.stats().maxSize / (1024*1024*1024) // Size in GB
    
    // Resize oplog (MongoDB 3.6+); size is given in megabytes
    db.adminCommand({replSetResizeOplog: 1, size: 10240}) // 10GB
    JavaScript

    Problem: Oplog Overflow

    // Monitor oplog utilization
    function checkOplogUtilization() {
      const oplog = db.getSiblingDB("local").oplog.rs
      const stats = oplog.stats()
      const oldest = oplog.find().sort({$natural: 1}).limit(1).next()
      const newest = oplog.find().sort({$natural: -1}).limit(1).next()
    
      // ts is a BSON Timestamp; its high bits (t) hold seconds since the epoch
      const seconds = function(ts) {
        return typeof ts.getHighBits === "function" ? ts.getHighBits() : ts.t
      }
      const hours = (seconds(newest.ts) - seconds(oldest.ts)) / (60 * 60)
    
      print("Oplog window: " + hours + " hours")
      print("Oplog size: " + (stats.maxSize / 1024 / 1024 / 1024) + " GB")
    }
    JavaScript

    4. Connection Issues

    Diagnosis:

    // Check current connections
    db.serverStatus().connections
    
    // Monitor connection pool
    db.runCommand({connPoolStats: 1})
    
    // Check for connection errors in logs
    JavaScript

    Solutions:

    # Increase connection limits in mongod.conf
    net:
      maxIncomingConnections: 20000
    
    # Tune the driver's connection pool in the connection string (must be a single line)
    mongodb://server1:27017,server2:27017,server3:27017/mydb?replicaSet=myReplicaSet&maxPoolSize=100&minPoolSize=10&maxIdleTimeMS=30000
    INI

    Diagnostic Commands Reference

    // Comprehensive health check
    function healthCheck() {
      print("=== Replica Set Health Check ===")
    
      // Basic status
      print("\n1. Replica Set Status:")
      const status = rs.status()
      print("Set: " + status.set)
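      const primary = status.members.find(m => m.state === 1)
      print("Primary: " + (primary ? primary.name : "None"))
    
      // Member states (lag is relative to the primary and shown for secondaries only)
      print("\n2. Member States:")
      status.members.forEach(m => {
        const lag = (primary && m.state === 2)
          ? " (lag: " + ((primary.optimeDate - m.optimeDate) / 1000) + "s)"
          : ""
        print(m.name + ": " + m.stateStr + lag)
      })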
    
      // Oplog info
      print("\n3. Oplog Information:")
      db.printReplicationInfo()
    
      // Connection status
      print("\n4. Connections:")
      const connStats = db.serverStatus().connections
      print("Current: " + connStats.current + ", still available: " + connStats.available)
    
      return status
    }
    
    // Run health check
    healthCheck()
    JavaScript

    Recovery Procedures

    Recovering from Data Corruption

    graph TB
        A[Detect Corruption] --> B[Isolate Affected Node]
        B --> C[Stop MongoDB Process]
        C --> D{Data Recoverable?}
        D -->|Yes| E[Repair Database]
        D -->|No| F[Remove from Replica Set]
        E --> G[Restart and Resync]
        F --> H[Fresh Install]
        G --> I[Monitor Health]
        H --> I

    // Remove corrupted member
    rs.remove("corrupted-server:27017")
    
    // After fixing, re-add
    rs.add("fixed-server:27017")
    JavaScript

    Initial Sync Issues

    // Force resync of a member
    // 1. Stop MongoDB on the problematic secondary
    // 2. Remove data directory
    // 3. Restart MongoDB - it will perform initial sync
    
    // Monitor initial sync progress (run on the member that is syncing)
    db.adminCommand({replSetGetStatus: 1, initialSync: 1}).initialSyncStatus
    JavaScript

    10. Best Practices

    Deployment Best Practices

    Hardware Recommendations

    graph TB
        subgraph "Production Deployment"
            subgraph "Primary Node"
                P1[High-performance SSD]
                P2[Adequate RAM<br/>Working Set + OS]
                P3[Fast Network]
                P4[Redundant Power]
            end
    
            subgraph "Secondary Nodes"
                S1[Similar Hardware to Primary]
                S2[Geographically Distributed]
                S3[Dedicated Networks]
            end
    
            subgraph "Monitoring"
                M1[Centralized Logging]
                M2[Metrics Collection]
                M3[Alerting System]
            end
        end

    Network Configuration

    # Firewall rules (example for iptables)
    # Allow MongoDB port between replica set members
    iptables -A INPUT -p tcp -s 10.0.1.0/24 --dport 27017 -j ACCEPT
    
    # Security group (AWS example)
    # Source: Security group ID of replica set members
    # Port: 27017
    # Protocol: TCP
    Bash

    Operating System Tuning

    # Disable Transparent Huge Pages
    echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled
    echo 'never' > /sys/kernel/mm/transparent_hugepage/defrag
    
    # Set appropriate ulimits
    echo "mongodb soft nofile 64000" >> /etc/security/limits.conf
    echo "mongodb hard nofile 64000" >> /etc/security/limits.conf
    echo "mongodb soft nproc 32000" >> /etc/security/limits.conf
    echo "mongodb hard nproc 32000" >> /etc/security/limits.conf
    
    # Configure swappiness
    echo 'vm.swappiness = 1' >> /etc/sysctl.conf
    Bash

    Security Best Practices

    Authentication and Authorization

    // Create admin user
    use admin
    db.createUser({
      user: "admin",
      pwd: passwordPrompt(), // prompts for the password instead of leaving it in shell history
      roles: ["userAdminAnyDatabase", "dbAdminAnyDatabase", "readWriteAnyDatabase"]
    })
    
    // Create replica set user
    db.createUser({
      user: "replicaSetUser",
      pwd: passwordPrompt(), // prompts for the password instead of leaving it in shell history
      roles: ["clusterAdmin"]
    })
    JavaScript

    Enable Authentication

    # mongod.conf
    security:
      authorization: enabled
      keyFile: /etc/mongodb-keyfile
    
    # Create keyfile
    openssl rand -base64 756 > /etc/mongodb-keyfile
    chmod 400 /etc/mongodb-keyfile
    chown mongodb:mongodb /etc/mongodb-keyfile
    Bash

    SSL/TLS Configuration

    # mongod.conf (MongoDB 4.2+; older releases use net.ssl with mode: requireSSL and PEMKeyFile)
    net:
      tls:
        mode: requireTLS
        certificateKeyFile: /etc/ssl/mongodb.pem
        CAFile: /etc/ssl/ca.pem
        allowConnectionsWithoutCertificates: false
    YAML

    Performance Best Practices

    Write Concern Strategy

    graph LR
        subgraph "Write Concern Selection"
            A[Application Type] --> B{Consistency Needs}
            B -->|High| C[w: majority]
            B -->|Medium| D[w: 2]
            B -->|Low| E[w: 1]
    
            F[Performance Needs] --> G{Latency Tolerance}
            G -->|Low| H[w: 1, j: false]
            G -->|Medium| I[w: majority, j: true]
            G -->|High| J[w: number of data-bearing members, j: true]
        end

    Read Preference Strategy

    // Application patterns
    const strategies = {
      // Real-time dashboard - need latest data
      realTime: { readPreference: 'primary' },
    
      // Analytics - can tolerate slight lag
      analytics: { readPreference: 'secondaryPreferred' },
    
      // Reports - distribute load
      reports: { readPreference: 'secondary' },
    
      // Global app - use nearest
      global: { readPreference: 'nearest' }
    }
    JavaScript

    Index Strategy for Replica Sets

    // Create indexes on primary - automatically replicated
    db.users.createIndex({email: 1}, {unique: true})
    db.orders.createIndex({customerId: 1, orderDate: -1})
    
    // The background option is ignored since MongoDB 4.2: every index build now
    // holds an exclusive lock only briefly at the start and end of the build
    db.products.createIndex({category: 1, price: -1})
    
    // Partial indexes for efficiency
    db.users.createIndex(
      {email: 1}, 
      {partialFilterExpression: {email: {$exists: true}}}
    )
    JavaScript

    Monitoring and Alerting

    Key Metrics to Monitor

    // Monitoring script template
    const monitoringChecks = {
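      replicationLag: function() {
        const status = rs.status()
        const primary = status.members.find(m => m.state === 1)
        // Lag of each secondary relative to the primary; 0 if there is nothing to compare
        const lags = status.members
          .filter(m => primary && m.state === 2)
          .map(m => (primary.optimeDate - m.optimeDate) / 1000)
        const maxLag = Math.max(0, ...lags)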
    
        return {
          metric: 'replication_lag_seconds',
          value: maxLag,
          threshold: 30, // Alert if > 30 seconds
          status: maxLag > 30 ? 'CRITICAL' : 'OK'
        }
      },
    
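      oplogWindow: function() {
        const oplog = db.getSiblingDB("local").oplog.rs
        const oldest = oplog.find().sort({$natural: 1}).limit(1).next()
        const newest = oplog.find().sort({$natural: -1}).limit(1).next()
        // ts is a BSON Timestamp; its high bits (t) hold seconds since the epoch
        const seconds = ts => typeof ts.getHighBits === "function" ? ts.getHighBits() : ts.t
        const hours = (seconds(newest.ts) - seconds(oldest.ts)) / (60 * 60)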
    
        return {
          metric: 'oplog_window_hours',
          value: hours,
          threshold: 24, // Alert if < 24 hours
          status: hours < 24 ? 'WARNING' : 'OK'
        }
      },
    
      primaryStatus: function() {
        const status = rs.status()
        const hasPrimary = status.members.some(m => m.state === 1)
    
        return {
          metric: 'has_primary',
          value: hasPrimary ? 1 : 0,
          threshold: 1,
          status: hasPrimary ? 'OK' : 'CRITICAL'
        }
      }
    }
    JavaScript

    Alerting Rules

    # Example Prometheus alerting rules
    groups:
      - name: mongodb.rules
        rules:
          - alert: MongoDBReplicationLag
            expr: mongodb_replication_lag_seconds > 30
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: "MongoDB replication lag is high"
    
          - alert: MongoDBNoPrimary
            expr: mongodb_replica_set_primary_count == 0
            for: 1m
            labels:
              severity: critical
            annotations:
              summary: "MongoDB replica set has no primary"
    
          - alert: MongoDBOplogWindow
            expr: mongodb_oplog_window_hours < 24
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "MongoDB oplog window is getting small"
    YAML

    Capacity Planning

    Growth Estimation

    // Capacity planning script
    function capacityPlanning() {
      const stats = db.stats()
      const collections = db.getCollectionNames() // names of all collections in the current database
    
      const analysis = {
        currentSize: stats.dataSize / (1024*1024*1024), // GB
        indexSize: stats.indexSize / (1024*1024*1024),  // GB
        avgDocSize: stats.avgObjSize,
        collections: collections.length
      }
    
      // Project growth (example: 20% monthly)
      const monthlyGrowth = 1.20
      const months = 12
    
      analysis.projectedSize = analysis.currentSize * Math.pow(monthlyGrowth, months)
      analysis.recommendedStorage = analysis.projectedSize * 2 // 100% buffer
    
      return analysis
    }
    JavaScript

    Resource Scaling Guidelines

    graph TB
        subgraph "Scaling Decision Tree"
            A[Performance Issues?] --> B{CPU Bound?}
            A --> C{Memory Bound?}
            A --> D{Disk I/O Bound?}
            A --> E{Network Bound?}
    
            B -->|Yes| F[Scale CPU Vertically<br/>or Add Read Replicas]
            C -->|Yes| G[Add RAM or<br/>Optimize Queries]
            D -->|Yes| H[Upgrade to SSD or<br/>Add More Secondaries]
            E -->|Yes| I[Upgrade Network or<br/>Optimize Connection Pool]
        end

    Disaster Recovery

    Backup Strategy

    graph TB
        subgraph "Backup Strategy"
            A[Daily Full Backup] --> B[Continuous Oplog Backup]
            B --> C[Point-in-Time Recovery]
    
            D[Geographic Distribution] --> E[Cross-Region Replication]
            E --> F[Disaster Recovery Site]
    
            G[Testing] --> H[Monthly Restore Tests]
            H --> I[Documented Procedures]
        end

    Recovery Procedures

    // Disaster recovery runbook
    const recoveryProcedures = {
      totalLoss: [
        "1. Restore from latest backup",
        "2. Replay oplog entries",
        "3. Validate data integrity",
        "4. Rebuild replica set",
        "5. Update application connection strings"
      ],
    
      primaryLoss: [
        "1. Verify secondary promotion",
        "2. Update application if needed",
        "3. Rebuild failed primary",
        "4. Add back to replica set"
      ],
    
      majorityLoss: [
        "1. Restore from backup to new servers",
        "2. Reconfigure replica set",
        "3. Force reconfiguration if needed",
        "4. Validate application connectivity"
      ]
    }
    JavaScript

    Conclusion

    MongoDB replication provides robust high availability and data protection through replica sets. Key takeaways:

    1. Deploy an odd number of voting members (3, 5, or 7) to ensure clear majorities
    2. Monitor replication lag and oplog window continuously
    3. Use appropriate write and read concerns for your consistency needs
    4. Plan for failure scenarios and practice recovery procedures
    5. Implement comprehensive monitoring and alerting
    6. Follow security best practices including authentication and encryption
    7. Regular maintenance and capacity planning are essential

    By following these practices and understanding the concepts in this guide, you’ll be able to successfully deploy and manage MongoDB replica sets from development through enterprise production environments.

    Additional Resources


    This guide covers MongoDB replication comprehensively. For the latest features and updates, always refer to the official MongoDB documentation.

