MongoDB Replication: From Beginner to Expert
Table of Contents
- Introduction to MongoDB Replication
- Understanding Replica Sets
- Setting Up Your First Replica Set
- Configuration and Management
- Read and Write Operations
- Elections and Failover
- Monitoring and Maintenance
- Advanced Topics
- Troubleshooting
- Best Practices
1. Introduction to MongoDB Replication
What is MongoDB Replication?
MongoDB replication is the process of synchronizing data across multiple servers. It provides redundancy and increases data availability: with copies of the data on different database servers, replication protects a deployment from the loss of a single server.
Why Use Replication?
- High Availability: Automatic failover when primary goes down
- Data Redundancy: Multiple copies of your data
- Read Scaling: Distribute read operations across secondaries
- Data Recovery: Protection against data loss
- Geographic Distribution: Deploy across multiple data centers
Replication vs Sharding
graph TB
subgraph "Replication"
A[Application] --> B[Primary]
B --> C[Secondary 1]
B --> D[Secondary 2]
B --> E[Secondary 3]
end
subgraph "Sharding"
F[Application] --> G[mongos Router]
G --> H[Shard 1]
G --> I[Shard 2]
G --> J[Shard 3]
end
2. Understanding Replica Sets
What is a Replica Set?
A replica set is a group of MongoDB processes that maintain the same data set. Replica sets provide redundancy and high availability.
Replica Set Architecture
graph TB
subgraph "Replica Set"
P[Primary NodeAccepts Writes]
S1[Secondary NodeReplicates Data]
S2[Secondary NodeReplicates Data]
A[ArbiterVoting Only]
P -->|Oplog| S1
P -->|Oplog| S2
P -.->|Heartbeat| S1
P -.->|Heartbeat| S2
P -.->|Heartbeat| A
S1 -.->|Heartbeat| S2
S1 -.->|Heartbeat| A
S2 -.->|Heartbeat| A
end
C[Client] --> P
C -.->|Read Preference| S1
C -.->|Read Preference| S2
Node Types
Primary Node
- Receives all write operations
- Records changes in oplog (operations log)
- Only one primary per replica set
Secondary Nodes
- Maintain copies of primary’s data
- Apply operations from oplog
- Can serve read operations (with read preference)
- Can become primary during elections
Arbiter Nodes
- Participate in elections only
- Do not hold data
- Lightweight option for keeping an odd number of voting members (a quick role check is sketched below)
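The roles above can be checked at any time from the shell. A minimal mongosh sketch (it assumes only a connection to any replica set member) that prints each member's name and current state:
// Print each member's current role (PRIMARY, SECONDARY, ARBITER, ...)
rs.status().members.forEach(function(member) {
  print(member.name + " -> " + member.stateStr)
})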
Oplog (Operations Log)
sequenceDiagram
participant Client
participant Primary
participant Secondary1
participant Secondary2
Client->>Primary: Insert Document
Primary->>Primary: Write to Collection
Primary->>Primary: Write to Oplog
Primary-->>Secondary1: Replicate Oplog Entry
Primary-->>Secondary2: Replicate Oplog Entry
Secondary1->>Secondary1: Apply Operation
Secondary2->>Secondary2: Apply Operation
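To see what these oplog entries look like on a live deployment, you can query the capped oplog.rs collection in the local database. A read-only sketch (run against any data-bearing member):
// Show the most recent oplog entry
db.getSiblingDB("local").oplog.rs.find().sort({$natural: -1}).limit(1).pretty()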
3. Setting Up Your First Replica Set
Prerequisites
- MongoDB installed on multiple servers
- Network connectivity between servers (a quick reachability check is sketched after this list)
- Proper firewall configuration (port 27017)
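Before initiating the set, it helps to confirm that every member is reachable from the others. A minimal mongosh sketch using the hostnames from the steps below (assumes authentication is not yet enabled):
// Try to connect to each member and run a ping
["server1:27017", "server2:27017", "server3:27017"].forEach(function(host) {
  try {
    var conn = new Mongo(host)   // throws if the host is unreachable
    printjson(conn.getDB("admin").runCommand({ping: 1}))
  } catch (e) {
    print(host + " unreachable: " + e)
  }
})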
Step-by-Step Setup
Step 1: Start MongoDB Instances
On each server, start MongoDB with replica set configuration:
# Server 1 (Primary)
mongod --replSet myReplicaSet --port 27017 --dbpath /data/db1
# Server 2 (Secondary)
mongod --replSet myReplicaSet --port 27017 --dbpath /data/db2
# Server 3 (Secondary)
mongod --replSet myReplicaSet --port 27017 --dbpath /data/db3
Step 2: Initialize Replica Set
Connect to one of the MongoDB instances:
// Connect to MongoDB
mongo
// Initialize replica set
rs.initiate({
_id: "myReplicaSet",
members: [
{ _id: 0, host: "server1:27017" },
{ _id: 1, host: "server2:27017" },
{ _id: 2, host: "server3:27017" }
]
})
Step 3: Verify Configuration
// Check replica set status
rs.status()
// Check configuration
rs.conf()
// Check if current node is primary
rs.isMaster()
Configuration File Approach
Create a configuration file for each node:
# mongod.conf
systemLog:
destination: file
logAppend: true
path: /var/log/mongodb/mongod.log
storage:
dbPath: /var/lib/mongo
journal:
enabled: true
processManagement:
fork: true
pidFilePath: /var/run/mongodb/mongod.pid
net:
port: 27017
bindIp: 0.0.0.0
replication:
replSetName: myReplicaSet
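After starting a node from this file, you can confirm from the shell that the replication settings were picked up. A minimal sketch using the standard getCmdLineOpts admin command (the exact output shape can vary by version and startup method):
// Show the replication section of the parsed startup configuration
var opts = db.adminCommand({getCmdLineOpts: 1})
printjson(opts.parsed.replication)   // expect something like { replSetName: "myReplicaSet" }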
4. Configuration and Management
Replica Set Configuration Document
{
"_id": "myReplicaSet",
"version": 1,
"members": [
{
"_id": 0,
"host": "server1:27017",
"priority": 2,
"votes": 1
},
{
"_id": 1,
"host": "server2:27017",
"priority": 1,
"votes": 1
},
{
"_id": 2,
"host": "server3:27017",
"priority": 1,
"votes": 1,
"hidden": false,
"slaveDelay": 0
}
]
}
Adding Members
// Add a new member
rs.add("server4:27017")
// Add with specific configuration
rs.add({
"_id": 3,
"host": "server4:27017",
"priority": 0.5,
"votes": 1
})
Removing Members
// Remove a member
rs.remove("server4:27017")
Member Configuration Options
Priority
- Determines likelihood of becoming primary (0-1000)
- Priority 0 = never becomes primary
- Higher priority = more likely to be elected
Votes
- Determines voting in elections (0 or 1)
- Maximum 7 voting members per replica set
- Non-voting members can hold data (see the reconfiguration sketch below)
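Both settings are changed through rs.reconfig(). The sketch below (member indexes are illustrative) makes one member the preferred primary and turns another into a non-voting data node; note that a non-voting member must also have priority 0:
// Adjust priority and votes via reconfiguration (run on the primary)
cfg = rs.conf()
cfg.members[1].priority = 2   // preferred primary
cfg.members[2].votes = 0      // non-voting member
cfg.members[2].priority = 0   // required for non-voting members
rs.reconfig(cfg)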
Hidden Members
- Not visible to application
- Cannot become primary
- Good for backups and reporting
// Configure hidden member
cfg = rs.conf()
cfg.members[2].hidden = true
cfg.members[2].priority = 0
rs.reconfig(cfg)
Delayed Members
- Maintain historical snapshot of data
- Useful for protection against human error
// Configure delayed member (1 hour delay)
cfg = rs.conf()
cfg.members[2].slaveDelay = 3600
cfg.members[2].priority = 0
cfg.members[2].hidden = true
rs.reconfig(cfg)
Topology Examples
Standard Three-Node Setup
graph TB
subgraph "Standard Replica Set"
P[PrimaryPriority: 1Votes: 1]
S1[SecondaryPriority: 1Votes: 1]
S2[SecondaryPriority: 1Votes: 1]
P --> S1
P --> S2
end
Five-Node with Arbiter
graph TB
subgraph "Extended Replica Set"
P[PrimaryPriority: 2Votes: 1]
S1[SecondaryPriority: 1Votes: 1]
S2[SecondaryPriority: 1Votes: 1]
S3[SecondaryPriority: 0Votes: 1Hidden]
A[ArbiterVotes: 1]
P --> S1
P --> S2
P --> S3
P -.-> A
end
5. Read and Write Operations
Write Operations
All write operations go to the primary:
// Write operations always go to primary
db.users.insertOne({name: "John", email: "john@example.com"})
db.users.updateOne({name: "John"}, {$set: {age: 30}})
db.users.deleteOne({name: "John"})
Write Concerns
Control acknowledgment of write operations:
graph LR
subgraph "Write Concern Levels"
A[w: 1Primary Only]
B[w: 2Primary + 1 Secondary]
C[w: majorityMajority of Nodes]
D[w: 0No Acknowledgment]
end
// Write concern examples
db.users.insertOne(
{name: "Alice"},
{writeConcern: {w: "majority", wtimeout: 5000}}
)
db.users.insertOne(
{name: "Bob"},
{writeConcern: {w: 2, j: true, wtimeout: 3000}}
)
Read Operations and Read Preferences
Read Preference Modes
graph TB
subgraph "Read Preferences"
A[primaryDefault - Primary Only]
B[primaryPreferredPrimary, then Secondary]
C[secondarySecondary Only]
D[secondaryPreferredSecondary, then Primary]
E[nearestLowest Network Latency]
end
// Set read preference
db.users.find().readPref("secondary")
db.users.find().readPref("primaryPreferred")
db.users.find().readPref("nearest", [{datacenter: "east"}])
// With MongoDB driver
const collection = db.collection('users');
const result = await collection.find({}, {
readPreference: 'secondaryPreferred'
}).toArray();
Read Concern
Control consistency and isolation properties:
// Read concern levels
db.users.find().readConcern("local") // Default
db.users.find().readConcern("available") // No guarantee
db.users.find().readConcern("majority") // Majority committed
db.users.find().readConcern("linearizable") // Linearizable readsJavaScriptConnection Strings
// Connection string with read preference
mongodb://server1:27017,server2:27017,server3:27017/mydb?replicaSet=myReplicaSet&readPreference=secondaryPreferred
// With write and read concerns
mongodb://server1:27017,server2:27017,server3:27017/mydb?replicaSet=myReplicaSet&w=majority&readConcernLevel=majority
6. Elections and Failover
Election Process
sequenceDiagram
participant P as Primary
participant S1 as Secondary1
participant S2 as Secondary2
participant S3 as Secondary3
Note over P,S3: Normal Operation
P->>S1: Heartbeat
P->>S2: Heartbeat
P->>S3: Heartbeat
Note over P,S3: Primary Fails
P--xS1: No Heartbeat
P--xS2: No Heartbeat
P--xS3: No Heartbeat
Note over S1,S3: Election Starts
S1->>S2: Vote Request
S1->>S3: Vote Request
S2->>S1: Vote Response
S3->>S1: Vote Response
Note over S1,S3: S1 Becomes Primary
S1->>S2: I am Primary
S1->>S3: I am Primary
Election Triggers
- Primary becomes unreachable
- Primary steps down voluntarily
- Network partition
- Configuration changes
Election Factors
Priority
- Higher priority nodes more likely to be elected
- Priority 0 nodes cannot become primary
Oplog Position
- Nodes with more recent data preferred
- Prevents data loss during election
Connectivity
- Node must be able to reach a majority of voting members (the sketch below prints these election factors for a running set)
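To review these factors, you can combine rs.conf() and rs.status(). A minimal sketch that prints priority, votes, and the last applied optime for each member:
// List election-relevant settings and replication position per member
var conf = rs.conf()
var status = rs.status()
conf.members.forEach(function(m) {
  var s = status.members.find(function(x) { return x.name === m.host })
  print(m.host + "  priority=" + m.priority + "  votes=" + m.votes +
        "  optime=" + (s ? s.optimeDate : "unknown"))
})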
Failover Timeline
gantt
title Replica Set Failover Timeline
dateFormat YYYY-MM-DD
axisFormat %s
section Detection
Heartbeat Timeout :a1, 2023-01-01, 10s
section Election
Vote Request :a2, after a1, 15s
Vote Collection :a3, after a2, 10s
Primary Declaration :a4, after a3, 5s
section Recovery
Catch-up Period :a5, after a4, 15s
Normal Operation :a6, after a5, 15s
Managing Elections
// Force an election (step down primary)
rs.stepDown(60) // Step down for 60 seconds
// Check election metrics
rs.status().members.forEach(function(member) {
print(member.name + ": " + member.state)
})
// Freeze a node (prevent it from becoming primary)
rs.freeze(120) // Freeze for 120 seconds
Split-Brain Prevention
MongoDB prevents split-brain scenarios through majority voting:
graph TB
subgraph "Network Partition Scenario"
subgraph "Partition A"
P[Primary]
S1[Secondary]
end
subgraph "Partition B"
S2[Secondary]
S3[Secondary]
A[Arbiter]
end
P -.->|Network Down| S2
end
subgraph "Result"
P2[Primary Steps DownNo Majority]
S22[New Primary ElectedHas Majority]
end7. Monitoring and Maintenance
7. Monitoring and Maintenance
Basic Monitoring Commands
// Replica set status
rs.status()
// Replication lag
db.printReplicationInfo()
db.printSlaveReplicationInfo()
// Current operations
db.currentOp()
// Server status
db.serverStatus().repl
Key Metrics to Monitor
Replication Lag
// Check replication lag
db.runCommand({replSetGetStatus: 1}).members.forEach(function(member) {
if (member.state === 2) { // Secondary
print(member.name + " lag: " +
(new Date() - member.optimeDate) / 1000 + " seconds")
}
})
Oplog Size and Utilization
// Check oplog stats
db.oplog.rs.stats()
// Oplog size in GB
db.oplog.rs.stats().maxSize / (1024*1024*1024)
// Time range covered by oplog
db.printReplicationInfo()
Monitoring Dashboard Structure
graph TB
subgraph "MongoDB Monitoring Dashboard"
A[Replica Set Health]
B[Replication Lag]
C[Oplog Utilization]
D[Election Events]
E[Connection Pool]
F[Read/Write Distribution]
A --> A1[Primary Status]
A --> A2[Secondary Count]
A --> A3[Heartbeat Status]
B --> B1[Max Lag Time]
B --> B2[Lag by Node]
B --> B3[Lag Trends]
C --> C1[Oplog Size]
C --> C2[Oplog Window]
C --> C3[Growth Rate]
end
Setting Up Monitoring
Using MongoDB Ops Manager
// Enable profiling for monitoring
db.setProfilingLevel(1, {slowms: 100})
// Monitor specific operations
db.system.profile.find().limit(5).sort({ts: -1}).pretty()
Custom Monitoring Script
// monitoring.js
function checkReplicaSetHealth() {
const status = rs.status()
const health = {
setName: status.set,
primary: null,
secondaries: [],
arbiters: [],
maxLag: 0
}
status.members.forEach(function(member) {
if (member.state === 1) {
health.primary = member.name
} else if (member.state === 2) {
health.secondaries.push({
name: member.name,
lag: (new Date() - member.optimeDate) / 1000
})
health.maxLag = Math.max(health.maxLag,
(new Date() - member.optimeDate) / 1000)
} else if (member.state === 7) {
health.arbiters.push(member.name)
}
})
return health
}
// Run monitoring
const health = checkReplicaSetHealth()
print(JSON.stringify(health, null, 2))
Log Analysis
Important Log Patterns
# Election events
grep "election" /var/log/mongodb/mongod.log
# Replication lag warnings
grep "replication lag" /var/log/mongodb/mongod.log
# Connection issues
grep "connection" /var/log/mongodb/mongod.log
# Oplog issues
grep "oplog" /var/log/mongodb/mongod.logBash8. Advanced Topics
Chained Replication
graph TB
subgraph "Chained Replication"
P[PrimaryData Center A]
S1[Secondary 1Data Center A]
S2[Secondary 2Data Center B]
S3[Secondary 3Data Center C]
P --> S1
S1 --> S2
S2 --> S3
style P fill:#e1f5fe
style S1 fill:#f3e5f5
style S2 fill:#f3e5f5
style S3 fill:#f3e5f5
end
Configuring Chained Replication
// Allow chaining (default: true)
cfg = rs.conf()
cfg.settings = cfg.settings || {}
cfg.settings.chainingAllowed = true
rs.reconfig(cfg)
Multi-Data Center Deployment
graph TB
subgraph "Multi-DC Replica Set"
subgraph "DC1 - Primary"
P[PrimaryPriority: 2]
S1[SecondaryPriority: 1]
end
subgraph "DC2 - Secondary"
S2[SecondaryPriority: 1]
S3[SecondaryPriority: 1]
end
subgraph "DC3 - Arbiter"
A[ArbiterTie Breaker]
end
P --> S1
P -.-> S2
P -.-> S3
P -.-> A
end
Configuration for Multi-DC
rs.initiate({
_id: "multiDCSet",
members: [
{ _id: 0, host: "dc1-server1:27017", priority: 2 },
{ _id: 1, host: "dc1-server2:27017", priority: 1 },
{ _id: 2, host: "dc2-server1:27017", priority: 1 },
{ _id: 3, host: "dc2-server2:27017", priority: 1 },
{ _id: 4, host: "dc3-arbiter:27017", arbiterOnly: true }
]
})
Tag-Based Read Preferences
// Configure tags
cfg = rs.conf()
cfg.members[0].tags = {datacenter: "east", rack: "1"}
cfg.members[1].tags = {datacenter: "east", rack: "2"}
cfg.members[2].tags = {datacenter: "west", rack: "1"}
rs.reconfig(cfg)
// Use tagged read preference
db.users.find().readPref("nearest", [{datacenter: "east"}])
db.users.find().readPref("secondary", [{rack: "1"}])JavaScriptWrite Concern with Tags
// Configure tag-based write concern
cfg = rs.conf()
cfg.settings = {
getLastErrorModes: {
multiDC: {datacenter: 2},
allRacks: {rack: 3}
}
}
rs.reconfig(cfg)
// Use tagged write concern
db.users.insertOne(
{name: "Critical Data"},
{writeConcern: {w: "multiDC", wtimeout: 5000}}
)
Replica Set Maintenance
Rolling Maintenance
sequenceDiagram
participant P as Primary
participant S1 as Secondary1
participant S2 as Secondary2
Note over P,S2: Step 1: Maintain Secondary1
P->>S1: Shutdown for maintenance
S1-->>P: Offline
P->>S2: Continue replication
Note over P,S2: Step 2: S1 Back Online
S1->>P: Reconnect and sync
P->>S1: Catch up replication
Note over P,S2: Step 3: Maintain Secondary2
P->>S2: Shutdown for maintenance
S2-->>P: Offline
P->>S1: Continue replication
Note over P,S2: Step 4: Step Down Primary
P->>S1: Step down
S1->>P: Become Primary
P->>P: Maintenance mode
Maintenance Procedures
// 1. Perform maintenance on secondaries first
rs.status() // Identify secondaries
// 2. For each secondary:
// - Stop MongoDB process
// - Perform maintenance (OS updates, hardware, etc.)
// - Restart MongoDB
// - Wait for catch-up
// 3. Step down primary
rs.stepDown(60)
// 4. Perform maintenance on former primary
// 5. Restart and let it rejoin as secondary
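Before moving on to the next node, confirm that the member you just restarted has caught up. A minimal lag check (same approach as the monitoring snippets in section 7; the 10-second threshold is illustrative):
// Verify secondaries are back in SECONDARY state with low lag before continuing
rs.status().members.forEach(function(m) {
  if (m.state === 2) {
    var lag = (new Date() - m.optimeDate) / 1000
    print(m.name + " lag: " + lag + "s" + (lag > 10 ? "  <-- still catching up" : ""))
  }
})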
Backup Strategies
Backup from Secondary
# Create backup from secondary to avoid impacting primary
mongodump --host secondary1:27017 --oplog --out /backup/mongodb/$(date +%Y%m%d)
# Point-in-time backup
mongodump --host secondary1:27017 --oplog --query '{"timestamp": {"$lt": {"$timestamp": {"t": 1609459200, "i": 1}}}}'
Delayed Member for Backup
// Configure delayed member for backup protection
cfg = rs.conf()
cfg.members[3] = {
_id: 3,
host: "backup-server:27017",
priority: 0,
hidden: true,
slaveDelay: 7200, // 2 hours delay
votes: 0
}
rs.reconfig(cfg)
9. Troubleshooting
Common Issues and Solutions
1. Replication Lag
Symptoms:
- High lag reported in rs.status()
- Delayed data on secondaries
Diagnosis:
// Check replication lag
db.printSlaveReplicationInfo()
// Check oplog window
db.printReplicationInfo()
// Monitor oplog growth
db.oplog.rs.find().sort({$natural: -1}).limit(1)
Solutions:
graph TB
A[Replication Lag Detected] --> B{Check Network}
B -->|Network OK| C{Check Secondary Load}
B -->|Network Issues| D[Fix Network Connectivity]
C -->|High Load| E[Scale Secondary Resources]
C -->|Load OK| F{Check Oplog Size}
F -->|Too Small| G[Increase Oplog Size]
F -->|Size OK| H[Check Write Patterns]
2. Election Issues
Symptoms:
- Frequent elections
- No primary elected
Diagnosis:
// Check election stats
rs.status().members.forEach(function(m) {
if (m.electionTime) {
print(m.name + " last election: " + m.electionTime)
}
})
// Check voting configuration
rs.conf().members.forEach(function(m) {
print(m.host + " votes: " + m.votes + " priority: " + m.priority)
})
Common Solutions:
// Fix: Network partition
// Ensure majority of nodes can communicate
// Fix: Clock skew
// Synchronize clocks across all nodes
// Fix: Priority misconfiguration
cfg = rs.conf()
cfg.members[0].priority = 2 // Give preference to specific node
rs.reconfig(cfg)
3. Oplog Issues
Problem: Oplog Too Small
// Check current oplog size
db.oplog.rs.stats().maxSize / (1024*1024*1024) // Size in GB
// Resize oplog (MongoDB 3.6+)
db.adminCommand({replSetResizeOplog: 1, size: 10240}) // 10GB
Problem: Oplog Overflow
// Monitor oplog utilization
function checkOplogUtilization() {
const stats = db.oplog.rs.stats()
const oldest = db.oplog.rs.find().sort({$natural: 1}).limit(1).next()
const newest = db.oplog.rs.find().sort({$natural: -1}).limit(1).next()
const window = newest.ts.getTime() - oldest.ts.getTime()
const hours = window / (1000 * 60 * 60)
print("Oplog window: " + hours + " hours")
print("Oplog size: " + (stats.maxSize / 1024 / 1024 / 1024) + " GB")
}
4. Connection Issues
Diagnosis:
// Check current connections
db.serverStatus().connections
// Monitor connection pool
db.runCommand({connPoolStats: 1})
// Check for connection errors in logs
Solutions:
# Increase connection limits in mongod.conf
net:
maxIncomingConnections: 20000
# Connection string optimization
mongodb://server1:27017,server2:27017,server3:27017/mydb?
replicaSet=myReplicaSet&
maxPoolSize=100&
minPoolSize=10&
maxIdleTimeMS=30000
Diagnostic Commands Reference
// Comprehensive health check
function healthCheck() {
print("=== Replica Set Health Check ===")
// Basic status
print("\n1. Replica Set Status:")
const status = rs.status()
print("Set: " + status.set)
print("Primary: " + status.members.find(m => m.state === 1)?.name || "None")
// Member states
print("\n2. Member States:")
status.members.forEach(m => {
print(m.name + ": " + m.stateStr + " (lag: " +
((new Date() - m.optimeDate) / 1000) + "s)")
})
// Oplog info
print("\n3. Oplog Information:")
db.printReplicationInfo()
// Connection status
print("\n4. Connections:")
const connStats = db.serverStatus().connections
print("Current: " + connStats.current + "/" + connStats.available)
return status
}
// Run health check
healthCheck()
Recovery Procedures
Recovering from Data Corruption
graph TB
A[Detect Corruption] --> B[Isolate Affected Node]
B --> C[Stop MongoDB Process]
C --> D{Data Recoverable?}
D -->|Yes| E[Repair Database]
D -->|No| F[Remove from Replica Set]
E --> G[Restart and Resync]
F --> H[Fresh Install]
G --> I[Monitor Health]
H --> I
// Remove corrupted member
rs.remove("corrupted-server:27017")
// After fixing, re-add
rs.add("fixed-server:27017")JavaScriptInitial Sync Issues
// Force resync of a member
// 1. Stop MongoDB on the problematic secondary
// 2. Remove data directory
// 3. Restart MongoDB - it will perform initial sync
// Monitor initial sync progress
db.serverStatus().initialSync
10. Best Practices
Deployment Best Practices
Hardware Recommendations
graph TB
subgraph "Production Deployment"
subgraph "Primary Node"
P1[High-performance SSD]
P2[Adequate RAMWorking Set + OS]
P3[Fast Network]
P4[Redundant Power]
end
subgraph "Secondary Nodes"
S1[Similar Hardware to Primary]
S2[Geographically Distributed]
S3[Dedicated Networks]
end
subgraph "Monitoring"
M1[Centralized Logging]
M2[Metrics Collection]
M3[Alerting System]
end
end
Network Configuration
# Firewall rules (example for iptables)
# Allow MongoDB port between replica set members
iptables -A INPUT -p tcp -s 10.0.1.0/24 --dport 27017 -j ACCEPT
# Security group (AWS example)
# Source: Security group ID of replica set members
# Port: 27017
# Protocol: TCP
Operating System Tuning
# Disable Transparent Huge Pages
echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled
echo 'never' > /sys/kernel/mm/transparent_hugepage/defrag
# Set appropriate ulimits
echo "mongodb soft nofile 64000" >> /etc/security/limits.conf
echo "mongodb hard nofile 64000" >> /etc/security/limits.conf
echo "mongodb soft nproc 32000" >> /etc/security/limits.conf
echo "mongodb hard nproc 32000" >> /etc/security/limits.conf
# Configure swappiness
echo 'vm.swappiness = 1' >> /etc/sysctl.conf
Security Best Practices
Authentication and Authorization
// Create admin user
use admin
db.createUser({
user: "admin",
pwd: "securePassword",
roles: ["userAdminAnyDatabase", "dbAdminAnyDatabase", "readWriteAnyDatabase"]
})
// Create replica set user
db.createUser({
user: "replicaSetUser",
pwd: "replicaPassword",
roles: ["clusterAdmin"]
})
Enable Authentication
# mongod.conf
security:
authorization: enabled
keyFile: /etc/mongodb-keyfile
# Create keyfile
openssl rand -base64 756 > /etc/mongodb-keyfile
chmod 400 /etc/mongodb-keyfile
chown mongodb:mongodb /etc/mongodb-keyfile
SSL/TLS Configuration
# mongod.conf
net:
ssl:
mode: requireSSL
PEMKeyFile: /etc/ssl/mongodb.pem
CAFile: /etc/ssl/ca.pem
allowConnectionsWithoutCertificates: false
Performance Best Practices
Write Concern Strategy
graph LR
subgraph "Write Concern Selection"
A[Application Type] --> B{Consistency Needs}
B -->|High| C[w: majority]
B -->|Medium| D[w: 2]
B -->|Low| E[w: 1]
F[Performance Needs] --> G{Latency Tolerance}
G -->|Low| H[w: 1, j: false]
G -->|Medium| I[w: majority, j: true]
G -->|High| J[w: all, j: true]
end
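Mirroring the read preference strategies below, these trade-offs can be captured as per-workload write concern defaults. The collection name and values in this sketch are illustrative; pick them from your own consistency and latency requirements:
// Example write concern choices per workload (illustrative values)
const writeStrategies = {
  financial: { writeConcern: { w: "majority", j: true, wtimeout: 5000 } },  // durability first
  userData:  { writeConcern: { w: 2, wtimeout: 3000 } },                    // balanced
  logging:   { writeConcern: { w: 1 } }                                     // throughput first
}
// hypothetical collection used only for illustration
db.payments.insertOne({amount: 100}, writeStrategies.financial)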
Read Preference Strategy
// Application patterns
const strategies = {
// Real-time dashboard - need latest data
realTime: { readPreference: 'primary' },
// Analytics - can tolerate slight lag
analytics: { readPreference: 'secondaryPreferred' },
// Reports - distribute load
reports: { readPreference: 'secondary' },
// Global app - use nearest
global: { readPreference: 'nearest' }
}
Index Strategy for Replica Sets
// Create indexes on primary - automatically replicated
db.users.createIndex({email: 1}, {unique: true})
db.orders.createIndex({customerId: 1, orderDate: -1})
// Background index creation (less blocking)
db.products.createIndex({category: 1, price: -1}, {background: true})
// Partial indexes for efficiency
db.users.createIndex(
{email: 1},
{partialFilterExpression: {email: {$exists: true}}}
)
Monitoring and Alerting
Key Metrics to Monitor
// Monitoring script template
const monitoringChecks = {
replicationLag: function() {
const status = rs.status()
const primary = status.members.find(m => m.state === 1)
const maxLag = Math.max(...status.members
.filter(m => m.state === 2)
.map(m => (new Date() - m.optimeDate) / 1000))
return {
metric: 'replication_lag_seconds',
value: maxLag,
threshold: 30, // Alert if > 30 seconds
status: maxLag > 30 ? 'CRITICAL' : 'OK'
}
},
oplogWindow: function() {
const stats = db.oplog.rs.stats()
const oldest = db.oplog.rs.find().sort({$natural: 1}).limit(1).next()
const newest = db.oplog.rs.find().sort({$natural: -1}).limit(1).next()
const hours = (newest.ts.getTime() - oldest.ts.getTime()) / (1000 * 60 * 60)
return {
metric: 'oplog_window_hours',
value: hours,
threshold: 24, // Alert if < 24 hours
status: hours < 24 ? 'WARNING' : 'OK'
}
},
primaryStatus: function() {
const status = rs.status()
const hasPrimary = status.members.some(m => m.state === 1)
return {
metric: 'has_primary',
value: hasPrimary ? 1 : 0,
threshold: 1,
status: hasPrimary ? 'OK' : 'CRITICAL'
}
}
}
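A short usage sketch for the checks above, looping over each one and printing its result (this could be run periodically via mongosh --eval, for example):
// Run every monitoring check and print its result as JSON
Object.keys(monitoringChecks).forEach(function(name) {
  print(name + ": " + JSON.stringify(monitoringChecks[name]()))
})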
Alerting Rules
# Example Prometheus alerting rules
groups:
- name: mongodb.rules
rules:
- alert: MongoDBReplicationLag
expr: mongodb_replication_lag_seconds > 30
for: 5m
labels:
severity: critical
annotations:
summary: "MongoDB replication lag is high"
- alert: MongoDBNoPrimary
expr: mongodb_replica_set_primary_count == 0
for: 1m
labels:
severity: critical
annotations:
summary: "MongoDB replica set has no primary"
- alert: MongoDBOplogWindow
expr: mongodb_oplog_window_hours < 24
for: 10m
labels:
severity: warning
annotations:
summary: "MongoDB oplog window is getting small"YAMLCapacity Planning
Growth Estimation
// Capacity planning script
function capacityPlanning() {
const stats = db.stats()
const collections = db.runCommand("listCollections").cursor.firstBatch
const analysis = {
currentSize: stats.dataSize / (1024*1024*1024), // GB
indexSize: stats.indexSize / (1024*1024*1024), // GB
avgDocSize: stats.avgObjSize,
collections: collections.length
}
// Project growth (example: 20% monthly)
const monthlyGrowth = 1.20
const months = 12
analysis.projectedSize = analysis.currentSize * Math.pow(monthlyGrowth, months)
analysis.recommendedStorage = analysis.projectedSize * 2 // 100% buffer
return analysis
}
Resource Scaling Guidelines
graph TB
subgraph "Scaling Decision Tree"
A[Performance Issues?] --> B{CPU Bound?}
A --> C{Memory Bound?}
A --> D{Disk I/O Bound?}
A --> E{Network Bound?}
B -->|Yes| F[Scale CPU Verticallyor Add Read Replicas]
C -->|Yes| G[Add RAM orOptimize Queries]
D -->|Yes| H[Upgrade to SSD orAdd More Secondaries]
E -->|Yes| I[Upgrade Network orOptimize Connection Pool]
end
Disaster Recovery
Backup Strategy
graph TB
subgraph "Backup Strategy"
A[Daily Full Backup] --> B[Continuous Oplog Backup]
B --> C[Point-in-Time Recovery]
D[Geographic Distribution] --> E[Cross-Region Replication]
E --> F[Disaster Recovery Site]
G[Testing] --> H[Monthly Restore Tests]
H --> I[Documented Procedures]
end
Recovery Procedures
// Disaster recovery runbook
const recoveryProcedures = {
totalLoss: [
"1. Restore from latest backup",
"2. Replay oplog entries",
"3. Validate data integrity",
"4. Rebuild replica set",
"5. Update application connection strings"
],
primaryLoss: [
"1. Verify secondary promotion",
"2. Update application if needed",
"3. Rebuild failed primary",
"4. Add back to replica set"
],
majorityLoss: [
"1. Restore from backup to new servers",
"2. Reconfigure replica set",
"3. Force reconfiguration if needed",
"4. Validate application connectivity"
]
}
Conclusion
MongoDB replication provides robust high availability and data protection through replica sets. Key takeaways:
- Always deploy an odd number of voting members (3, 5, or 7) to ensure clear majorities
- Monitor replication lag and oplog window continuously
- Use appropriate write and read concerns for your consistency needs
- Plan for failure scenarios and practice recovery procedures
- Implement comprehensive monitoring and alerting
- Follow security best practices including authentication and encryption
- Regular maintenance and capacity planning are essential
By following these practices and understanding the concepts in this guide, you’ll be able to successfully deploy and manage MongoDB replica sets from development through enterprise production environments.
Additional Resources
This guide covers MongoDB replication comprehensively. For the latest features and updates, always refer to the official MongoDB documentation.