MongoDB Replication: From Beginner to Expert
Table of Contents
- Introduction to MongoDB Replication
- Understanding Replica Sets
- Setting Up Your First Replica Set
- Configuration and Management
- Read and Write Operations
- Elections and Failover
- Monitoring and Maintenance
- Advanced Topics
- Troubleshooting
- Best Practices
1. Introduction to MongoDB Replication
What is MongoDB Replication?
MongoDB replication is the process of synchronizing data across multiple servers. It provides redundancy and increases data availability: with copies of the data on different database servers, replication protects a deployment from the loss of a single server.
Why Use Replication?
- High Availability: Automatic failover when primary goes down
- Data Redundancy: Multiple copies of your data
- Read Scaling: Distribute read operations across secondaries
- Data Recovery: Protection against data loss
- Geographic Distribution: Deploy across multiple data centers
Replication vs Sharding
graph TB
subgraph "Replication"
A[Application] --> B[Primary]
B --> C[Secondary 1]
B --> D[Secondary 2]
B --> E[Secondary 3]
end
subgraph "Sharding"
F[Application] --> G[mongos Router]
G --> H[Shard 1]
G --> I[Shard 2]
G --> J[Shard 3]
end
2. Understanding Replica Sets
What is a Replica Set?
A replica set is a group of MongoDB processes that maintain the same data set. Replica sets provide redundancy and high availability.
Replica Set Architecture
graph TB
subgraph "Replica Set"
P[Primary NodeAccepts Writes]
S1[Secondary NodeReplicates Data]
S2[Secondary NodeReplicates Data]
A[ArbiterVoting Only]
P -->|Oplog| S1
P -->|Oplog| S2
P -.->|Heartbeat| S1
P -.->|Heartbeat| S2
P -.->|Heartbeat| A
S1 -.->|Heartbeat| S2
S1 -.->|Heartbeat| A
S2 -.->|Heartbeat| A
end
C[Client] --> P
C -.->|Read Preference| S1
C -.->|Read Preference| S2
Node Types
Primary Node
- Receives all write operations
- Records changes in oplog (operations log)
- Only one primary per replica set
Secondary Nodes
- Maintain copies of primary’s data
- Apply operations from oplog
- Can serve read operations (with read preference)
- Can become primary during elections
Arbiter Nodes
- Participate in elections only
- Do not hold data
- Lightweight option for keeping an odd number of voting members (a quick role check is sketched below)
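The roles above can be checked at any time from the shell. A minimal mongosh sketch (it assumes only a connection to any replica set member) that prints each member's name and current state:
// Print each member's current role (PRIMARY, SECONDARY, ARBITER, ...)
rs.status().members.forEach(function(member) {
  print(member.name + " -> " + member.stateStr)
})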
Oplog (Operations Log)
sequenceDiagram
participant Client
participant Primary
participant Secondary1
participant Secondary2
Client->>Primary: Insert Document
Primary->>Primary: Write to Collection
Primary->>Primary: Write to Oplog
Primary-->>Secondary1: Replicate Oplog Entry
Primary-->>Secondary2: Replicate Oplog Entry
Secondary1->>Secondary1: Apply Operation
Secondary2->>Secondary2: Apply Operation
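To see what these oplog entries look like on a live deployment, you can query the capped oplog.rs collection in the local database. A read-only sketch (run against any data-bearing member):
// Show the most recent oplog entry
db.getSiblingDB("local").oplog.rs.find().sort({$natural: -1}).limit(1).pretty()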
3. Setting Up Your First Replica Set
Prerequisites
- MongoDB installed on multiple servers
- Network connectivity between servers (a quick reachability check is sketched after this list)
- Proper firewall configuration (port 27017)
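Before initiating the set, it helps to confirm that every member is reachable from the others. A minimal mongosh sketch using the hostnames from the steps below (assumes authentication is not yet enabled):
// Try to connect to each member and run a ping
["server1:27017", "server2:27017", "server3:27017"].forEach(function(host) {
  try {
    var conn = new Mongo(host)   // throws if the host is unreachable
    printjson(conn.getDB("admin").runCommand({ping: 1}))
  } catch (e) {
    print(host + " unreachable: " + e)
  }
})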
Step-by-Step Setup
Step 1: Start MongoDB Instances
On each server, start MongoDB with replica set configuration:
# Server 1 (Primary)
mongod --replSet myReplicaSet --port 27017 --dbpath /data/db1
# Server 2 (Secondary)
mongod --replSet myReplicaSet --port 27017 --dbpath /data/db2
# Server 3 (Secondary)
mongod --replSet myReplicaSet --port 27017 --dbpath /data/db3
Step 2: Initialize Replica Set
Connect to one of the MongoDB instances:
// Connect to MongoDB
mongo
// Initialize replica set
rs.initiate({
_id: "myReplicaSet",
members: [
{ _id: 0, host: "server1:27017" },
{ _id: 1, host: "server2:27017" },
{ _id: 2, host: "server3:27017" }
]
})
Step 3: Verify Configuration
// Check replica set status
rs.status()
// Check configuration
rs.conf()
// Check if current node is primary
rs.isMaster()
Configuration File Approach
Create a configuration file for each node:
# mongod.conf
systemLog:
destination: file
logAppend: true
path: /var/log/mongodb/mongod.log
storage:
dbPath: /var/lib/mongo
journal:
enabled: true
processManagement:
fork: true
pidFilePath: /var/run/mongodb/mongod.pid
net:
port: 27017
bindIp: 0.0.0.0
replication:
replSetName: myReplicaSet
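After starting a node from this file, you can confirm from the shell that the replication settings were picked up. A minimal sketch using the standard getCmdLineOpts admin command (the exact output shape can vary by version and startup method):
// Show the replication section of the parsed startup configuration
var opts = db.adminCommand({getCmdLineOpts: 1})
printjson(opts.parsed.replication)   // expect something like { replSetName: "myReplicaSet" }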
4. Configuration and Management
Replica Set Configuration Document
{
"_id": "myReplicaSet",
"version": 1,
"members": [
{
"_id": 0,
"host": "server1:27017",
"priority": 2,
"votes": 1
},
{
"_id": 1,
"host": "server2:27017",
"priority": 1,
"votes": 1
},
{
"_id": 2,
"host": "server3:27017",
"priority": 1,
"votes": 1,
"hidden": false,
"slaveDelay": 0
}
]
}
Adding Members
// Add a new member
rs.add("server4:27017")
// Add with specific configuration
rs.add({
"_id": 3,
"host": "server4:27017",
"priority": 0.5,
"votes": 1
})
Removing Members
// Remove a member
rs.remove("server4:27017")
Member Configuration Options
Priority
- Determines likelihood of becoming primary (0-1000)
- Priority 0 = never becomes primary
- Higher priority = more likely to be elected
Votes
- Determines voting in elections (0 or 1)
- Maximum 7 voting members per replica set
- Non-voting members can hold data (see the reconfiguration sketch below)
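Both settings are changed through rs.reconfig(). The sketch below (member indexes are illustrative) makes one member the preferred primary and turns another into a non-voting data node; note that a non-voting member must also have priority 0:
// Adjust priority and votes via reconfiguration (run on the primary)
cfg = rs.conf()
cfg.members[1].priority = 2   // preferred primary
cfg.members[2].votes = 0      // non-voting member
cfg.members[2].priority = 0   // required for non-voting members
rs.reconfig(cfg)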
Hidden Members
- Not visible to application
- Cannot become primary
- Good for backups and reporting
// Configure hidden member
cfg = rs.conf()
cfg.members[2].hidden = true
cfg.members[2].priority = 0
rs.reconfig(cfg)
Delayed Members
- Maintain historical snapshot of data
- Useful for protection against human error
// Configure delayed member (1 hour delay)
cfg = rs.conf()
cfg.members[2].slaveDelay = 3600
cfg.members[2].priority = 0
cfg.members[2].hidden = true
rs.reconfig(cfg)
Topology Examples
Standard Three-Node Setup
graph TB
subgraph "Standard Replica Set"
P[PrimaryPriority: 1Votes: 1]
S1[SecondaryPriority: 1Votes: 1]
S2[SecondaryPriority: 1Votes: 1]
P --> S1
P --> S2
end
Five-Node with Arbiter
graph TB
subgraph "Extended Replica Set"
P[PrimaryPriority: 2Votes: 1]
S1[SecondaryPriority: 1Votes: 1]
S2[SecondaryPriority: 1Votes: 1]
S3[SecondaryPriority: 0Votes: 1Hidden]
A[ArbiterVotes: 1]
P --> S1
P --> S2
P --> S3
P -.-> A
end
5. Read and Write Operations
Write Operations
All write operations go to the primary:
// Write operations always go to primary
db.users.insertOne({name: "John", email: "john@example.com"})
db.users.updateOne({name: "John"}, {$set: {age: 30}})
db.users.deleteOne({name: "John"})
Write Concerns
Control acknowledgment of write operations:
graph LR
subgraph "Write Concern Levels"
A[w: 1Primary Only]
B[w: 2Primary + 1 Secondary]
C[w: majorityMajority of Nodes]
D[w: 0No Acknowledgment]
end
// Write concern examples
db.users.insertOne(
{name: "Alice"},
{writeConcern: {w: "majority", wtimeout: 5000}}
)
db.users.insertOne(
{name: "Bob"},
{writeConcern: {w: 2, j: true, wtimeout: 3000}}
)
Read Operations and Read Preferences
Read Preference Modes
graph TB
subgraph "Read Preferences"
A[primaryDefault - Primary Only]
B[primaryPreferredPrimary, then Secondary]
C[secondarySecondary Only]
D[secondaryPreferredSecondary, then Primary]
E[nearestLowest Network Latency]
end
// Set read preference
db.users.find().readPref("secondary")
db.users.find().readPref("primaryPreferred")
db.users.find().readPref("nearest", [{datacenter: "east"}])
// With MongoDB driver
const collection = db.collection('users');
const result = await collection.find({}, {
readPreference: 'secondaryPreferred'
}).toArray();
Read Concern
Control consistency and isolation properties:
// Read concern levels
db.users.find().readConcern("local") // Default
db.users.find().readConcern("available") // No guarantee
db.users.find().readConcern("majority") // Majority committed
db.users.find().readConcern("linearizable") // Linearizable readsJavaScriptConnection Strings
// Connection string with read preference
mongodb://server1:27017,server2:27017,server3:27017/mydb?replicaSet=myReplicaSet&readPreference=secondaryPreferred
// With write and read concerns
mongodb://server1:27017,server2:27017,server3:27017/mydb?replicaSet=myReplicaSet&w=majority&readConcernLevel=majority
6. Elections and Failover
Election Process
sequenceDiagram
participant P as Primary
participant S1 as Secondary1
participant S2 as Secondary2
participant S3 as Secondary3
Note over P,S3: Normal Operation
P->>S1: Heartbeat
P->>S2: Heartbeat
P->>S3: Heartbeat
Note over P,S3: Primary Fails
P--xS1: No Heartbeat
P--xS2: No Heartbeat
P--xS3: No Heartbeat
Note over S1,S3: Election Starts
S1->>S2: Vote Request
S1->>S3: Vote Request
S2->>S1: Vote Response
S3->>S1: Vote Response
Note over S1,S3: S1 Becomes Primary
S1->>S2: I am Primary
S1->>S3: I am Primary
Election Triggers
- Primary becomes unreachable
- Primary steps down voluntarily
- Network partition
- Configuration changes
Election Factors
Priority
- Higher priority nodes more likely to be elected
- Priority 0 nodes cannot become primary
Oplog Position
- Nodes with more recent data preferred
- Prevents data loss during election
Connectivity
- Node must be able to reach a majority of voting members (the sketch below prints these election factors for a running set)
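To review these factors, you can combine rs.conf() and rs.status(). A minimal sketch that prints priority, votes, and the last applied optime for each member:
// List election-relevant settings and replication position per member
var conf = rs.conf()
var status = rs.status()
conf.members.forEach(function(m) {
  var s = status.members.find(function(x) { return x.name === m.host })
  print(m.host + "  priority=" + m.priority + "  votes=" + m.votes +
        "  optime=" + (s ? s.optimeDate : "unknown"))
})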
Failover Timeline
gantt
title Replica Set Failover Timeline
dateFormat YYYY-MM-DD
axisFormat %s
section Detection
Heartbeat Timeout :a1, 2023-01-01, 10s
section Election
Vote Request :a2, after a1, 15s
Vote Collection :a3, after a2, 10s
Primary Declaration :a4, after a3, 5s
section Recovery
Catch-up Period :a5, after a4, 15s
Normal Operation :a6, after a5, 15s
Managing Elections
// Force an election (step down primary)
rs.stepDown(60) // Step down for 60 seconds
// Check election metrics
rs.status().members.forEach(function(member) {
print(member.name + ": " + member.state)
})
// Freeze a node (prevent it from becoming primary)
rs.freeze(120) // Freeze for 120 seconds
Split-Brain Prevention
MongoDB prevents split-brain scenarios through majority voting:
graph TB
subgraph "Network Partition Scenario"
subgraph "Partition A"
P[Primary]
S1[Secondary]
end
subgraph "Partition B"
S2[Secondary]
S3[Secondary]
A[Arbiter]
end
P -.->|Network Down| S2
end
subgraph "Result"
P2[Primary Steps DownNo Majority]
S22[New Primary ElectedHas Majority]
end7. Monitoring and Maintenance
7. Monitoring and Maintenance
Basic Monitoring Commands
// Replica set status
rs.status()
// Replication lag
db.printReplicationInfo()
db.printSlaveReplicationInfo()
// Current operations
db.currentOp()
// Server status
db.serverStatus().repl
Key Metrics to Monitor
Replication Lag
// Check replication lag
db.runCommand({replSetGetStatus: 1}).members.forEach(function(member) {
if (member.state === 2) { // Secondary
print(member.name + " lag: " +
(new Date() - member.optimeDate) / 1000 + " seconds")
}
})
Oplog Size and Utilization
// Check oplog stats
db.oplog.rs.stats()
// Oplog size in GB
db.oplog.rs.stats().maxSize / (1024*1024*1024)
// Time range covered by oplog
db.printReplicationInfo()
Monitoring Dashboard Structure
graph TB
subgraph "MongoDB Monitoring Dashboard"
A[Replica Set Health]
B[Replication Lag]
C[Oplog Utilization]
D[Election Events]
E[Connection Pool]
F[Read/Write Distribution]
A --> A1[Primary Status]
A --> A2[Secondary Count]
A --> A3[Heartbeat Status]
B --> B1[Max Lag Time]
B --> B2[Lag by Node]
B --> B3[Lag Trends]
C --> C1[Oplog Size]
C --> C2[Oplog Window]
C --> C3[Growth Rate]
end
Setting Up Monitoring
Using MongoDB Ops Manager
// Enable profiling for monitoring
db.setProfilingLevel(1, {slowms: 100})
// Monitor specific operations
db.system.profile.find().limit(5).sort({ts: -1}).pretty()
Custom Monitoring Script
// monitoring.js
function checkReplicaSetHealth() {
const status = rs.status()
const health = {
setName: status.set,
primary: null,
secondaries: [],
arbiters: [],
maxLag: 0
}
status.members.forEach(function(member) {
if (member.state === 1) {
health.primary = member.name
} else if (member.state === 2) {
health.secondaries.push({
name: member.name,
lag: (new Date() - member.optimeDate) / 1000
})
health.maxLag = Math.max(health.maxLag,
(new Date() - member.optimeDate) / 1000)
} else if (member.state === 7) {
health.arbiters.push(member.name)
}
})
return health
}
// Run monitoring
const health = checkReplicaSetHealth()
print(JSON.stringify(health, null, 2))
Log Analysis
Important Log Patterns
# Election events
grep "election" /var/log/mongodb/mongod.log
# Replication lag warnings
grep "replication lag" /var/log/mongodb/mongod.log
# Connection issues
grep "connection" /var/log/mongodb/mongod.log
# Oplog issues
grep "oplog" /var/log/mongodb/mongod.logBash8. Advanced Topics
Chained Replication
graph TB
subgraph "Chained Replication"
P[PrimaryData Center A]
S1[Secondary 1Data Center A]
S2[Secondary 2Data Center B]
S3[Secondary 3Data Center C]
P --> S1
S1 --> S2
S2 --> S3
style P fill:#e1f5fe
style S1 fill:#f3e5f5
style S2 fill:#f3e5f5
style S3 fill:#f3e5f5
end
Configuring Chained Replication
// Allow chaining (default: true)
cfg = rs.conf()
cfg.settings = cfg.settings || {}
cfg.settings.chainingAllowed = true
rs.reconfig(cfg)
Multi-Data Center Deployment
graph TB
subgraph "Multi-DC Replica Set"
subgraph "DC1 - Primary"
P[PrimaryPriority: 2]
S1[SecondaryPriority: 1]
end
subgraph "DC2 - Secondary"
S2[SecondaryPriority: 1]
S3[SecondaryPriority: 1]
end
subgraph "DC3 - Arbiter"
A[ArbiterTie Breaker]
end
P --> S1
P -.-> S2
P -.-> S3
P -.-> A
end
Configuration for Multi-DC
rs.initiate({
_id: "multiDCSet",
members: [
{ _id: 0, host: "dc1-server1:27017", priority: 2 },
{ _id: 1, host: "dc1-server2:27017", priority: 1 },
{ _id: 2, host: "dc2-server1:27017", priority: 1 },
{ _id: 3, host: "dc2-server2:27017", priority: 1 },
{ _id: 4, host: "dc3-arbiter:27017", arbiterOnly: true }
]
})
Tag-Based Read Preferences
// Configure tags
cfg = rs.conf()
cfg.members[0].tags = {datacenter: "east", rack: "1"}
cfg.members[1].tags = {datacenter: "east", rack: "2"}
cfg.members[2].tags = {datacenter: "west", rack: "1"}
rs.reconfig(cfg)
// Use tagged read preference
db.users.find().readPref("nearest", [{datacenter: "east"}])
db.users.find().readPref("secondary", [{rack: "1"}])JavaScriptWrite Concern with Tags
// Configure tag-based write concern
cfg = rs.conf()
cfg.settings = {
getLastErrorModes: {
multiDC: {datacenter: 2},
allRacks: {rack: 3}
}
}
rs.reconfig(cfg)
// Use tagged write concern
db.users.insertOne(
{name: "Critical Data"},
{writeConcern: {w: "multiDC", wtimeout: 5000}}
)
Replica Set Maintenance
Rolling Maintenance
sequenceDiagram
participant P as Primary
participant S1 as Secondary1
participant S2 as Secondary2
Note over P,S2: Step 1: Maintain Secondary1
P->>S1: Shutdown for maintenance
S1-->>P: Offline
P->>S2: Continue replication
Note over P,S2: Step 2: S1 Back Online
S1->>P: Reconnect and sync
P->>S1: Catch up replication
Note over P,S2: Step 3: Maintain Secondary2
P->>S2: Shutdown for maintenance
S2-->>P: Offline
P->>S1: Continue replication
Note over P,S2: Step 4: Step Down Primary
P->>S1: Step down
S1->>P: Become Primary
P->>P: Maintenance mode
Maintenance Procedures
// 1. Perform maintenance on secondaries first
rs.status() // Identify secondaries
// 2. For each secondary:
// - Stop MongoDB process
// - Perform maintenance (OS updates, hardware, etc.)
// - Restart MongoDB
// - Wait for catch-up
// 3. Step down primary
rs.stepDown(60)
// 4. Perform maintenance on former primary
// 5. Restart and let it rejoin as secondary
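Before moving on to the next node, confirm that the member you just restarted has caught up. A minimal lag check (same approach as the monitoring snippets in section 7; the 10-second threshold is illustrative):
// Verify secondaries are back in SECONDARY state with low lag before continuing
rs.status().members.forEach(function(m) {
  if (m.state === 2) {
    var lag = (new Date() - m.optimeDate) / 1000
    print(m.name + " lag: " + lag + "s" + (lag > 10 ? "  <-- still catching up" : ""))
  }
})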
Backup Strategies
Backup from Secondary
# Create backup from secondary to avoid impacting primary
mongodump --host secondary1:27017 --oplog --out /backup/mongodb/$(date +%Y%m%d)
# Point-in-time backup
mongodump --host secondary1:27017 --oplog --query '{"timestamp": {"$lt": {"$timestamp": {"t": 1609459200, "i": 1}}}}'
Delayed Member for Backup
// Configure delayed member for backup protection
cfg = rs.conf()
cfg.members[3] = {
_id: 3,
host: "backup-server:27017",
priority: 0,
hidden: true,
slaveDelay: 7200, // 2 hours delay
votes: 0
}
rs.reconfig(cfg)
9. Troubleshooting
Common Issues and Solutions
1. Replication Lag
Symptoms:
- High lag reported in rs.status()
- Delayed data on secondaries
Diagnosis:
// Check replication lag
db.printSlaveReplicationInfo()
// Check oplog window
db.printReplicationInfo()
// Monitor oplog growth
db.oplog.rs.find().sort({$natural: -1}).limit(1)
Solutions:
graph TB
A[Replication Lag Detected] --> B{Check Network}
B -->|Network OK| C{Check Secondary Load}
B -->|Network Issues| D[Fix Network Connectivity]
C -->|High Load| E[Scale Secondary Resources]
C -->|Load OK| F{Check Oplog Size}
F -->|Too Small| G[Increase Oplog Size]
F -->|Size OK| H[Check Write Patterns]
2. Election Issues
Symptoms:
- Frequent elections
- No primary elected
Diagnosis:
// Check election stats
rs.status().members.forEach(function(m) {
if (m.electionTime) {
print(m.name + " last election: " + m.electionTime)
}
})
// Check voting configuration
rs.conf().members.forEach(function(m) {
print(m.host + " votes: " + m.votes + " priority: " + m.priority)
})
Common Solutions:
// Fix: Network partition
// Ensure majority of nodes can communicate
// Fix: Clock skew
// Synchronize clocks across all nodes
// Fix: Priority misconfiguration
cfg = rs.conf()
cfg.members[0].priority = 2 // Give preference to specific node
rs.reconfig(cfg)
3. Oplog Issues
Problem: Oplog Too Small
// Check current oplog size
db.oplog.rs.stats().maxSize / (1024*1024*1024) // Size in GB
// Resize oplog (MongoDB 3.6+)
db.adminCommand({replSetResizeOplog: 1, size: 10240}) // 10GB
Problem: Oplog Overflow
// Monitor oplog utilization
function checkOplogUtilization() {
const stats = db.oplog.rs.stats()
const oldest = db.oplog.rs.find().sort({$natural: 1}).limit(1).next()
const newest = db.oplog.rs.find().sort({$natural: -1}).limit(1).next()
const window = newest.ts.getTime() - oldest.ts.getTime()
const hours = window / (1000 * 60 * 60)
print("Oplog window: " + hours + " hours")
print("Oplog size: " + (stats.maxSize / 1024 / 1024 / 1024) + " GB")
}
4. Connection Issues
Diagnosis:
// Check current connections
db.serverStatus().connections
// Monitor connection pool
db.runCommand({connPoolStats: 1})
// Check for connection errors in logs
Solutions:
# Increase connection limits in mongod.conf
net:
maxIncomingConnections: 20000
# Connection string optimization
mongodb://server1:27017,server2:27017,server3:27017/mydb?
replicaSet=myReplicaSet&
maxPoolSize=100&
minPoolSize=10&
maxIdleTimeMS=30000
Diagnostic Commands Reference
// Comprehensive health check
function healthCheck() {
print("=== Replica Set Health Check ===")
// Basic status
print("\n1. Replica Set Status:")
const status = rs.status()
print("Set: " + status.set)
print("Primary: " + status.members.find(m => m.state === 1)?.name || "None")
// Member states
print("\n2. Member States:")
status.members.forEach(m => {
print(m.name + ": " + m.stateStr + " (lag: " +
((new Date() - m.optimeDate) / 1000) + "s)")
})
// Oplog info
print("\n3. Oplog Information:")
db.printReplicationInfo()
// Connection status
print("\n4. Connections:")
const connStats = db.serverStatus().connections
print("Current: " + connStats.current + "/" + connStats.available)
return status
}
// Run health check
healthCheck()
Recovery Procedures
Recovering from Data Corruption
graph TB
A[Detect Corruption] --> B[Isolate Affected Node]
B --> C[Stop MongoDB Process]
C --> D{Data Recoverable?}
D -->|Yes| E[Repair Database]
D -->|No| F[Remove from Replica Set]
E --> G[Restart and Resync]
F --> H[Fresh Install]
G --> I[Monitor Health]
H --> I
// Remove corrupted member
rs.remove("corrupted-server:27017")
// After fixing, re-add
rs.add("fixed-server:27017")JavaScriptInitial Sync Issues
// Force resync of a member
// 1. Stop MongoDB on the problematic secondary
// 2. Remove data directory
// 3. Restart MongoDB - it will perform initial sync
// Monitor initial sync progress
db.serverStatus().initialSync
10. Best Practices
Deployment Best Practices
Hardware Recommendations
graph TB
subgraph "Production Deployment"
subgraph "Primary Node"
P1[High-performance SSD]
P2[Adequate RAMWorking Set + OS]
P3[Fast Network]
P4[Redundant Power]
end
subgraph "Secondary Nodes"
S1[Similar Hardware to Primary]
S2[Geographically Distributed]
S3[Dedicated Networks]
end
subgraph "Monitoring"
M1[Centralized Logging]
M2[Metrics Collection]
M3[Alerting System]
end
end
Network Configuration
# Firewall rules (example for iptables)
# Allow MongoDB port between replica set members
iptables -A INPUT -p tcp -s 10.0.1.0/24 --dport 27017 -j ACCEPT
# Security group (AWS example)
# Source: Security group ID of replica set members
# Port: 27017
# Protocol: TCP
Operating System Tuning
# Disable Transparent Huge Pages
echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled
echo 'never' > /sys/kernel/mm/transparent_hugepage/defrag
# Set appropriate ulimits
echo "mongodb soft nofile 64000" >> /etc/security/limits.conf
echo "mongodb hard nofile 64000" >> /etc/security/limits.conf
echo "mongodb soft nproc 32000" >> /etc/security/limits.conf
echo "mongodb hard nproc 32000" >> /etc/security/limits.conf
# Configure swappiness
echo 'vm.swappiness = 1' >> /etc/sysctl.conf
Security Best Practices
Authentication and Authorization
// Create admin user
use admin
db.createUser({
user: "admin",
pwd: "securePassword",
roles: ["userAdminAnyDatabase", "dbAdminAnyDatabase", "readWriteAnyDatabase"]
})
// Create replica set user
db.createUser({
user: "replicaSetUser",
pwd: "replicaPassword",
roles: ["clusterAdmin"]
})
Enable Authentication
# mongod.conf
security:
authorization: enabled
keyFile: /etc/mongodb-keyfile
# Create keyfile
openssl rand -base64 756 > /etc/mongodb-keyfile
chmod 400 /etc/mongodb-keyfile
chown mongodb:mongodb /etc/mongodb-keyfile
SSL/TLS Configuration
# mongod.conf
net:
ssl:
mode: requireSSL
PEMKeyFile: /etc/ssl/mongodb.pem
CAFile: /etc/ssl/ca.pem
allowConnectionsWithoutCertificates: false
Performance Best Practices
Write Concern Strategy
graph LR
subgraph "Write Concern Selection"
A[Application Type] --> B{Consistency Needs}
B -->|High| C[w: majority]
B -->|Medium| D[w: 2]
B -->|Low| E[w: 1]
F[Performance Needs] --> G{Latency Tolerance}
G -->|Low| H[w: 1, j: false]
G -->|Medium| I[w: majority, j: true]
G -->|High| J[w: all, j: true]
end
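Mirroring the read preference strategies below, these trade-offs can be captured as per-workload write concern defaults. The collection name and values in this sketch are illustrative; pick them from your own consistency and latency requirements:
// Example write concern choices per workload (illustrative values)
const writeStrategies = {
  financial: { writeConcern: { w: "majority", j: true, wtimeout: 5000 } },  // durability first
  userData:  { writeConcern: { w: 2, wtimeout: 3000 } },                    // balanced
  logging:   { writeConcern: { w: 1 } }                                     // throughput first
}
// hypothetical collection used only for illustration
db.payments.insertOne({amount: 100}, writeStrategies.financial)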
Read Preference Strategy
// Application patterns
const strategies = {
// Real-time dashboard - need latest data
realTime: { readPreference: 'primary' },
// Analytics - can tolerate slight lag
analytics: { readPreference: 'secondaryPreferred' },
// Reports - distribute load
reports: { readPreference: 'secondary' },
// Global app - use nearest
global: { readPreference: 'nearest' }
}
Index Strategy for Replica Sets
// Create indexes on primary - automatically replicated
db.users.createIndex({email: 1}, {unique: true})
db.orders.createIndex({customerId: 1, orderDate: -1})
// Background index creation (less blocking)
db.products.createIndex({category: 1, price: -1}, {background: true})
// Partial indexes for efficiency
db.users.createIndex(
{email: 1},
{partialFilterExpression: {email: {$exists: true}}}
)
Monitoring and Alerting
Key Metrics to Monitor
// Monitoring script template
const monitoringChecks = {
replicationLag: function() {
const status = rs.status()
const primary = status.members.find(m => m.state === 1)
const maxLag = Math.max(...status.members
.filter(m => m.state === 2)
.map(m => (new Date() - m.optimeDate) / 1000))
return {
metric: 'replication_lag_seconds',
value: maxLag,
threshold: 30, // Alert if > 30 seconds
status: maxLag > 30 ? 'CRITICAL' : 'OK'
}
},
oplogWindow: function() {
const stats = db.oplog.rs.stats()
const oldest = db.oplog.rs.find().sort({$natural: 1}).limit(1).next()
const newest = db.oplog.rs.find().sort({$natural: -1}).limit(1).next()
const hours = (newest.ts.getTime() - oldest.ts.getTime()) / (1000 * 60 * 60)
return {
metric: 'oplog_window_hours',
value: hours,
threshold: 24, // Alert if < 24 hours
status: hours < 24 ? 'WARNING' : 'OK'
}
},
primaryStatus: function() {
const status = rs.status()
const hasPrimary = status.members.some(m => m.state === 1)
return {
metric: 'has_primary',
value: hasPrimary ? 1 : 0,
threshold: 1,
status: hasPrimary ? 'OK' : 'CRITICAL'
}
}
}
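A short usage sketch for the checks above, looping over each one and printing its result (this could be run periodically via mongosh --eval, for example):
// Run every monitoring check and print its result as JSON
Object.keys(monitoringChecks).forEach(function(name) {
  print(name + ": " + JSON.stringify(monitoringChecks[name]()))
})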
Alerting Rules
# Example Prometheus alerting rules
groups:
- name: mongodb.rules
rules:
- alert: MongoDBReplicationLag
expr: mongodb_replication_lag_seconds > 30
for: 5m
labels:
severity: critical
annotations:
summary: "MongoDB replication lag is high"
- alert: MongoDBNoPrimary
expr: mongodb_replica_set_primary_count == 0
for: 1m
labels:
severity: critical
annotations:
summary: "MongoDB replica set has no primary"
- alert: MongoDBOplogWindow
expr: mongodb_oplog_window_hours < 24
for: 10m
labels:
severity: warning
annotations:
summary: "MongoDB oplog window is getting small"YAMLCapacity Planning
Growth Estimation
// Capacity planning script
function capacityPlanning() {
const stats = db.stats()
const collections = db.runCommand("listCollections").cursor.firstBatch
const analysis = {
currentSize: stats.dataSize / (1024*1024*1024), // GB
indexSize: stats.indexSize / (1024*1024*1024), // GB
avgDocSize: stats.avgObjSize,
collections: collections.length
}
// Project growth (example: 20% monthly)
const monthlyGrowth = 1.20
const months = 12
analysis.projectedSize = analysis.currentSize * Math.pow(monthlyGrowth, months)
analysis.recommendedStorage = analysis.projectedSize * 2 // 100% buffer
return analysis
}
Resource Scaling Guidelines
graph TB
subgraph "Scaling Decision Tree"
A[Performance Issues?] --> B{CPU Bound?}
A --> C{Memory Bound?}
A --> D{Disk I/O Bound?}
A --> E{Network Bound?}
B -->|Yes| F[Scale CPU Verticallyor Add Read Replicas]
C -->|Yes| G[Add RAM orOptimize Queries]
D -->|Yes| H[Upgrade to SSD orAdd More Secondaries]
E -->|Yes| I[Upgrade Network orOptimize Connection Pool]
end
Disaster Recovery
Backup Strategy
graph TB
subgraph "Backup Strategy"
A[Daily Full Backup] --> B[Continuous Oplog Backup]
B --> C[Point-in-Time Recovery]
D[Geographic Distribution] --> E[Cross-Region Replication]
E --> F[Disaster Recovery Site]
G[Testing] --> H[Monthly Restore Tests]
H --> I[Documented Procedures]
end
Recovery Procedures
// Disaster recovery runbook
const recoveryProcedures = {
totalLoss: [
"1. Restore from latest backup",
"2. Replay oplog entries",
"3. Validate data integrity",
"4. Rebuild replica set",
"5. Update application connection strings"
],
primaryLoss: [
"1. Verify secondary promotion",
"2. Update application if needed",
"3. Rebuild failed primary",
"4. Add back to replica set"
],
majorityLoss: [
"1. Restore from backup to new servers",
"2. Reconfigure replica set",
"3. Force reconfiguration if needed",
"4. Validate application connectivity"
]
}
Conclusion
MongoDB replication provides robust high availability and data protection through replica sets. Key takeaways:
- Always deploy an odd number of voting members (3, 5, or 7) to ensure clear majorities
- Monitor replication lag and oplog window continuously
- Use appropriate write and read concerns for your consistency needs
- Plan for failure scenarios and practice recovery procedures
- Implement comprehensive monitoring and alerting
- Follow security best practices including authentication and encryption
- Regular maintenance and capacity planning are essential
By following these practices and understanding the concepts in this guide, you’ll be able to successfully deploy and manage MongoDB replica sets from development through enterprise production environments.
Additional Resources
This guide covers MongoDB replication comprehensively. For the latest features and updates, always refer to the official MongoDB documentation.