To make this post easier to follow, I have divided it into two parts:
– In the first part, I explain what a replica set is and how it works.
– In the second part, I explain ways to manage storage space.
MongoDB is an open-source document database (NoSQL) that provides high performance, high availability, and automatic scaling. MongoDB is written in C++.
A record in MongoDB is a document, which is a data structure composed of field and value pairs. MongoDB documents are similar to JSON objects. The values of fields may include other documents, arrays, and arrays of documents.
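As a rough illustration of that structure, here is a minimal Python sketch of one document. The record itself, its field names, and its values are all made up for this example; a plain Python dict stands in for a MongoDB document.

```python
import json

# A hypothetical "user" record; every field name and value is invented.
user = {
    "name": "Asha",
    "age": 31,
    "address": {"city": "Pune", "zip": "411001"},  # embedded document
    "tags": ["admin", "beta"],                     # array
    "orders": [                                    # array of documents
        {"sku": "A-100", "qty": 2},
        {"sku": "B-200", "qty": 1},
    ],
}

# The same structure maps directly onto JSON text and back.
as_json = json.dumps(user)
round_tripped = json.loads(as_json)
print(round_tripped["orders"][1]["sku"])
```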
Key features :
- High performance data persistence.
- Rich query language that supports read and write operations (CRUD) as well as text search and data aggregation.
- High availability through a replication feature called a replica set.
- Horizontal scalability through sharding, which distributes data across a cluster of machines.
Replication feature in MongoDB
Consider an app whose data lives on a single MongoDB server. If that server suddenly crashes and its data is corrupted, the entire application data set is lost and the business has nothing to fall back on. Every organization therefore needs to keep at least one copy of its data on another server, which MongoDB makes possible through replication. Replication creates additional copies of the data that can be used for business continuity, disaster recovery, or backup.
Advantages of Data Replication :
- It is a best practice for keeping business-critical data safe.
- It provides redundancy and increases data availability (24x7).
- It maintains additional copies of the data, which can be used for business continuity, disaster recovery, or backup.
- Replication can provide increased read capacity as clients can send read operations to different servers.
- No need for downtime in case of maintenance such as data backups, data compaction and index rebuilding activities.
Replication Process in MongoDB
Replication in MongoDB is achieved through a replica set. A replica set is a group of mongod processes that maintain the same data set. It contains several data-bearing nodes (mongod processes) and, optionally, one arbiter node. Of the data-bearing nodes, one and only one member is deemed the primary node (master), while the other nodes are deemed secondary nodes (slaves).
The minimum recommended configuration for a replica set is a three member replica set with three data-bearing members: one primary and two secondary members.
Members of replica set
- Primary (Master) : The primary is the only member of the replica set that receives write operations. MongoDB applies write operations on the primary and then records the operations in the primary’s oplog (operation log).
All members of the replica set can accept read operations. However, by default, an application directs its read operations to the primary member.
- Secondary (Slaves) : A secondary maintains a copy of the primary’s data set. To replicate data, a secondary applies operations from the primary’s oplog asynchronously, so that the secondaries’ data sets reflect the primary’s data set. A replica set can have one or more secondaries.
- Arbiter : You may add an extra mongod instance to a replica set as an arbiter. Arbiters do not maintain a data set. The purpose of an arbiter is to maintain a quorum in a replica set by responding to heartbeat and election requests from other replica set members.
Because they do not store a data set, arbiters can be a good way to provide replica set quorum functionality with a cheaper resource cost than a fully functional replica set member with a data set.
Only add an arbiter to sets with even numbers of voting members. If you add an arbiter to a set with an odd number of voting members, the set may suffer from tied elections.
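To make the primary and secondary roles above more concrete, here is a toy Python sketch of oplog-based replication. It is only a simplified model under my own assumptions: the class names, the tuple format of the oplog entries, and the catch_up helper are invented for illustration and are not MongoDB's actual implementation.

```python
class Node:
    """A toy replica set member holding a key/value data set."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.applied = 0  # number of oplog entries applied so far


class Primary(Node):
    """The only member that accepts writes; it records them in an oplog."""

    def __init__(self, name):
        super().__init__(name)
        self.oplog = []  # ordered list of (operation, key, value) entries

    def write(self, key, value):
        # Apply the write locally, then record it in the oplog.
        self.data[key] = value
        self.oplog.append(("set", key, value))
        self.applied = len(self.oplog)

    def delete(self, key):
        self.data.pop(key, None)
        self.oplog.append(("delete", key, None))
        self.applied = len(self.oplog)


def catch_up(secondary, primary):
    """Apply every oplog entry the secondary has not seen yet."""
    for op, key, value in primary.oplog[secondary.applied:]:
        if op == "set":
            secondary.data[key] = value
        else:
            secondary.data.pop(key, None)
    secondary.applied = len(primary.oplog)


primary = Primary("node-a")
secondary = Node("node-b")

primary.write("x", 1)
primary.write("y", 2)
lagging = dict(secondary.data)  # nothing replicated yet (asynchronous)
catch_up(secondary, primary)
primary.delete("x")
catch_up(secondary, primary)
print(secondary.data == primary.data)
```

Because replication is asynchronous, the secondary is briefly behind the primary (the `lagging` snapshot) until it has replayed the new oplog entries.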
Automatic failover in Replication
There may be situations in which the primary becomes inaccessible. When a primary does not communicate with the other members of the set for more than 10 seconds, an eligible secondary will hold an election to elect itself the new primary.
The first secondary to hold an election and receive a majority of the members’ votes becomes primary.
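The majority rule can be worked out with a little arithmetic. The sketch below is my own illustration (the function names are made up); it shows how many votes are needed to win an election and how many voting members a set can lose while a majority is still reachable.

```python
def majority(voting_members):
    """Votes needed to win an election: strictly more than half."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members):
    """How many voting members can be lost while a majority remains."""
    return voting_members - majority(voting_members)


for n in (2, 3, 4, 5):
    print(n, "voting members: majority =", majority(n),
          "| tolerates", fault_tolerance(n), "failure(s)")
```

With two voting members, losing either one leaves no majority. Adding an arbiter makes it three voting members, which tolerates one failure. Going from three to four voting members, on the other hand, does not improve fault tolerance, which is why arbiters are meant for sets with an even number of voting members.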
Although the timing varies, the failover process generally completes within a minute. For instance, it may take 10-30 seconds for the members of a replica set to declare a primary inaccessible. One of the remaining secondaries holds an election to elect itself as a new primary. The election itself may take another 10-30 seconds.
While an election is in progress, the replica set has no primary and cannot accept writes; all remaining members are read-only.
Managing Disk Space in MongoDB
When documents or collections are deleted, empty record blocks within data files arise. MongoDB attempts to reuse this space when possible, but it will never return this space to the file system. This behavior explains why fileSize never decreases despite deletes on a database.
Let’s say I have 20 GB of data in a MongoDB database, and I delete 5 GB of it. Even though only 15 GB of data is actually present in the database, the unused 5 GB will not be released to the OS. MongoDB keeps holding the entire 20 GB of disk space it had earlier so that it can reuse that space for new data. The disk space used can therefore keep growing and is not released on its own.
There will be situations where we don’t want MongoDB to keep hogging all that disk space. Depending on the setup and the storage engine used for MongoDB, we have a couple of choices.
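The 20 GB example can be traced with a small Python model. This is only a simplified sketch of the behavior described above, with invented class and attribute names; real storage engines track space in a far more detailed way.

```python
class DataFiles:
    """Toy model: the allocated file size only grows; freed space is reused."""

    def __init__(self):
        self.file_size_gb = 0  # space held on disk
        self.data_size_gb = 0  # space occupied by live documents

    @property
    def reusable_gb(self):
        return self.file_size_gb - self.data_size_gb

    def insert(self, gb):
        self.data_size_gb += gb
        # Grow the files only when the reusable space is not enough.
        if self.data_size_gb > self.file_size_gb:
            self.file_size_gb = self.data_size_gb

    def delete(self, gb):
        # Deleting frees space inside the files, not on the file system.
        self.data_size_gb -= gb


files = DataFiles()
files.insert(20)
files.delete(5)
after_delete = (files.file_size_gb, files.data_size_gb, files.reusable_gb)

files.insert(3)  # fits into the freed space, so the files do not grow
after_small_insert = (files.file_size_gb, files.data_size_gb)

files.insert(4)  # 22 GB of data no longer fits into 20 GB of files
after_large_insert = (files.file_size_gb, files.data_size_gb)

print(after_delete, after_small_insert, after_large_insert)
```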
1. Compacting individual collections:
The compact command can be used to compact individual collections. This command rewrites and defragments all data in a collection, as well as all of the indexes on that collection.
Important : This operation blocks all other database activity while it runs and should be used only when downtime for the database is acceptable. When running a replica set, we can compact the secondaries first to avoid blocking the primary, then use failover to make the primary a secondary before compacting it.
The compact command works at the collection level, so each collection in the database has to be compacted one by one. It completely rewrites the data and indexes to remove fragmentation. In addition, if the storage engine is WiredTiger, the compact command also releases unused disk space back to the system. With the older MMAPv1 storage engine, it still rewrites the collection but does not release the unused disk space. Because compact blocks all other operations at the database level, we have to plan for some downtime.
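Since compact has to be issued once per collection, it can help to generate the command documents up front. The sketch below only builds those documents in Python. The collection names are hypothetical, and actually sending the commands (for example through a driver such as PyMongo and its db.command call) is an assumption on my part that is not shown or run here.

```python
def compact_commands(collection_names):
    """Build one `compact` command document per collection."""
    return [{"compact": name} for name in collection_names]


# Hypothetical collection names, used only for illustration.
commands = compact_commands(["users", "orders", "events"])

for cmd in commands:
    # Each document would be run against one node at a time,
    # ideally a secondary, because the command blocks other activity.
    print(cmd)
```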
2. Repair (Compacting one or more databases)
For a single-node MongoDB deployment, we can use the db.repairDatabase() command to compact all the collections in the database. This operation rewrites all the data and indexes for each collection in the database from scratch and thereby compacts and defragments the entire database.
To compact all the databases on a server, we can stop the mongod process and run it with the “--repair” option.
- This operation blocks all other database activity while it runs and should be used only when downtime for the database is acceptable.
- Running a repair requires free disk space equal to the size of the current data set plus 2 GB. We can use space on a different volume than the one mongod is running on by specifying the “--repairpath” option.
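The free-space rule above is easy to check before starting a repair. Here is a minimal Python sketch of that arithmetic; the function names and the sample sizes are made up for illustration.

```python
REPAIR_OVERHEAD_GB = 2  # the extra 2 GB mentioned above


def repair_space_needed_gb(data_set_gb):
    """Free space a repair needs: current data set size plus 2 GB."""
    return data_set_gb + REPAIR_OVERHEAD_GB


def can_repair(data_set_gb, free_gb):
    """True if the given volume has enough free space for the repair."""
    return free_gb >= repair_space_needed_gb(data_set_gb)


# Hypothetical numbers: a 15 GB data set, 10 GB free on the data volume,
# and 40 GB free on another volume that could be used as the repair path.
needed = repair_space_needed_gb(15)
fits_on_data_volume = can_repair(15, 10)
fits_on_other_volume = can_repair(15, 40)
print(needed, fits_on_data_volume, fits_on_other_volume)
```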
3. Compacting all databases on a server by re-syncing replica set nodes
For a multi-node MongoDB deployment (replica set), we can resync a secondary from scratch to reclaim space. By resyncing each node in the replica set, we effectively rewrite the data files from scratch and thereby defragment the database.
Please note that if the cluster consists of only two electable nodes, we sacrifice high availability during the resync because the secondary is completely wiped before syncing.
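One possible way to plan such a rolling resync is sketched below in Python. The order (secondaries first, then stepping down the primary and resyncing it last) is my assumption, in line with the compaction advice earlier. The member names and helper functions are hypothetical, and the code only produces a plan; it does not touch a real replica set.

```python
def resync_plan(members):
    """Return (name, action) steps: secondaries first, the primary last."""
    secondaries = [m["name"] for m in members if m["role"] == "secondary"]
    primaries = [m["name"] for m in members if m["role"] == "primary"]
    steps = [(name, "resync") for name in secondaries]
    for name in primaries:
        steps.append((name, "step down"))
        steps.append((name, "resync"))
    return steps


def electable_during_resync(members):
    """Electable data-bearing nodes left while one node is wiped."""
    data_bearing = [m for m in members if m["role"] in ("primary", "secondary")]
    return len(data_bearing) - 1


# Hypothetical two-node set plus an arbiter.
members = [
    {"name": "node-a", "role": "primary"},
    {"name": "node-b", "role": "secondary"},
    {"name": "node-c", "role": "arbiter"},
]

plan = resync_plan(members)
remaining = electable_during_resync(members)
print(plan, remaining)
```

With only two electable nodes, a single node holds the data while the other is being resynced, which is the loss of high availability noted above.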
MongoDB Replica Set : Database size difference between Primary and Secondary Node
You could face a scenario where the database on a secondary node is larger than on the primary. Both nodes could have the same number of objects, but the values of “avgObjSize”, “dataSize”, and “storageSize” are higher for the secondary node, even though rs.status() shows no replication lag.
What could be the reason for this?
dataSize() : It returns the size of the collection.
avgObjSize() : It returns the average size of an object in the collection.
storageSize() : It returns the total amount of storage allocated to this collection for document storage.
Reason : The primary and the secondary hold different amounts of unreclaimed disk space.
Suppose we have a replica set with one primary and one secondary node. Let’s say this node has been the primary for a long time, during which documents were deleted and inserted but no compact operation was run. The free space on the primary would not be reclaimed by the OS, and would be counted in dataSize, avgObjSize and storageSize.
The secondary node, on the other hand, could have been fully resynced from the primary, with only the operations from the current oplog replayed on it. In that case the secondary would have lower values for dataSize, avgObjSize and storageSize.
If that secondary is later elected primary, you would see the difference described above: both nodes have the same number of objects, but the values of “avgObjSize”, “dataSize”, and “storageSize” are higher for the node that is now the secondary (the former primary).
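To see this kind of difference at a glance, the statistics of the two nodes can be compared field by field. The Python sketch below uses made-up numbers and a hypothetical size_diff helper. It assumes the statistics have already been collected from each node as dictionaries with the field names discussed above, and that avgObjSize is dataSize divided by the number of objects.

```python
FIELDS = ("objects", "avgObjSize", "dataSize", "storageSize")


def size_diff(primary_stats, secondary_stats, fields=FIELDS):
    """Return {field: secondary - primary} for the fields that differ."""
    return {
        f: secondary_stats[f] - primary_stats[f]
        for f in fields
        if secondary_stats[f] != primary_stats[f]
    }


# Made-up statistics (bytes): same object count, different space usage.
new_primary = {"objects": 1000, "dataSize": 512000, "storageSize": 700000}
former_primary = {"objects": 1000, "dataSize": 640000, "storageSize": 1000000}

for stats in (new_primary, former_primary):
    # Average object size derived from data size and object count.
    stats["avgObjSize"] = stats["dataSize"] / stats["objects"]

diff = size_diff(new_primary, former_primary)
print(diff)
```

The object count is identical on both nodes, so only the three size fields show up in the result.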
By – Shalabh Tripathi, Software Engineer