This post walks through the complete process of using oplogs as an incremental backup for MongoDB.
It is a slightly hacky approach, because MongoDB has no direct solution for this (unlike MySQL with its binary logs or PostgreSQL with WAL archiving).
Oplogs vs Journal
Journals are low-level logs that are used for crash recovery on the same node (get more insight on this topic here).
Oplogs are capped collections that record every operation that modifies the database. They are used in a Master-Slave setup, where the Slave polls the Master and updates its own data files using the Master's oplog (read more here).
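To make the incremental-backup idea concrete, here is a minimal sketch of replaying oplog-style entries on top of an older snapshot. It runs on plain in-memory objects rather than a real MongoDB instance. The applyOplog helper is hypothetical, timestamps are plain numbers, and updates are assumed to be simple $set documents; real oplog entries carry more fields and a richer update format.

```javascript
// Simplified oplog-style entries: op is "i" (insert), "u" (update) or "d" (delete).
// Assumption: ts is a plain number here; real entries use a BSON Timestamp.
function applyOplog(collection, entries, afterTs) {
  // collection: Map keyed by _id; entries: array sorted by ts.
  for (const e of entries) {
    if (e.ts <= afterTs) continue; // already contained in the last full backup
    if (e.op === "i") {
      collection.set(e.o._id, { ...e.o });
    } else if (e.op === "u") {
      // Assumption: updates are expressed as a plain $set document.
      const doc = collection.get(e.o2._id);
      if (doc) Object.assign(doc, e.o.$set);
    } else if (e.op === "d") {
      collection.delete(e.o._id);
    }
  }
  return collection;
}

const entries = [
  { ts: 1, op: "i", o: { _id: 1, name: "a" } },
  { ts: 2, op: "i", o: { _id: 2, name: "b" } },
  { ts: 3, op: "u", o2: { _id: 1 }, o: { $set: { name: "z" } } },
  { ts: 4, op: "d", o: { _id: 2 } },
];

// State captured by a full backup taken at ts = 2.
const restored = new Map([
  [1, { _id: 1, name: "a" }],
  [2, { _id: 2, name: "b" }],
]);

// Replay only the operations recorded after the backup.
applyOplog(restored, entries, 2);
console.log([...restored.values()]);
```

The point of the sketch is the afterTs filter: a full backup plus the oplog entries recorded after it is enough to rebuild the later state.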
Step 1 – Set up the Master-Slave architecture:
– Stop all running MongoDB instances.
– Start a new MongoDB server as the primary node using the command: mongod --dbpath <path-to-db> --replSet rs0 --port 27017 --fork --logpath <path-to-logs>
– Similarly, start the secondary node: mongod --dbpath <path-to-secondary-db> --replSet rs0 --port 27018 --fork --logpath <path-to-secondary-logs>
– Now start an arbiter node: mongod --dbpath <path-to-arbiter-db> --replSet rs0 --port 27019 --fork --logpath <path-to-arbiter-logs>
Understanding the above commands:
--dbpath: Specify the directory where this instance stores its data files.
--replSet: Specify a name for the Replica Set. (Note that the value of --replSet is the same, rs0, in all 3 commands, because all three nodes belong to the same Replica Set or Master-Slave configuration.)
--port: Specify the port this instance listens on.
--fork: Run this instance in the background.
--logpath: Specify the file where the logs for this instance are written.
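As a small illustration of how the flags fit together, the sketch below builds the argument list for each of the three nodes. The mongodArgs helper and the /data/... paths are placeholders made up for this example; substitute your own directories.

```javascript
// Hypothetical helper: build the mongod argument list for one replica set member.
function mongodArgs({ dbpath, port, logpath, replSet = "rs0" }) {
  return [
    "--dbpath", dbpath,
    "--replSet", replSet, // same name for every member of the set
    "--port", String(port),
    "--fork",
    "--logpath", logpath,
  ];
}

// Placeholder paths (assumptions), one entry per node.
const nodes = [
  { dbpath: "/data/rs0-primary", port: 27017, logpath: "/data/log/primary.log" },
  { dbpath: "/data/rs0-secondary", port: 27018, logpath: "/data/log/secondary.log" },
  { dbpath: "/data/rs0-arbiter", port: 27019, logpath: "/data/log/arbiter.log" },
];

for (const node of nodes) {
  console.log("mongod " + mongodArgs(node).join(" "));
}
```

Only the paths and the port differ between the three commands; the replica set name stays the same.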
Note that at this point we have three instances of MongoDB running on three different ports, each with its own DB path and log path.
Also, MongoDB does not yet know which one is the primary, the secondary or the arbiter. We will define these in the next step, but before that we should know what an Arbiter node is and what role it plays.
Arbiter: Does not store any database data and only votes in elections, breaking ties when a new primary has to be elected after the Master node dies (used when the replica set has an even number of data-bearing nodes).
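The tie-breaking role is easier to see with a bit of arithmetic. Electing a primary needs a strict majority of the voting members, so the sketch below compares a two-voter set with a three-voter set (two data nodes plus an arbiter). The function names are made up for this illustration.

```javascript
// Votes needed to elect a primary: a strict majority of the voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can be down while a primary can still be elected.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [2, 3]) {
  console.log(
    `${n} voters: majority ${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`
  );
}
```

With two voters, losing either node leaves the survivor without a majority, so no primary can be elected. Adding the arbiter as a third voter lets the set survive the loss of one node.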
Step 2:
– On your server machine, open the mongo console connected to the instance on port 27017 (for example: mongo --port 27017)
– Type the following piece of code in it
cfg = {
    "_id" : "rs0",
    "version" : 1,
    "members" : [
        {
            "_id" : 0,
            "host" : "127.0.0.1:27017"
        }
    ]
}
rs.initiate(cfg);
rs.add("127.0.0.1:27018");
rs.addArb("127.0.0.1:27019");
rs.status();
(Of course, you can also add a lot of other parameters to the config, like the priority of nodes.)
The above config makes the MongoDB instance running on
- port 27017 as Primary Node.
- port 27018 as Secondary Node.
- port 27019 as Arbiter Node.
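As an alternative sketch, all three members can be described in a single config object instead of being added one by one. The buildReplSetConfig helper is hypothetical; the arbiterOnly field is the member option that marks a node as an arbiter. The code below only builds and prints the object, so it runs without a MongoDB instance.

```javascript
// Hypothetical helper: build a replica set config with data nodes and arbiters.
function buildReplSetConfig(name, dataHosts, arbiterHosts = []) {
  const members = [];
  for (const host of dataHosts) {
    members.push({ _id: members.length, host });
  }
  for (const host of arbiterHosts) {
    // arbiterOnly marks a member that votes but holds no data.
    members.push({ _id: members.length, host, arbiterOnly: true });
  }
  return { _id: name, members };
}

const fullCfg = buildReplSetConfig(
  "rs0",
  ["127.0.0.1:27017", "127.0.0.1:27018"],
  ["127.0.0.1:27019"]
);
console.log(JSON.stringify(fullCfg, null, 2));
```

In the mongo console, an object of this shape could be passed to rs.initiate in place of the single-member config. Note that when all members are listed up front, which data node becomes Primary is decided by an election, so a priority setting may be needed if a specific node should win.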
Step 3:
- Well, there is no step 3 as you are all done.
- All Secondaries poll the Primary's oplog and apply every change to their own data files.
- Thus, whenever the Primary node becomes unavailable, the Secondary node is elected as the new Primary and serves requests in its place.
- It is better to keep the Primary node and the Secondary node on different machines, so that a single machine failure does not take out every copy of the data.
By – Adil Ansar, Team Lead