MongoDB (Single-Server) Data Durability Guide

If you are a MongoDB user or just interested in NoSQL databases in general, you may have seen the excellent “MongoDB has poor data durability by default!” (I am paraphrasing) conversation started by Mikeal Rogers.

It is an excellent topic to bring up, and regardless of Mikeal’s association (he’s a CouchDB developer) I never once got the impression that he was setting up a straw man to knock down. He brought up some really excellent points about how MongoDB favors performance over single-server data durability by default, which in turn spurred a great conversation between him, the MongoDB/10gen team and some users from the community.

In this guide I want to outline a number of ways you can increase single-server durability with MongoDB, some optional safer driver behaviors you can use to force fsync’ing and synchronous writes, and lastly a tip on keeping your data more aggressively synced across all the nodes in your MongoDB cluster.

REMINDER: Before getting started, please note that any use of ‘fsync’ syncs all pending changes and not just the last change made.

MongoDB Durability Overview
fsync’ing MongoDB – from the command line
fsync’ing MongoDB – from the command shell
fsync’ing MongoDB – from a database driver
MongoDB Synchronous Writes – getLastError
MongoDB Synchronous Writes – Write Concern
BONUS: “fsync’ing” MongoDB (1.5+) – across multiple servers
HELP: How can I fix a corrupted MongoDB database?
Conclusion
MongoDB 1.4 Command Line Reference

MongoDB Durability Overview

The ensuing conversation between Mikeal and the MongoDB/10gen team, like Mike and Kristina, was hugely valuable. If this type of conversation interests you as much as it does me, you can follow this thread “How reliable has MongoDB become?” and my own thread “How is a corrupt master repaired from a slave?” for more replies on the subject.

The MongoDB data-durability highlights that have come out of these conversations so far are:

MongoDB is not designed around single-server durability, but rather multi-server durability.
For better single-server durability (at the cost of performance) you can configure MongoDB to fsync at differing intervals (we’ll show you how below).
MongoDB will add support for better single-server durability in dev stream 1.7, to be released to production in 1.8 (Issue #980). At the time of this writing it isn’t clear if this will be a “transaction log” style solution as mentioned on Mikeal’s blog comments, or if it will be some other strategy. 10gen hasn’t spec’ed it out yet.
The upcoming “Replica Sets” feature in MongoDB 1.5 (dev) and eventually 1.6 (production) is intended to fully address the problem of creating a high-availability cluster (2 or more) of MongoDB instances that all work together to keep the cluster operational with things like automatic failover and data recovery when a node is brought back online. For multi-server setups, this is what you want to care about.

Mongo’s core strategy of data reliability is still focused around the multi-server setup, but it is nice to see the addition of the “transaction log” going in to the server to help with single-server resilience.

fsync’ing MongoDB – from the command line

On the topic of single-server resilience, the best thing you can currently do (until transaction log support is added) is to increase the number of times you have MongoDB fsync.

fsync’ing is an operating-system level operation that gets data out of volatile caches and commits it to the disk. Out of the box MongoDB performs an fsync every 60 seconds. What this means for you is that in a worst-case scenario, like a power outage, your server can lose up to 59.99 seconds worth of data (NOTE: I realize in a real worst-case scenario it could be more due to disk failure, but let’s keep this example manageable).

If you are working on a write-heavy server that does thousands of writes a second, losing 59.99 seconds worth of data could mean hundreds of thousands of records, if not more. If you deem this too heavy of a loss, you can modify how frequently MongoDB fsync’s to disk by using the --syncdelay=SECONDS command line argument like so:

mongod --syncdelay=5

Which will force the server to flush its caches to disk every 5 seconds regardless of what is going on.
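To put rough numbers on that trade-off, here is a back-of-the-envelope sketch in Java. The class and method names are made up for illustration, and the write rate is an assumed figure rather than a measurement. It simply multiplies a steady write rate by the sync interval to estimate how many writes could still be sitting in memory when the power goes out.

```java
public class SyncWindow {
    /**
     * Rough upper bound on writes that may not have reached disk yet,
     * assuming a steady write rate and one fsync per interval.
     */
    public static long worstCaseUnsyncedWrites(long writesPerSecond, int syncDelaySeconds) {
        if (writesPerSecond < 0) {
            throw new IllegalArgumentException("writesPerSecond must be non-negative");
        }
        // A syncdelay of 0 means "never sync", so no bound exists for it.
        if (syncDelaySeconds <= 0) {
            throw new IllegalArgumentException("syncDelaySeconds must be positive");
        }
        return writesPerSecond * syncDelaySeconds;
    }

    public static void main(String[] args) {
        long rate = 5000; // assumed writes per second, purely illustrative
        System.out.println("syncdelay=60: " + worstCaseUnsyncedWrites(rate, 60));
        System.out.println("syncdelay=5:  " + worstCaseUnsyncedWrites(rate, 5));
    }
}
```

The estimate ignores everything except the sync interval, so treat it as a way to compare settings rather than a guarantee.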

While this might seem like a good idea, you could imagine in a high-activity production environment, fsyncing every few seconds regardless of the operation being performed and how heavy of a load the DB is under might be a bit too heavy handed; you might want to control exactly which operations get fsync’ed to disk and which ones don’t.

Luckily, you can do that!

fsync’ing MongoDB – from the command shell

As it turns out, MongoDB also has an internal “fsync” command you can execute from the MongoDB shell like so:

use admin
db.runCommand({fsync:1});

This will execute the fsync synchronously, waiting for it to complete before returning control back to the shell. If you would like to execute the fsync asynchronously, which fires the command into the DB but then returns control back to the shell immediately, you can use the optional async:true argument:

use admin
db.runCommand({fsync:1,async:true});

NOTE: These examples are straight out of the MongoDB fsync documentation.

This is all well and good, but what about executing these commands from a driver connected to the DB instead of needing to always have the Mongo shell up?

Great news, you can do that too!

fsync’ing MongoDB – from a database driver

MongoDB’s fsync functionality can be invoked from almost all of the supported DB drivers; we will focus on the Java MongoDB driver.

Invoking the ‘fsync’ command from the MongoDB drivers is as easy as executing the “fsync:1” command. In the case of the Java driver, you can use the com.mongodb.DB.command(DBObject) or com.mongodb.DB.command(String) methods. Using the String-based method is the easiest to write an example for and would look like:

DB conn = Mongo.connect(...);
conn.command("{fsync:1}");

You could imagine in any Java web app you could write a simple utility method that fired off a sync command that you could call after any important write operation, allowing you to dictate when the server takes its time to block and fsync and when it doesn’t.

For example, maybe after user-registration you would fsync to ensure (as best as you can) that the user account is safe; user-registration occurs infrequently enough that you wouldn’t be causing enough locks on the DB to impact performance. But in the case of, say, users commenting on something, you would avoid fsync’ing because it happens frequently and losing a comment or two isn’t that bad.
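A minimal sketch of that idea follows. It is not tied to the real driver: the fsync action is passed in as a plain Runnable (a hypothetical hook that, in a real app, would send the fsync command through your driver), and the class, enum and method names are invented for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SelectiveFsync {
    /** How much we care about losing a given write. */
    public enum Importance { CRITICAL, ROUTINE }

    // Hypothetical hook: a real one would issue the fsync command via the driver.
    private final Runnable fsync;

    public SelectiveFsync(Runnable fsync) {
        this.fsync = fsync;
    }

    /** Runs the write, then forces an fsync only for CRITICAL writes. */
    public void write(Runnable write, Importance importance) {
        write.run();
        if (importance == Importance.CRITICAL) {
            fsync.run();
        }
    }

    public static void main(String[] args) {
        AtomicInteger fsyncs = new AtomicInteger();
        SelectiveFsync db = new SelectiveFsync(fsyncs::incrementAndGet);
        db.write(() -> System.out.println("register user"), Importance.CRITICAL);
        db.write(() -> System.out.println("post comment"), Importance.ROUTINE);
        System.out.println("fsync calls: " + fsyncs.get());
    }
}
```

Keeping the decision in one place like this means you can change which operations count as critical later without touching every call site.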

Now the next logical comment likely on the tip of your brain is “That’s cool, but what about synchronous writing that I know is safe on the server before I continue?”

And you know, today is your lucky day, because that’s the next section!

MongoDB Synchronous Writes – getLastError

I think we have beaten to death the different ways you can ensure MongoDB keeps its internal caches consistent with persisted data on disk using fsync. Parallel to the concerns of fsync’ing is the idea of synchronous writes. More specifically, by default MongoDB operates in a “write and forget” mode to callers. You perform a write operation from a MongoDB driver and the call immediately returns, with you putting your trust in MongoDB that it got the write request and will successfully service it.

MongoDB behaving like this as a default is a great idea in my opinion. Assuming, by default, that MongoDB won’t explode into flames and will service your request makes sense when you consider the performance benefits.

For more important data that cannot be write-and-forgot-ted-ded, MongoDB can optionally support synchronous writes to the DB by way of the “getLastError” command. The getLastError command does exactly what you think it would: it gets the last error that occurred for the last operation issued for that connection (it operates on a per-connection scope).

Keeping a connection to the DB open and using the getLastError command immediately after a write has the effect of blocking until getLastError returns a result to the caller.

Different drivers implement the support for getLastError differently. The Python driver adds support for a boolean value to write operations, causing the call to block automatically until complete. The Java driver (because it pools connections) requires you to tell it when you want to keep a particular connection open until you are done with it, by way of the com.mongodb.DB.requestStart() marking method and the matching com.mongodb.DB.requestDone() method.

REMINDER: While the requestStart and requestDone methods seem like transaction delimiters, they aren’t. They are just marker methods used by the MongoDB Java driver to ensure that subsequent calls get serviced through the same pooled MongoDB connection and not cycled over to another pooled connection.

Using the Java driver as an example, to ensure that our commands go to the same connection (as required by getLastError) our code would look something like this for a synchronous write:

DB conn = Mongo.connect(...);
conn.requestStart();
// At this point, all commands will be issued through
// the same connection from the internal Mongo driver
// connection pool
conn.getCollection(...).insert(...);
CommandResult result = conn.getLastError();
// Handle a failure case from result arg
conn.requestDone();
// Now the Java driver will stop shuffling all commands
// over that same connection and return it to the pool.

Again, I could imagine utility classes in a web app that provide synchronous writes to Mongo by wrapping these implementation details in a method like “DAO.saveUser(User user, boolean synchronous)” or something like that.
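Here is a rough sketch of what such a wrapper might look like. The Connection interface below is a hypothetical stand-in for the handful of driver calls used above, not the real driver API, and treating a null result as success is an assumption of this sketch. The one thing it adds to the snippet above is a try/finally, so the pinned connection is released even if the write throws.

```java
public class SyncWrites {
    /** Hypothetical stand-in for the driver calls used above; not the real API. */
    public interface Connection {
        void requestStart();
        void requestDone();
        String getLastError(); // null means the last write succeeded (assumption)
    }

    /**
     * Runs the write. When synchronous, pins the connection, waits on
     * getLastError and returns its message; otherwise returns null at once.
     */
    public static String save(Connection conn, Runnable write, boolean synchronous) {
        if (!synchronous) {
            write.run(); // write and forget
            return null;
        }
        conn.requestStart();
        try {
            write.run();
            return conn.getLastError();
        } finally {
            conn.requestDone(); // always release, even if the write throws
        }
    }

    public static void main(String[] args) {
        Connection fake = new Connection() {
            public void requestStart() { System.out.println("pinned"); }
            public void requestDone() { System.out.println("released"); }
            public String getLastError() { return null; }
        };
        String error = save(fake, () -> System.out.println("insert user"), true);
        System.out.println(error == null ? "write ok" : "write failed: " + error);
    }
}
```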

MongoDB Synchronous Writes – Write Concern

This tip is for the MongoDB Java Driver folks out there; I don’t know if the other drivers support this, but the examples below apply to the Java driver.

At both the database and collection level, MongoDB’s Java driver supports setting a WriteConcern value (values: NONE, NORMAL, STRICT) that indicates to the DB how it should implicitly handle write operations.

So if you need all your commands against a MongoDB source to be synchronous, you can simply set the WriteConcern on the entire DB connection to be com.mongodb.DB.WriteConcern.STRICT and not worry about handling requestStart() and requestDone() calls manually as mentioned above.

NOTE: If you still want to retrieve the getLastError value to process it, you will still need to use the code from the previous tip to denote to the Java driver that you want to share the same connection for all the commands so the error retrieved is appropriate for the command you issued.

You might be wondering why use WriteConcern at all if the previous tip does the same thing? Well, I think that depends on you.

If you’d prefer the driver did everything for you as far as managing your synchronicity, then use a WriteConcern setting; if you’d rather manage it via your own API calls that you can refactor and modify later, then use the previous tip.

The choice is up to you.

BONUS: “fsync’ing” MongoDB (1.5+) – across multiple servers

Our last bonus tip for this guide is not a single-server tip, but is so closely related to the topic of data durability that we decided to include it.

In the world of MongoDB it is common to have a few Mongo servers set up to run together; for example, in a Master-Slave configuration. Given that this entire guide has been about data durability, the topic of data consistency (in Mongo-land) may well be the next question on your lips, and we are going to give you a tip on how to manage that across multiple MongoDB nodes.

This guide so far has been talking about “fsync’ing” a single-server MongoDB’s cache to disk to persist your data (as well as synchronous writes); but, what if you wanted to effectively “fsync” your written data across multiple MongoDB nodes and not just to the disk of a single server?

Never fear, I shall tell you!

REMINDER: This tip requires MongoDB 1.5.0 or higher.

This replication acknowledgement command is actually an extension of the already-helpful getLastError command we covered above in the form of an optional argument: “w”.

Yep, just the letter “w”.

The way you use this command is to issue the getLastError command as you’ve already learned how to do, but include an additional “w” argument with a value of 2 or more, where “w” represents the number of servers in your cluster that the write must be replicated to before the call returns. How cool is that?

An example usage would look like this:

DB conn = Mongo.connect(...);
conn.requestStart(); // ensure same connection
conn.getCollection(...).insert(...);
conn.getCollection(...).remove(...);
// Now force the write operation to complete across
// at least 2 nodes before returning.
conn.command("{getlasterror: 1, w: 2}");
conn.requestDone(); // release connection

This is one of the more powerful additions in MongoDB 1.5+, allowing developers to have direct control over how important data is handled by the servers. You could imagine in a cluster of 10 MongoDB servers, if a new user registers and you don’t want them to continue on into the system until that user account exists on all the servers, how handy this operation can become.

Again, wrapping this in a utility class to make using it easier is probably the way to go.
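As a starting point for such a utility, the sketch below only builds the command document and sanity-checks “w”. It uses a plain LinkedHashMap so that it runs without the driver (converting it into the driver’s own object type is left out), and the class name, method name and clusterSize parameter are all invented for this example. The insertion-ordered map matters because MongoDB expects the command name to be the first key, and the range check is there because asking for more servers than the cluster has would likely never be satisfied.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReplicatedAck {
    /**
     * Builds a getlasterror command document that waits for w servers.
     * LinkedHashMap keeps insertion order, so the command name stays first.
     */
    public static Map<String, Object> getLastErrorCommand(int w, int clusterSize) {
        if (w < 1 || w > clusterSize) {
            throw new IllegalArgumentException(
                "w must be between 1 and " + clusterSize + ", got " + w);
        }
        Map<String, Object> cmd = new LinkedHashMap<>();
        cmd.put("getlasterror", 1);
        cmd.put("w", w);
        return cmd;
    }

    public static void main(String[] args) {
        // Wait for 2 of an assumed 3 servers to have the write.
        System.out.println(getLastErrorCommand(2, 3));
    }
}
```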

Update #1: Kristina from the MongoDB team has provided PHP and Perl code examples for this tip, thanks!

HELP: How can I fix a corrupted MongoDB database?

As part of this data-durability conversation around MongoDB, a sub-conversation around MongoDB database corruption and repair has sprung up.

Some take-aways from this sub-conversation are:

GOOD: Using CTRL-C to shut down MongoDB is the preferred method; it produces a clean shutdown sequence (Picture by Sam Millman).
GOOD: Sending SIGINT or SIGTERM signals to the MongoDB process is the same as using CTRL-C; it produces a clean shutdown sequence.
BAD: Sending a SIGKILL signal (kill -9) to the MongoDB process will likely corrupt the database as MongoDB cannot shut down cleanly.

All that being said, in the off-chance that you have a power outage and your MongoDB database gets corrupted, the correct way to repair it is using the repair command.

You can use the repair command directly from the command line, like so:

mongod --repair

or you can execute it from inside the MongoDB shell, like so:

db.repairDatabase();

In either case, the repair operation will cycle through the disk contents and repair the corrupted portions of your file.

IMPORTANT: This can result in loss of corrupted data.

To clarify, you have no other choice. Your database is already messed up; running the repair command will get it back into a serviceable state, but that could include pruning data from your database that is corrupted. Just because you run repair doesn’t mean the world will automatically be rosy again. You always have to take 2nd and 3rd level precautions against data-loss.

To help cover anyone in a multi-server jam: in a Master/Slave setup where the Master fails and your Slave continues chugging along, you have to forget about recovering the Master, promote the Slave to Master and then bring the old Master back online as a Slave (switch their roles). At that point you’ll have to repair the new Slave to ensure the data store is in good condition, then likely start it with the --autoresync option to get the Slave back up to speed, or with --fastsync if you have grabbed a disk-snapshot of the master and copied it over to the slave to use as a “starting point” for its snapshot.

Either way, data recovery is hairy business. That is why everyone in the MongoDB community is so excited for Replica Sets to show up in 1.6 stable. In that setup, the individual nodes are configured to fail over between one another and automatically resync themselves with the others in the cluster when they come back up, making your life a bit easier.

Conclusion

I hope you find this guide helpful. If you have any questions, spot any errors or have recommended additions for the guide, please leave a comment and I’ll take a look.

Happy (and safe) Mongo’ing!

MongoDB 1.4 Command Line Reference

Below is the full list of command line options for MongoDB 1.4 for easy reference.

General options:
  -h [ --help ]             show this usage information
  --version                 show version information
  -f [ --config ] arg       configuration file specifying additional options
  -v [ --verbose ]          be more verbose (include multiple times for more verbosity e.g. -vvvvv)
  --quiet                   quieter output
  --port arg                specify port number
  --logpath arg             file to send all output to instead of stdout
  --logappend               append to logpath instead of over-writing
  --bind_ip arg             local ip address to bind listener - all local ips bound by default
  --dbpath arg (=/data/db/) directory for datafiles
  --directoryperdb          each database will be stored in a separate directory
  --repairpath arg          root directory for repair files - defaults to dbpath
  --cpu                     periodically show cpu and iowait utilization
  --noauth                  run without security
  --auth                    run with security
  --objcheck                inspect client data for validity on receipt
  --quota                   enable db quota management
  --quotaFiles arg          number of files allower per db, requires --quota
  --appsrvpath arg          root directory for the babble app server
  --nocursors               diagnostic/debugging option
  --nohints                 ignore query hints
  --nohttpinterface         disable http interface
  --rest                    turn on simple rest api
  --noscripting             disable scripting engine
  --noprealloc              disable data file preallocation
  --smallfiles              use a smaller default file size
  --nssize arg (=16)        .ns file size (in MB) for new databases
  --diaglog arg             0=off 1=W 2=R 3=both 7=W+some reads
  --sysinfo                 print some diagnostic system information
  --upgrade                 upgrade db if needed
  --repair                  run repair on all dbs
  --notablescan             do not allow table scans
  --syncdelay arg (=60)     seconds between disk syncs (0 for never)
  --profile arg             0=off 1=slow, 2=all
  --slowms arg (=100)       value of slow for profile and console log
  --maxConns arg            max number of simultaneous connections
  --install                 install mongodb service
  --remove                  remove mongodb service
  --service                 start mongodb service

Replication options:
  --master                  master mode
  --slave                   slave mode
  --source arg              when slave: specify master as
  --only arg                when slave: specify a single database to replicate
  --pairwith arg            address of server to pair with
  --arbiter arg             address of arbiter server
  --slavedelay arg          specify delay (in seconds) to be used when applying master ops to slave
  --fastsync                indicate that this instance is starting from a dbpath snapshot of the repl peer
  --autoresync              automatically resync if slave data is stale
  --oplogSize arg           size limit (in MB) for op log
  --opIdMem arg             size limit (in bytes) for in memory storage of op ids

Sharding options:
  --configsvr               declare this is a config db of a cluster
  --shardsvr                declare this is a shard db of a cluster
