MongoDB is a schemaless NoSQL ("Not only SQL") database system. A schemaless design allows for rapid prototyping, but it often pushes you into some dilemmas, the most common of them being: "To embed … or not to embed".
After searching sites like Stack Overflow, Reddit and Quora, you will come across a number of opinions, which are often at odds with each other. At the genesis stage of your application it won't really matter whether you embed documents or reference them, but in the long run it surely will, and the wrong choice could blow up a couple of things.
Coming back to the question, here are some points to consider:
A suggestion I came across frequently: embed related data. For example, a User document can embed all of the user's Posts and Comments. A scenario where a user has to be fetched along with their posts could benefit from such modelling, but I would suggest against it. The reason is that MongoDB enforces a maximum document size (16 MB). Any set of records that can grow forever seems to me a bad candidate for embedding. To store data larger than 16 MB you would need GridFS, which could be an unnecessary overhead.
A counterargument is that 16 MB is a hell of a lot of textual data. So it depends entirely on your project.
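To make the growth argument concrete, here is a minimal sketch in plain Python comparing an embedded User document with a referenced layout. It assumes a hypothetical schema (user "alice", posts with a 1,000-character body) and estimates size from the JSON encoding, which only approximates the BSON size the server actually checks against the 16 MB limit.

```python
import json

# MongoDB's maximum BSON document size: 16 MB (16 * 1024 * 1024 bytes).
MAX_DOC_BYTES = 16 * 1024 * 1024


def approx_size(doc: dict) -> int:
    # Assumption: JSON byte length is only a rough stand-in for BSON size.
    return len(json.dumps(doc).encode("utf-8"))


def embedded_user(num_posts: int) -> dict:
    # Embedded model (hypothetical schema): every post lives inside the
    # user document, so the document grows with each new post.
    return {
        "_id": 1,
        "name": "alice",
        "posts": [{"title": f"Post {i}", "body": "x" * 1000} for i in range(num_posts)],
    }


def referenced_model(num_posts: int) -> tuple[dict, list[dict]]:
    # Referenced model: posts are separate documents that point back at
    # their author via user_id, so the user document never grows.
    user = {"_id": 1, "name": "alice"}
    posts = [
        {"_id": i, "user_id": 1, "title": f"Post {i}", "body": "x" * 1000}
        for i in range(num_posts)
    ]
    return user, posts
```

With a few thousand posts the embedded document is still far below the limit, but at around 20,000 posts of this size `approx_size(embedded_user(...))` exceeds `MAX_DOC_BYTES`, while the referenced user document stays the same size no matter how many posts exist.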
A scenario that would otherwise involve time-consuming joins is a good candidate for embedding.
Data model lifecycle: think about the lifecycle of the container document and its content. Does it make sense for the child documents to keep existing when the parent document is deleted? If the answer is "no", nesting is the way to go.
Memory considerations: embedding large documents and fetching them can bloat your application server (the documents can occupy all available memory), causing the application server to be killed or become unresponsive.
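The lifecycle question can be illustrated with a small in-memory sketch. Plain Python dicts and lists act as hypothetical stand-ins for collections here; this models only the lifecycle difference, not MongoDB itself.

```python
def delete_user_embedded(users: dict, user_id: int) -> None:
    # Embedded: comments live inside the user document,
    # so removing the parent removes its children in the same step.
    users.pop(user_id, None)


def delete_user_referenced(users: dict, comments: list, user_id: int, cascade: bool) -> list:
    # Referenced: comments are separate documents. Deleting the user
    # leaves them behind unless they are also removed explicitly.
    users.pop(user_id, None)
    if cascade:
        comments = [c for c in comments if c["user_id"] != user_id]
    return comments


def orphans(users: dict, comments: list) -> list:
    # Comments whose author no longer exists.
    return [c for c in comments if c["user_id"] not in users]
```

If orphaned children would be meaningless (as with the `cascade=False` case), that is a hint they belong inside the parent document.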
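If a large embedded array is already in place, a projection can at least limit what reaches the application server. The sketch below builds projections as plain dicts, assuming a hypothetical `posts` array field: `$slice` with a negative count keeps the last n array elements (other fields are still returned), and `{"posts": 0}` excludes the array entirely.

```python
def latest_posts_projection(n: int) -> dict:
    # Returns the whole document, but with only the last n entries
    # of the (hypothetical) embedded "posts" array.
    if n < 1:
        raise ValueError("n must be positive")
    return {"posts": {"$slice": -n}}


# Exclusion projection: everything except the embedded posts array.
WITHOUT_POSTS = {"posts": 0}
```

With PyMongo this might be used as `users.find_one({"_id": user_id}, latest_posts_projection(10))`, where `users` is a hypothetical collection. Note that the server still reads the full document; the projection only reduces what is sent to, and held by, the application.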
Another consideration is the kind of aggregation you wish to perform on your dataset. If you wish to perform a map-reduce, you may also need to model your data differently. (Details will be covered in a future article.)
Data consistency: this could be an important point to consider. If you wish to implement atomic (transactional) updates, read carefully. MongoDB makes a trade-off between efficiency and consistency. The rule is that changes to a single document are always atomic, while updates to multiple documents should never be assumed to be atomic (MongoDB 4.0 later added multi-document transactions, but outside of a transaction this rule still holds). There is also no way to "lock" a record on the server (you can build this into the client's logic, for example by using a "lock" field). When you design your schema, consider how you will keep your data consistent. Generally, the more you keep in a single document, the better.
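Here is a minimal sketch of both ideas, with update documents built as plain dicts. The field names (`posts`, `post_count`, `lock`) are hypothetical. Because `$push` and `$inc` target the same document, the server applies them atomically together. The lock helper follows the "lock" field approach mentioned above: the filter only matches an unlocked document (`"lock": None` matches a null or missing field), so only one client's conditional update can succeed.

```python
def add_post_update(post: dict) -> dict:
    # Both operators modify one document, so readers never see the
    # new post without the incremented counter (or vice versa).
    return {"$push": {"posts": post}, "$inc": {"post_count": 1}}


def acquire_lock_args(doc_id, owner: str) -> tuple[dict, dict]:
    # Filter matches only if no one holds the lock; the update claims it.
    return {"_id": doc_id, "lock": None}, {"$set": {"lock": owner}}


def release_lock_args(doc_id, owner: str) -> tuple[dict, dict]:
    # Only the current owner's release matches and clears the lock.
    return {"_id": doc_id, "lock": owner}, {"$set": {"lock": None}}
```

With PyMongo, `users.find_one_and_update(*acquire_lock_args(1, "worker-a"))` returns `None` when another client already holds the lock (`users` being a hypothetical collection). A real implementation would also need a way to expire locks held by clients that crash.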
Older versions of MongoDB did not support a join operator. MongoDB 3.2 introduced $lookup, an aggregation stage that performs a left outer join: https://docs.mongodb.com/v3.2/reference/operator/aggregation/lookup/
Note: in MongoDB 3.2, the collection being joined (the "from" collection) must be unsharded; later releases relaxed this restriction.
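As an illustration, the following sketch builds a $lookup pipeline that fetches a user together with their posts from a referenced collection. The collection and field names (`posts`, `user_id`) are hypothetical; the pipeline is a plain list of dicts as PyMongo's `aggregate` accepts it.

```python
def user_with_posts_pipeline(user_id) -> list[dict]:
    return [
        # Narrow down to one user before joining.
        {"$match": {"_id": user_id}},
        # Left outer join: attach every post whose user_id equals the
        # user's _id as an array field named "posts" (empty if none).
        {"$lookup": {
            "from": "posts",          # hypothetical referenced collection
            "localField": "_id",
            "foreignField": "user_id",
            "as": "posts",
        }},
    ]
```

Usage might look like `list(db.users.aggregate(user_with_posts_pipeline(1)))`, where `db` is a hypothetical PyMongo database handle. This gives the "user with posts" read from the embedding discussion without putting the posts inside the user document.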
Also refer to http://docs.mongodb.org/manual/core/data-model-design/
I will keep updating this post with my findings. Stay tuned for more on "To embed or not to embed – MongoDB".
By – Saurabh Mehta, Senior Software Engineer