Ventures into MongoDB - Transitioning from mySQL to noSQL

I've never really ever tried to make a proper database. I suppose my biggest venture into it was in college nearly 3 years ago now. It was a good thing to learn but there will always be one thing that I will always hate. Normalization.

For me it was rollercoaster ride of fustration. I could code! i'd started in secondary school at the age of 14 so surely I could normalize a database!

It could not of been further from the truth. Constantly I would find myself with what I thought was the correct answer. Only to then find I was wrong, I didn't even understand why! I think sometimes I only made it through that module because my peers helped me out alot. In turn I helped with the code (it wasn't all one-sided).

We never actually implemented a database in mySQL but it is the one I've had the most contact with, this comes from custom php applications i've written or just working with Wordpress.

Introducing MongoDB

When I first saw mongodb I thought it was awesome. I work with javascript quite often and so storing documents like JSON wwas easily the next logical step. I could dump into a document and then like magic I can access and query with the simplest of APIs.

I fiddled and I played over time and because I knew and was super comfotable with it. I decided it would be great for using with braid for storing everything. Now everything I am going to be writing is either javascript, or very javascript-esque. Brilliant!

Planning my database

I knew from my college days that I should plan my database, lucky I remeber enough to know that I need to try and eliminate as many many to many relationships as possible and that maybe I should try and encapsulate and normalize my database. I decided to plan it on paper first.

I struggled. I couldn't wrap my head round it (it's college all over again!). After a while I got some fresh eyes on it both my own and a person who is much better at this stuff than me.

I'd come up with a diagram on how it should be working together. I mapped it based on what I learned on college. Most of them were easily one to many relationships which is fine. However I ran into a problem when I had a many-to-many relationship between my 'Threads' and 'Modifiers'

Data relationships

Here is how it ended up looking, you can see that they all roughly had one-to-many relationships but not many-to-many relationships.

After talking it through, I learned it was quite simple and that each of the elements of the database should be their own collection, this maps to the same as their own table in a more traditional mySQL fashion.

Learning MongoDB's Concepts

The problem I was facing when trying translate these between the two philosophies was that MongoDB gets it's speed from being able to read flat documents.

After struggling and really thinking though, I decided to look on MongoDB's website only to find explanations of what I was struggling with. I wasn't sure how to model my data in a way so that it's easy to handle one to many or many to many relationships. Luckily it was easily explained through two concepts of embedded documents and referenced documents.

They explained clearly which direction I must take. Given how much data I'm going to be collecting from other services. I've decided that every element needs to become it's own collection and then store references to any other collections. This means when I grab the data, if I want to populate it with data from another collection then I need to perform another query to fetch that data.

The website also explained how to model data based on tree structures. So I decided that the following is how the database should look based on the things I learned from properly looking through MongoDB's website resources.

Database Tree Structure

You can see above how the relatonships between them work. If this was to be one big document then we would have all these nested documents inside of one another, of course also this isn't good because documents have a size limit before they become too large and need to be supported in a different format. By storing each documents in their own collection and keeping references to them we stop this. The downside is while we have to perform more queries, as a way of maintaining the database, this is a much better of doing it.

Wrapping up

Normalizations sucks and I didn't escape it. Next on my list is leading to the point that I actually need to implement this database into MongoDB and try it all out to see if it really works. From a logic point of view though. I think this is the right way to go.

I've already planned the schemas that I need to write. However, I also need a way of storing and creating models of schemas for each service I plan to support. Once I've made these service specific schemas along with these other generalised schemas. I'm hoping I will have a system flexible enough to allow me to quickly and easily add new services. This is partly due to the power of MongoDB, because it is schemaless by design. This means I can virtually enforce schemas but in reality MongoDB will handle most things I give it and store it in a document.