On my way to graph databases

For the past month, I’ve been evaluating my first graph database – Neo4j. In this post, I will write about how I got into this database, and in the next one, I will tell you about my thoughts on it.

Like everyone with a computer science education, I first got acquainted with databases back in school and, of course, it was all about RDBMSs. Since then there have been many innovations in this area, mostly in the field of NoSQL. For my part, whenever there was talk about NoSQL, I would think of Elasticsearch, Solr, MongoDB, etc., none of which I knew as ACID compliant. To be honest, I had come to associate NoSQL with not being ACID compliant. It stayed that way until I read about Neo4j, a graph database that provides ACID features while also being categorized as NoSQL.

So what is a graph database? To understand something new, it’s always good to start from somewhere you are already familiar with. I think RDBMSs are a pretty well-known technology so let’s take it from there.

Relational databases are software that provides storage and retrieval services for large amounts of data at an acceptable speed. But these are not their key features; in fact, much simpler software like filesystems can provide the same functionality. In my opinion, the key feature of an RDBMS is its ACID (Atomicity, Consistency, Isolation, Durability) characteristics. In short, the ACID features of an RDBMS make sure that your data is either stored as a whole and intact or not stored at all, and that it won’t change unless you ask for it. Of course, this concept needs its own exploration. For starters, you can read about it on its Wikipedia page.
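To make the “all or nothing” part more concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The accounts table, the names in it, and the transfer function are made up for illustration. The second update violates a CHECK constraint, so the first update is rolled back as well.

```python
import sqlite3

# In-memory database with a hypothetical "accounts" table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between two accounts; both updates are stored or neither is."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This update fails the CHECK constraint when src cannot afford it.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict of name -> balance."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

Calling transfer(conn, "alice", "bob", 500) returns False and leaves both balances untouched, even though the credit to bob had already been executed inside the transaction.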

As I mentioned earlier, most NoSQL databases do not provide the same ACID characteristics, meaning they do not guarantee that your data will be stored intact. I’ve never experienced it myself, but I’ve heard of cases where a database (which I’m not going to name) was asked to store some data and parts of it went missing in the process. Of course, these are extreme cases and they mostly happen when you put the database under heavy load. Nevertheless, in a lot of cases this is a deal breaker! At the same time, to be fair, NoSQL databases offer characteristics that RDBMSs typically do not, like linear scalability and really fast full-text search.

This used to be my experience on the topic. Then, a month ago, I read a paper on Neo4j for the first time and was instantly captivated. Don’t get me wrong, I’m not promoting the product or anything. I’ve hardly had the chance to try their software with real data. Still, the concept and the doors it opens are really interesting. It’s definitely not going to replace RDBMSs, but now that I know of such a technology, I have the right tool for some of the needs that I was previously forcing RDBMSs to satisfy.

Comparing RDBMSs and graph databases

To compare RDBMSs and graph databases, you first need to know that RDBMSs are based on set theory while graph databases are, obviously, based on graph theory. These two are different concepts (though not entirely: you can define a graph using a set of nodes and a set of edges). While sets are more of an abstract concept, graphs are visual. This one basic difference is enough to see why graphs are easier for us to grasp. And Neo4j has done a good job presenting this visual:

[Figure: Sample graph created by Neo4j’s web browser]
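The remark that a graph can be defined with a set of nodes and a set of edges can be written down almost literally. The following is only a small sketch; the names and the neighbours helper are invented for illustration.

```python
# A graph as two plain sets: the nodes, and the directed edges between them.
nodes = {"Ann", "Bob", "Cat"}
edges = {("Ann", "Bob"), ("Bob", "Cat")}  # hypothetical "knows" relations


def neighbours(node):
    """Return the nodes reachable from `node` in one step, sorted for stable output."""
    return sorted(dst for src, dst in edges if src == node)
```

Note that finding the neighbours here means scanning the whole edge set, which is exactly the kind of searching that a graph database tries to avoid.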

But that is not all. Even though the R in RDBMS stands for “relational” (strictly speaking, the term refers to tables as mathematical relations), the relationships between records are emulated using indices! From now on, I will refer to these as soft relations, since they are implemented by matching two keys (a primary key and a foreign key). In graph databases, on the other hand, we have actual hard relations (or relations for short). In a graph database, a relation (textbooks call it an edge) is implemented as a pointer to the other node, with no index involved. This can bring a large performance improvement when following relations, since no search is required to find the related data. That’s why it’s said that relations are first-class citizens in graph databases.
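The difference between soft and hard relations can be sketched in a few lines of Python. This is an in-memory illustration only, not how any RDBMS or Neo4j actually lays out its storage. The rows, the Node class, and the function names are all assumptions made for the example, and a sorted list with binary search stands in for a real index.

```python
from bisect import bisect_left
from dataclasses import dataclass, field

# "Soft" relation: rows reference each other by key, so following the
# reference means searching an index (a sorted list stands in for a B-tree).
people_rows = [(1, "Ann"), (2, "Bob"), (3, "Cat")]  # (id, name), sorted by id
friend_rows = [(1, 2), (1, 3), (2, 3)]              # (person_id, friend_id)
person_ids = [pid for pid, _ in people_rows]        # the "index" on the primary key


def friends_via_index(person_id):
    """Resolve friends by matching foreign keys against the primary-key index."""
    names = []
    for pid, fid in friend_rows:  # a real RDBMS would use an index here as well
        if pid == person_id:
            pos = bisect_left(person_ids, fid)  # one search per related row
            names.append(people_rows[pos][1])
    return names


# "Hard" relation: each node holds direct references to its neighbours.
@dataclass
class Node:
    name: str
    friends: list = field(default_factory=list)


ann, bob, cat = Node("Ann"), Node("Bob"), Node("Cat")
ann.friends += [bob, cat]
bob.friends.append(cat)


def friends_via_pointers(node):
    """Resolve friends by following references; no lookup is involved."""
    return [friend.name for friend in node.friends]
```

Both functions return the same friends for Ann, but the first pays for a search on every related row while the second only follows references it already holds.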

To be fair to both sides of the comparison, RDBMSs are better at searching than graph databases (this is not a general claim). The tabular format of RDBMSs makes them more suitable for searching, and even though graph databases like Neo4j do support indices and searching, there is still a gap here. In fact, RDBMSs are so good at searching that they do everything with it. Whenever you select a record or match a foreign key, you are actually searching. Neo4j, on the other hand, uses searching only to find the initial node, and from there it follows relations (edges) to reach the rest of the nodes directly. And as I mentioned before, following relations in a graph database is much faster than using indices. This means that if you don’t know exactly which record you are looking for, an RDBMS can serve you better, but if you have the identifier of the node you are looking for, a graph database can find all the data related to that node with better performance.
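The “search once, then follow edges” pattern described above can be sketched as a breadth-first traversal. The graph, the name index, and the reachable_from function below are hypothetical and only meant to show the shape of the idea, not Neo4j’s actual implementation.

```python
from collections import deque

# Hypothetical in-memory graph: adjacency lists keyed by node id.
adjacency = {
    "n1": ["n2", "n3"],
    "n2": ["n4"],
    "n3": ["n4"],
    "n4": [],
    "n5": ["n1"],
}
# A name index, used only to locate the starting node.
name_index = {"Ann": "n1", "Bob": "n2", "Cat": "n3", "Dan": "n4", "Eve": "n5"}


def reachable_from(name, max_hops):
    """Find the start node through the index, then follow edges breadth-first."""
    start = name_index[name]  # the only search step in the whole query
    seen = {start}
    queue = deque([(start, 0)])
    order = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand nodes that are already at the hop limit
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append((neighbour, hops + 1))
    return order
```

The index is consulted exactly once per query. After that, the cost depends on how many edges are followed rather than on the total number of nodes in the graph.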

Conclusion

As always, different requirements need different tools. And when you find a new tool, you can hope that some of the requirements that were previously satisfied with the wrong tools (out of desperation) can now be addressed properly. This is exactly what happened to me when I learned about Neo4j. Now I know which tool to use when the number of relations in my data is considerable and I need to find connected data with each query!
