Database design for enterprise software solutions

Not that database design is any different for enterprise software from other types of software solutions or anything, but a couple of requirements shared by most of them made me think of a design that I’m going to share with you here. The design introduced here is a really powerful one that can open so many doors for you but at the same time, I admit, it can be troublesome in some of the cases, but at least you’ll have a get away route in cases where you are stuck and have got no choice. We’ll see that in couple of minutes.

Requirements addressed

First let’s start with what requirements are addressed here, in fact there are only two! 1) Keeping a history of data (previous revisions) including who and when has made changes into them. And 2) elastic relations between different entities in the database which means relating entities which we didn’t see at design time that they might have a relation someday. Perhaps these two requirements are not that important to you but personally I have came to realize that in almost every enterprise solutions these two requirements exist, less or more. Next I’m going to give you examples on each of them to make sure they are clear enough.

1. History of data

Putting aside the need for going back and having access to previous revisions of an entity, there is a very important need that dictates us to take this need under consideration. Going back to invoice example I’ve been using a couple of times now, an invoice consists of a couple of products sold. Attributes of a product in an invoice, like its price, once used; form a contract between seller and buyer and should not be changed over the time. But at the same time we will eventually face the need to update the price of a product sometime in the future! These two contracting needs are what indicates the need to keep history for our data. Of course one of the ways you can solve this problem is to copy the product info that are a part of the contract in a table other than product and associate it with the invoice. It’s just that this solution, as practical as it is, requires you to design an intermediate table and number of such tables could grow dramatically since for each thinkable relation between your entity tables, you might need to keep a copy of your contracts.

2. Elastic relations

Your data will consist of entities and their relations for sure. A software solution is like a living being, it will evolve over the time. No matter how good you analyze the requirements, for sure you’ll face the time when you have to change what is implemented. And that’s because requirements change! As we can not eliminate the need to go back to our analysis, design and implementation phases, but a good design will help you reduce the time you’ll spend applying new requirements. The change in requirements, from data point of view, can be in one of two cases:

  • Changes in entities; adding a new one or changing an old one.
  • Changes in entity relations; relating entities that were not related before.

The first requirement is not addressed here, we will talk about the second one only. To understand this need, consider the case where you are asked to make a relation between invoices and emails as what email sent which invoice. Such a requirement could be easily missed when you first analyze the project and making changes to an already implemented one is such a hassle. Not mentioning that emails could be associated with all kinds of entities within your system! That’s when elastic relations can come in handy.

Solution provided

Now that we understand what we are dealing with, let see what we are going to do about it. Let me give you an ERD of the final solution and then we will take it from there:

entity-revision ERD

In the depicted ERD above, there are three tables. entity will hold data for each entity in the system, regardless of their type. That’s why it has a field for type, to hold the type of entity. Then we have a child table of entity_revision which has a record for each revision made to an entity. entity_revision has a “one to many” relation with entity indicating the entity for each revision. This table and / or entity do not hold the actual data for entities / revisions. They just keep information that is shared among all entities in the system. Perhaps giving a class diagram showing the same structure as above will help clarifying the situation:

entity-revision Class Diagram

 

The class diagram tries to show the same concept as the ERD in OOP world. It has also added two imaginary entities of Invoice and Product to show how the introduced concepts are used. As you can see, each entity within your system will inherit from Revision and as the result will have Creator (a user) and CreateDate (date of creation). It will also be a part of an entity and through Relation class, it can be in a relation (of arbitrary type) with any other Revision. This means to associate products to an invoice, you don’t need to add a Products property to your Invoice, they are already implicitly linked, hence the elastic relationship. Of course this is possible as long as your class is inherited from Revision. Having a Type in Relation class gives the opportunity to specify a type for your relations. For instance, you might link products to your invoice in two ways. First products that are sold in the invoice, and also you might want to list products that are returned in the same invoice (if your shop has a return policy).

I’m not gonna dig deeper into this design anymore and I’m gonna leave the rest to you. Instead I’m gonna spend some time explaining how the State of Revision fits into the picture which is an important part of this design.

Using the Entity-Revision design

Even though it might look obvious how you should be using the given tables / classes (which I believe are interchangeable), there are important notes here to mention. And it all related to Revision’s State property. This property / field holds the state of a revision and it can have one the following values:

  1. Active: For the revisions that are the active state of an entity. Each entity must have one and only one active revision (this statement will be completed at the third line).
  2. Obsolete: When there’s a new active revision introduce for an entity, the old active revision will be marked obsolete.
  3. Deleted: Removing an entity will result in tagging it deleted. This way you will have the feature of reviving a deleted entity and also not losing the data associated else where. I mean you don’t want your invoices changed if you delete a product! Each entity must have one and only one active / deleted revision, period.
  4. Used to be deleted: Once some deleted revision is revived, the deleted revisions will be changed into “used to be deleted” indicating the revisions that are revived. Of course each “used to be deleted” revision will be followed by an active / obsolete revision.

These are all the states I could think of for a revision. Now let’s take a look at some of the scenarios for adding, updating and removing entities. For the sake of example, think that we are talking about an entity of type invoice. Here are the steps to do each of the scenarios:

Adding a new entity

  1. Insert an entity of type “invoice” and get back `entity_id`
  2. Insert a record in `invoice` table with the same `entity_id` (getting back `invoice_id`)
  3. Insert a record into `entity_revision` with the `entity_id` and `invoice_id` (renamed as `revision_id`), here the state is Active

Updating an existing entity

  1. Since we are updating an entity, we should already have an `entity_id`. Using this id we will insert a record in `invoice` table, getting back `invoice_id`.
  2. Then update the Active revision state’s into Obsolete
  3. Insert a new revision with `entity_id`, `invoice_id` and state Active

Deleting an entity

  1. Duplicate the Active invoice, generating a new `invoice_id`. This is necessary because we don’t want to lose data or log
  2. Then update the Active revision state’s into Obsolete
  3. Insert a new revision with `entity_id`, `invoice_id` and state Active

Reviving a deleted entity

  1. Duplicate the Active invoice, generating a new `invoice_id`. This is necessary because we don’t want to lose data or log
  2. Then update the Active revision state’s into Used to be deleted
  3. Insert a new revision with `entity_id`, `invoice_id` and state Active

What do you have now?

Now you can walk through all the changes ever made into your entities, knowing who and when has made them. You can even delete an entity and keep all its history, without damaging associations it has with other entities. You can even revive a deleted entity without going back on who had deleted it in the first place. You can always get the latest revision of an entity using the state of its revision set to Active. But these are not all!

Consider the customer of an invoice is also an entity, with phone and address and everything. Sure the information of a customer may change over the time. But when you want to print an invoice, you will want it to look the same no matter when you are printing it. For instance if after two years you want to go back and print an invoice which it’s buy’s address has changed, you would need the print to show you the old address. That’s why the revision_relation relates two revisions, so you can have which revision of an invoice relates to which revision of a customer. But other times you might need to show the latest revision of an entity and not the revision that is a part of relation.

Consider products and invoices, an invoice might change over the time and if we want to list invoices in which some particular product is sold, you need to show the latest revision of such invoices. That’s when you will find all the invoices related to that product but then you have use the selected invoice info to find the latest revision of it. And the fun part is that these all can be done in a single query! For sure the number of joins and query lines will grow but still it’s all doable in one single query.

Now if a new requirement asks for a new type of relationship between two entities, you don’t need to change your data structure at all. All you need to do is to change your code and interface. As I mentioned at the beginning of the post, there are downside to this design as well.

Firstly, the two table of entity and entity_revision tend to grow huge over the time. Each add or edit that you might do inserts a new record in one or both of these tables. Even though this is intended but the size of these two tables can be an issue over the time. One rather easy solution to this problem is using horizontal partitioning which is supported in MySQL for a couple of years now. This way you can partition your entity_revision table based on state field and actually this boosts your queries performance a lot. Most of the times you’ll be searching for revisions in Active state which then all of them will reside in a separate partition. And usually the number of Active revisions are much more than Obsoletes which leads to a smaller partition to search in.

Secondly, you might need to write some complex queries. Personally I like a little challenge now and then (I’m sure you are not surprised!). At the same time I admit that there were times for myself that the query had got too complex to be worth it. That’s why I started to work on another approach to give me the same features but without the problems with this one. Don’t worry that’s still under development and I’ll get to that in some another post.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s