After months of going back and forth, I decided to change the architecture of Acidbase because of the following technical issues:
- Making sure no entity goes un-indexed: as I will get to, this was the number one reason why I revised Acidbase’s architecture. Having no guarantee that changes in your database will be reflected in your indices is a deal breaker. There are ways to ensure this in the previous architecture, but they made the whole design too complex.
- Too many components involved, making the architecture too complex: the number of components in a system correlates directly with its complexity, and complex systems are expensive to maintain. You have to keep track of each component’s updates and make sure the rest of your system stays compatible with them. In general, simpler is better.
- Fragile and hard to debug: too many components means too many things that can go wrong. As no piece of software is without bugs, you need to watch every single one, and worst of all, it is much harder to pinpoint any problem you face. Each time a new bug shows up and you have no idea where it resides, you have no choice but to treat every single component as a suspect.
In my previous architecture, the index engine was informed about changes in the database through a RabbitMQ plugin (called PostgreSQL LISTEN Exchange), directly from within PostgreSQL. This was an ideal design because:
- Event-driven is absolutely better than polling: even though events are implemented by polling in the underlying layers, that is totally different from implementing the polling yourself at the upper levels. Polling is acceptable only if there’s no other way, and even then it needs careful implementation.
- Easy to implement: the previous architecture was easy to implement, that is, if you could live with the problems it had. All you needed to do was configure and connect its components, and you were good to go.
But at the same time it had the following problems:
- If some event was lost (for any reason), there was no plan B: there was no contingency plan for the times when other components are down. As an architect, you need to consider every single scenario, including the ones that should not happen. Even though the whole architecture was designed around RabbitMQ, and the system will definitely fail without it, I needed to make sure that once the system is back up, it is able to carry on. But in the previous architecture, if some event was lost, there was no way to recover it.
- Controlling the number of database connections: even though it’s most likely not as big an issue as I project it to be, one of the resources that I came to care about is database connections. This resource is scarce, and I realized I need to make sure it is used in a controlled manner. Having too many components means that they need to talk to each other by making connections, and you might end up wasting a valuable resource.
- Too many hacks: the two problems I just mentioned are not deal breakers on their own. As an architect, it’s my job to address them and come up with feasible solutions. But as I took my time doing so, I found myself in a web of components with lots of dependencies and so many hacks that I realized I had taken a wrong turn somewhere.
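To make the lost-event problem from the list above concrete, here is a minimal Python sketch. It is not the actual plugin or Acidbase code: `FireAndForgetChannel`, `simulate`, and the two sets standing in for the database and the index are all hypothetical names I made up for illustration. The only assumption it encodes is that a notification sent while nobody is listening is simply gone.

```python
class FireAndForgetChannel:
    """Hypothetical notification path with no persistence (illustration only)."""

    def __init__(self):
        # A callable while the consumer is up, None while it is down.
        self.listener = None

    def notify(self, entity_id):
        if self.listener is None:
            return False  # nobody is listening, so the event is dropped
        self.listener(entity_id)
        return True


def simulate():
    database = set()  # stand-in for rows committed to the database
    index = set()     # stand-in for entities that reached the index
    channel = FireAndForgetChannel()

    def write(entity_id):
        database.add(entity_id)     # the write itself always succeeds
        channel.notify(entity_id)   # the notification may or may not arrive

    channel.listener = index.add
    write(1)
    channel.listener = None         # the consumer goes down
    write(2)
    channel.listener = index.add    # the consumer comes back up
    write(3)

    # Entities that exist in the database but never got indexed.
    return database - index


if __name__ == "__main__":
    print(simulate())  # {2}
```

Entity 2 is in the database but not in the index, and nothing in the system remembers that it still needs indexing. That is the "no plan B" situation.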
Anyway, after having a discussion on SO with a fellow software engineer, I decided to change the design, and here it is:
The notable change in this new architecture, compared to the previous one, is that once a change is made in the database that affects any index, a record of it is simply stored in the database, and an external application watches for such records using a polling mechanism. This design imposes a load on the database, as some process is constantly asking it whether any new records have been added. But at the same time, a change can no longer go missing, since the change indicators are stored within the same transaction as the actual data. This means that either the data and its indicator are both stored or neither of them is, which is all I wanted.
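The "both or none" guarantee can be sketched in a few lines. This is only an illustration under my own assumptions: it uses Python’s built-in `sqlite3` with an in-memory database instead of PostgreSQL, and the table names `entity` and `pending_change`, as well as the `fail_midway` flag, are hypothetical and not Acidbase’s real schema.

```python
import sqlite3


def open_db():
    """Create an in-memory database with a data table and a change-indicator table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE entity (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
        CREATE TABLE pending_change (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            entity_id INTEGER NOT NULL
        );
        """
    )
    return conn


def save_entity(conn, entity_id, body, fail_midway=False):
    """Write the entity and its change indicator in one transaction."""
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "INSERT INTO entity (id, body) VALUES (?, ?)", (entity_id, body)
        )
        if fail_midway:
            # Hypothetical failure between the two writes.
            raise RuntimeError("simulated crash between the two writes")
        conn.execute(
            "INSERT INTO pending_change (entity_id) VALUES (?)", (entity_id,)
        )


def counts(conn):
    """Return (number of entities, number of change indicators)."""
    entities = conn.execute("SELECT COUNT(*) FROM entity").fetchone()[0]
    pending = conn.execute("SELECT COUNT(*) FROM pending_change").fetchone()[0]
    return entities, pending


if __name__ == "__main__":
    conn = open_db()
    save_entity(conn, 1, "first")
    try:
        save_entity(conn, 2, "second", fail_midway=True)
    except RuntimeError:
        pass
    print(counts(conn))  # (1, 1)
```

The failed write leaves neither an entity row nor an indicator behind, so the two tables can never disagree about what still needs indexing.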
In the above diagram, the connections labeled “1” are the polling mechanism that I just talked about, and the connections in blue labeled “2” go through RabbitMQ. In other words, once a “Change Monitor” discovers a new record to be indexed, it informs one of the “Sync Engines” through RabbitMQ, and the rest is exactly the same as in the previous architecture explained before.
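Here is a rough sketch of what one pass of a “Change Monitor” could look like. Again, this is an assumption-laden illustration and not the real implementation: `sqlite3` stands in for PostgreSQL, a plain `queue.Queue` stands in for RabbitMQ, and `pending_change`, its `processed` column, `poll_once`, and `batch_size` are hypothetical names. A record is marked as processed only after it has been handed over, so a failed hand-over is retried on the next poll (which also means a “Sync Engine” may occasionally see the same change twice).

```python
import queue
import sqlite3


def open_db():
    """Create an in-memory database holding only the change-indicator table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE pending_change (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            entity_id INTEGER NOT NULL,
            processed INTEGER NOT NULL DEFAULT 0
        );
        """
    )
    return conn


def poll_once(conn, publish, batch_size=100):
    """Hand unprocessed changes to publish(); return how many were handed over."""
    rows = conn.execute(
        "SELECT id, entity_id FROM pending_change "
        "WHERE processed = 0 ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    handed_over = 0
    for change_id, entity_id in rows:
        try:
            publish(entity_id)
        except Exception:
            # The broker is unavailable: leave this record (and the rest)
            # untouched so that the next poll picks them up again.
            break
        with conn:
            conn.execute(
                "UPDATE pending_change SET processed = 1 WHERE id = ?",
                (change_id,),
            )
        handed_over += 1
    return handed_over


if __name__ == "__main__":
    conn = open_db()
    with conn:
        conn.executemany(
            "INSERT INTO pending_change (entity_id) VALUES (?)", [(10,), (11,)]
        )
    to_sync_engines = queue.Queue()  # stand-in for the RabbitMQ connection "2"
    print(poll_once(conn, to_sync_engines.put))  # 2
    print(poll_once(conn, to_sync_engines.put))  # 0
```

A real monitor would call `poll_once` in a loop with a sleep between passes, and the length of that sleep is the knob that trades indexing latency against the extra load on the database.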