So after implementing the new architecture, I decided it is time to take the whole system out for a spin. I’ve already had chosen some dataset to accompany the system and now I want to test the system’s performance with them.
Considering the system architecture, there are four component that I can tune to get a better performance:
- PosgreSQL: there two ways to tune a PostgreSQL. First, you can play with the configurations like memory and number of concurrent connections. But you can also take it further and replace a single PostgreSQL with a clustered version of it.
- Change Monitors: as the architecture diagram shows, this component of the system has been implemented in a way to be scaled. But to be honest, my guts (some small test that I’ve done) tell me that this is not the bottleneck of the system as database caves in much sooner than Change Monitor does.
- Sync Engines: the whole reason for using RabbitMQ is that we can have a distributed queue with multiple consumers. The fact that I didn’t implement this component multi-threaded is because I intended to have multiple instances of it. The number of instances for this component is the primary candidate for tuning the system.
- Elasticsearch: ES can easily have arbitrary number of nodes. And I’m sure introducing new nodes to an ES cluster helps a lot. But for the reasons I’ll explain below, I decided to leave this one out this time.
For the start, I decided that I will only show the impact of number of Sync Engines because of the following reasons:
- It’s necessary to show the impact of each parameter individually: don’t get me wrong, tuning a combination of parameters is a thing of its own. But at first, we need to start with only one. I’ve chosen to tune the number of Sync Engines because I think it has the most impact on the whole system.
- The type of test: one can test this system from different aspects. By that I mean, you can test its read performance, its write performance or real world (a combination of read and write). Even though the impact of this system actually shows its true benefits when you are using it in real world scenarios, but I decided to test it in a write-only scenario. That’s because it is obvious the system’s performance will be hugely improved (thanks to Elasticsearch) if I include its read performance and the bottleneck of the system is actually when some changes are made to the data. In other words, this software is only worth it if it can withstand and caught up with a huge number of insertions. And of course, write performance tuning means finding the right number of Sync Engines.
Before I set the testbed and how many records we are going to use to test the system, I need to explain why the Sync Engine is implemented in a single thread manner. First of all, I’m sure there’s no need to explain that single-thread applications are much easier to implement than multi-thread ones. Of course, this reason alone is not enough to forget about threads, but here we can satisfy the same need with processes instead of threads. But why should we favor processes to threads? The answer to this question is mostly related to maintenance. You see, controlling the number of threads is something that you need to provide some API for while starting or stopping a process is something that is done by OS and does need to be controlled from within the application. But more importantly, it is because different processes can be scattered on different hardware while all the threads of an application should reside within the same hardware. This is basically the whole philosophy on how to make your software scalable.
The data used in this test had already been included in the source code and repository and it is borrowed from http://linux.dell.com/dvdstore/2011-12-01/. This dataset includes 10,000 Products in 16 Categories, 20,000 Customers and 12,000 Orders. Each Order belongs to a Customer and it also contains a couple of Products. Of course, some dataset alone is not enough to test the system at hand. We also need a couple of indices so once the data is inserted, we can measure how fast they will be indexed.
So I defined three indices:
- Products: a flat structure index of all the fields that can be found in a product.
- Customers: a flat structure index of all the fields that can be found in a customer.
- Orders: a tree structure of orders, containing one order, one customer and an arbitrary number of products, all of which are searchable.
The exact structure of each index can be found in the source code.
The dataset has already been inserted into some database of its own (in the same PostgreSQL instance) and the entity insertion is a matter of SELECTing the data from the source tables and INSERTing them into entity table (by calling the stored procedure of course). The process of entity insertion is done in five steps as follow:
- The three indices are created
- The categories are inserted
- The products are inserted
- The customers are inserted
- The orders are inserted
Only the last three insertions are actually monitored and measured. That’s because there are only 16 categories which are not enough to measure anything. To give you a better understanding of the whole process, here are the queries used to extract each of the mentioned entities:
SELECT p.prod_id, p.title, p.actor, p.price, p.special, c.revision_id, i.quan_in_stock, i.sales FROM products p INNER JOIN cat_rel c ON (p.category = c.id) INNER JOIN inventory i ON (p.prod_id = i.prod_id);
SELECT customerid, firstname, lastname, address1, address2, city, state, zip, country, region, email, phone, creditcardtype, creditcard, creditcardexpiration, username, password, age, income, gender FROM customers;
SELECT o.orderid, o.orderdate, o.netamount, o.tax, o.totalamount, l.quantity, l.orderdate, c.revision_id AS customer_revision_id, p.revision_id AS product_revision_id FROM orders o INNER JOIN orderlines l ON (o.orderid = l.orderid) INNER JOIN cus_rel c ON (o.customerid = c.id) INNER JOIN prd_rel p ON (l.prod_id = p.id) ORDER BY o.orderid, l.orderlineid;
For the last three indices, I ran the software (using Docker of course) with six different number of Sync Engines, from 1 to 6. For each test, I setup the system by specifying the number of Sync Engines, then signal a piece of code (written in PHP) to start the process. It will first send an HTTP request to the frontend, asking to create the indices and then asking to insert the Categories, and finally, the entities that we are trying to measure. For each of the three entity types we are trying to measure their insertion performance, once one of them is requested to be inserted (and I measure the performance for each entity type one at a time), the PHP script will monitor the queue in RabbitMQ in order to find when the job is started and when it is done. Monitoring RabbitMQ is not a precise measurement, it has a tolerance of seconds. But considering the fact that the whole process takes minutes, the error margin imposed by this faulty measurement is like 2% (worth case scenario) and since I didn’t find any better way to measure it, that’s the one I went for.
Once each configuration is run and measured, I’ll store the performance results (duh!) and remove everything. Then I’d move on to the next config and re-run the whole thing from scratch. Since this process’ performance is not deterministic and there are a whole number of unpredictable factors affecting it, I ran each configuration 10 times and used their averages and standard deviations in the charts that will follow.
The performance results
Now it’s time to present to you the results. First, let’s see the most important one, how fast the entities are indexed:
In the above chart, different configurations are on the x-axis and y-axis shows how many entities were inserted before one has been indexed. In other words, it shows how much slower the indexing process was compared to the insertion. For example, having 1 Change Monitor and 1 Sync Engine, almost 12 Customers were inserted before one has been indexed. This is definitely not much of an interesting performance. But the good news is that by introducing new Sync Engines, this performance is improved, of course in a non-linear way (which was expected from the beginning). Having 6 Sync Engines, the same mentioned performance is improved to the point where indexing is 3 times slower than insertion. It’s in the case that for Orders, having 6 Sync Engines means that orders are indexed almost instantly which is amazing.
Before getting to the result analysis, here’s the insertion rate for each entity type:
As you can see the insertion of Customers is much more tense than the other two. That’s because its SELECT query is composed of only one table while others are using joined tables. Also, Customers are the only absolute flat entity type among the three, the other two have collections which means when they are being inserted as entities, there are actually more than one record to insert.
As for the analysis of the charts given here, one can speculate that the reason why Customers are indexed the worst is because of their insertion rate. Reading through the code and the structures in it, you can see that Customers and Products are almost the same and there’s no significant difference between them to explain their different performance results. The only difference is the fact that Customers are generated faster and it puts the whole system under a load almost three times greater than the other two which is also the rate of how much faster the records were inserted. Coincidence? I think not!
At the end, this chart shows how many entities were indexed per second for each type of entity. As you can see, Order has the lowest number of entities indexed per second. That’s because Orders have the most complex index structure of all, making them hard to extract and harder to index. Customers, on the other hand, have the simplest structure which as the result, they are indexed the fastest.
Well as for the conclusion, the results show that this system can still use some tuning. But even though the results are promising and the system can have practical uses as is. The number one candidate for tuning is and always will be, the PostgreSQL itself. It was always obvious to me that PostgreSQL is the bottleneck in this system. As for my next steps, I will try to see what and how I can tune in PostgreSQL.