From tabular to a tree, my way of ORM for a PHP code – Part three

Of course when you reject someone’s opinion, you have to provide them with yours so they can judge on their own. And that’s what I’m going to do here, since I rejected most of the ORM libraries out there, in my previous post, I’m going to present you with my way of an ORM in this one.

Appropriate features for an ORM
(of course in my point of view)

The followings are the list of features I look for in an ORM:

  • Returning result set in tree format instead of tabular
  • Query management
  • Connection management
  • Transaction management
  • Event propagation

Or at least these are the features of the library that I wrote :). I will dive deep into the first feature in this article and leave the second one for the next post and then you can decide on your own if you find them handy or not. The last tree I leave for you to explore on your own, or maybe I’ll come back to them some time in the future.

For starters, the process of mapping among database records and in-memory objects (and the other way around) is a job all ORMs have to do. Yet, the ways it can be done is the first point I’m going to make.

We all know that in order to get data out of an RDBMS, one needs to write some SQL statement to provide the engine with the instructions needed to select and retrieve records and fields (the same is true for saving data in). The other fact about SQL select queries is that their results are always a set of compatible records. When I say “compatible” I mean that all the records in a result set hold the same list of columns, with the names, types and order. In other words the result of a query is always in form of a table, with predefined number of columns while number of records in it depends on the data available in database and the criteria specified by the query itself. To keep it visual, here’s a result set:

tabular_result_set

The common case in ORM mapping process is that each record is translated into an object and vise versa. While this is absolutely fine for simple case, there are so many times that it just won’t do! And it has everything to do with R in RDBMS. One the most important phases of designing a database is normalization. In this phase you are trying to design your tables in a way that there’s no redundancy in data stored and also data reconstruction (select queries) is done with minimum cost. This phase results in relations being born and RDBMSs are all about relations.

Being able to make my point, I have to use an example and here it goes:

Consider an e-commerce website selling all sorts of goods (Amazon if you like). Once a purchase is completed, an invoice is issued including all the products bought. Let’s normalize this hypothetical example. I won’t waste your time and just write down the answer; I would find the following entities (tables):

  1. Product: The table to hold each product’s information, e.g. an iPhone or a Surface Pro.
  2. Item: It’s like instantiating a product, in other words, it’s a serial number.
  3. Invoice: An entity that shows some user has bought a list of products and once it is final, it will also include each item (serial number) for each product in it.

And in order to save an invoice in the database, I’m going to design the following ERD (Entity–relationship model):

invoice_erd

 

Trying to keep it simple, I only introduced a couple of fields for each entity, yet as you can see things can get pretty complicated (try to generate that with a library)! Any who, using the above ERD I can have products, items and invoices. But most importantly, I can also store the products within each invoice, “Invoice Product”, and also once the serial number of the purchased products is fixed, I can make the relations between “Invoice Product” and “Item” showing that the item is assigned to a product within an invoice.

Now that my example is set, let’s try to retrieve an invoice out of the database.

The only thing you need in order to retrieve an invoice, is its id. Having an invoice_id you can retrieve all that there is for an invoice, including its products and their items. One way to do this is to retrieve each entity type, one at a time:

<?php
$stmtInv = $dbh->prepare("SELECT * FROM Invoice WHERE invoice_id = $invoice_id");
$stmtInv->execute();
$invoice = $stmtInv->fetch(PDO::FETCH_ASSOC);

$stmtPrd = $dbh->prepare(
    "SELECT p.* FROM `Invoice Product` ip"
    . "INNER JOIN `Product` p ON (ip.product_id = p.product_id)"
    . "WHERE ip.invoice_id = $invoice_id"
);
$stmtPrd->execute();

$invoice['products'] = array();
while ($product = $stmtPrd->fetch(PDO::FETCH_ASSOC)) {
    $stmtItm = $dbh->prepare(
        "SELECT i.* FROM `Invoice Product` ip"
        . "INNER JOIN `Invoice Item` ii ON (ip.invoice_product_id = ii.invoice_product_id)"
        . "INNER JOIN `Item` i ON (ii.item_id = i.item_id)"
        . "WHERE ip.invoice_id = $invoice_id AND i.product_id = {$product['product_id']}"
    );
    $stmtItm->execute();
    $product['items'] = array();
    while ($item = $stmtPrd->fetch(PDO::FETCH_ASSOC)) {
        $product['items'][] = $item;
    }
    $stmtItm = null;
    $invoice['products'][] = $product;
}
$stmtPrd = null;
$stmtInv = null;

print_r($invoice);

In the above code, for each of the entity types, invoice, product and item, there’s a separate SQL statement written. In other words, each record returned by any of the queries executed, will be mapped into a single object. This makes ORM’s job much easier and your life (as the ORM’s user) much harder.

I don’t think anyone would object me when I say; it’s a pretty complicated piece of code for such a simple task, specially since you have to write it over and over each time you want to work with some another kind of entity! Yet if there are those who like it this way, I’m sure no one can argue that this is not the best way to implement this need, performance-wise! The number of queries executed this way depends on the number of products bought in the invoice. For sure there’s no way around that, I mean we don’t want to confine our users with the maximum number of products they can buy. At the same time taking out a single invoice could lead to execution of hundred queries. And it’s just for one single invoice, just consider if we need the same kind of information for a bunch of them!

So what can we do to improve the performance?

The performance problem mentioned in the previous paragraph was mostly due to the multiple requests sent from programming language to the database server. One way to overcome this problem is to move the code to the database server and just return the result to the PHP. As doable as this solution is, there are a couple of issues with it; first of all writing a customized code for each different case that arises, specially in PL/SQL or any other database programming language is no easier than it was in PHP, let alone debugging it. Secondly, even if you are up to the challenge to move the code to the database, most of the RDBMSs can only return data in tabular format. But as it is obvious, the structure we have here is a tree, incompatible with tabular format. Right now I’m working on a solution to make this happen, but I don’t think it’s possible for all the RDBMSs out there but some.

Once you decided to leave the code at application layer, the only think you can do to improve the performance is to join all the queries into one, hence reducing the number of SQL executed, and take it from there. Of course there are problems with this approach as well:

  1. The result is still in tabular format and you need to make a tree out of it
  2. Sometimes writing the joined queries tend to be hard
  3. The number of records returned is the Cartesian product of the results for the queries joined which makes the result set go huge

From the problems mentioned above, the first one can be solved using Pomegranate library but the other two will still remain at large. Among the three, the third one is the most serious one. If the number of records in each result set is as high as 50 and there are 4 queries to merge into one, you’ll end up with a result set of 6,250,000 size! Yet personally I’ve found that it’s hardly the case, specially when it comes to websites (compared to enterprise applications) and once you use it you will come to appreciate how much it makes your life (as a developer) easier. The way you can actually return a tree out of a tabular result set in Pomegranate is as simple as writing a query and grouping the columns into objects which I’m going to show you in the next post “Query management in Pomegranate“.

 

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s