From tabular to a tree, my way of ORM for a PHP code – Part four

As I mentioned in my previous post, one of the features I think is appropriate for an ORM is query management. And by that I mean the ORM should give a hand, organizing your queries.

Query management in Pomegranate framework

One of the greatest mistakes one developer can make is keeping codes of different languages in one file, like having PHP in HTML or HTML in PHP, it makes life much harder and mistakes much more common. The same rule applies for embedding SQL within programming languages like PHP. I know it’s so tempting to generate SQL on the fly using PHP but once you see that there’s no need for that you’ll regret all the times when you did so.

When I talk about query management in ORM, I mean ORM should give you libraries by which you can separate SQL from your code while keeping it organized and accessible. Pomegranate does this by incorporating XML files in which your SQL statements will reside and you can address them using a name given to each SQL statement. This way your queries will not be written in the PHP files and so much more that we are going to see in couple of seconds, first let’s see an example of such an XML file:

<?xml version="1.0" encoding="UTF-8"?>
<DataMapper>
  <Query name="getUserList">
    <SQL>
      SELECT
        u.user_id, u.username
      FROM user u
      WHERE
        u.username LIKE '%#search_phrase:string!#%'
      ORDER BY #sortColumn:field_name# #sortDirection:sql#
      <Include if="partially">
        LIMIT #from:numeric!#, #count:numeric!#
      </Include>
    </SQL>
  </Query>
</DataMapper>

Right here, right now I admit that I’m a indentation frantic. I can not read a code unless it is properly indented. That’s all my work is to promote writing code with indentation. In the above example, one of the readability benefits of separating code from SQL, is that your final query is intact and much easier to investigate. Opposed to generating it on the fly by code. This way you can debug your SQL statements much easier.

Anyways, back to the given example XML. As you can see I’ve written an SQL and gave it a name of “getUserList” which later on in the code I can use it to load the statement in the code. Next is passing parameters to an SQL, seldom you might find a statement in which you won’t be trying to inject some parameter to. In most of the ORM’s you’ll tag your parameters’ position in your statement and once you want to actually run them in the code, you have to provide the library with their replacing values. Here I’ve dove the same with one minor difference. You see, different parameter types should be treated differently when they are going to be injected into a statement. Numeric values and strings an example of this, usually when a numeric value is inserted into an SQL statement, it is only escaped and then placed where it is tagged while for a string it should be quoted. Or booleans for instance must not be quoted at all or they all going to considered as true.

In the above sample XML file, the pattern #name:type# is used to tag a position within a statement where it needs to be replaced and in what type of replacement. In most of the libraries I worked with, specifying the type of parameter for a SQL is done within code, but personally I find that impractical. Just think about it, if you are writing code in PHP and you want to run an SQL in order to retrieve something out of database, who knows best the type of parameters that are going to be replaced? The PHP developer or the one who’s in charge of writing SQL statements? In my point of view it’s the later one! I admit that in most of the cases these two are done by the same developer but even then what I have personally experienced is that when I’m writing the PHP code and I want to bind the parameter values to the statement executed, I have to use the database schema or some other documentation, matching the parameter placement with the value it’s going to be evaluated with. Now compare this with the time that you are writing the SQL in the first place. At that time you, or anyone who’s up to the task, must have the mentioned documentations ready for his own job anyway. Thus the job of specifying parameters’ type comes much easier to him, do you think? That’s why I believe the best place to specify parameters’ type is in the SQL statement itself.

Anyways let’s get to the good parts, how to make a tree out of a query result. Consider the example given in the previous post, with invoices, products and items. First let’s show what we expect to get as the data:

{
  "invoice_id": 1,
  "code": "INV_1",
  "user": "Mehran",
  "products": [
    {
      "product_id": 1,
      "name": "CPU",
      "price": 399.99,
      "items": [
        {
          "item_id": 1,
          "serial_number": "i7-0001"
        }
      ]
    }, {
      "product_id": 2,
      "name": "RAM",
      "price": 59.95,
      "items": [
        {
          "item_id": 2,
          "serial_number": "2GB-0001"
        }, {
          "item_id": 3,
          "serial_number": "2GB-0002"
        }
      ]
    }
  ]
}

I wrote the desired output in JSON because I think that’s easy to read. As you can see I’ve defined an invoice with two products sold, a CPU and two RAMs. The serial number for CPU is i7-0001 and for RAMs are 2GB-0001 and 2GB-0002. As we’ve seen in the previous post it takes 4 queries to take all the needed data out of the database; one for invoice, one for products and two for items. Now I’ll be getting all them out of the database with just one query! First let’s write down the query to retrieve all the needed fields and data:

SELECT
  i.invoice_id, i.code, i.user,
  p.product_id, p.name, p.price,
  it.item_id, it.serial_number
FROM `Invoice` i
INNER JOIN `Invoice Product` ip ON (i.invoice_id = ip.invoice_id)
INNER JOIN `Product` p ON (ip.product_id = p.product_id)
INNER JOIN `Invoice Item` ii ON (ip.invoice_product_id = ii.invoice_product_id)
INNER JOIN `Item` it ON (ii.item_id = it.item_id)
WHERE
  invoice_id = #invoice_id:numeric#

This is no magic, all it does is that it combines all the given queries in the previous post and gives out what we asked for. There’s just one problem with it, let’s see what it will actually output considering the given example:

+------------------------------------------------------------------------------------+
| invoice_id | code  | user   | product_id | name | price  | item_id | serial_number |
+------------------------------------------------------------------------------------+
| 1          | INV_1 | Mehran | 1          | CPU  | 399.99 | 1       | i7-0001       |
+------------------------------------------------------------------------------------+
| 1          | INV_1 | Mehran | 2          | RAM  | 59.95  | 2       | 2GB-0001      |
+------------------------------------------------------------------------------------+
| 1          | INV_1 | Mehran | 2          | RAM  | 59.95  | 3       | 2GB-0002      |
+------------------------------------------------------------------------------------+

It is worth it to give it some time and analyze what is happened here. First of all there are three records returned, then with each record the invoice info is repeated. The same goes for RAM in second and third row. I suppose it is obvious what is happened here, isn’t it? Perhaps it’s much easier explained if you read it backwards, from items to products and finally the invoice. Each item is joined with its product and each product is joined with its invoice. And since two of the items share the same product and all three of them share the same invoice, well there’s no escaping it but to have them repeated!

This is where Pomegranate framework comes in handy. It will eliminate the redundant data and turn the output into a tree, just like we hoped for at first. But in order to do so, you need to tell Pomegranate which fields should be grouped together to form an object. And you can do so in an XML file holding the query:

<?xml version="1.0" encoding="UTF-8"?>
<DataMapper>
  <Query name="getUserList">
    <SQL>
SELECT
  i.invoice_id, i.code, i.user,
  p.product_id, p.name, p.price,
  it.item_id, it.serial_number
FROM `Invoice` i
INNER JOIN `Invoice Product` ip ON (i.invoice_id = ip.invoice_id)
INNER JOIN `Product` p ON (ip.product_id = p.product_id)
INNER JOIN `Invoice Item` ii ON (ip.invoice_product_id = ii.invoice_product_id)
INNER JOIN `Item` it ON (ii.item_id = it.item_id)
WHERE
  invoice_id = #invoice_id:numeric#
    </SQL>
    <ColumnGroup name="" size="3" />
    <ColumnGroup name="products" size="3" />
    <ColumnGroup name="products/items" size="2" />
  </Query>
</DataMapper>

This time focus on the added ColumnGroup tags at the end of Query tag. These tags instruct Pomegranate ORM engine to form a tree out of the query’s result by grouping the first three, then next three and finally the last two columns / fields. Having these three tags, ORM can generate the JSON data we asked for, now that’s magic!

Just consider how much code is saved using this technique. You’ll write a rather straight forward query and divide its fields into segments and the rest is history. All you need to do is to give its name to some method of ORM accompanied with the replacement parameter values and you’ll get what exactly you will need, nothing more and nothing less. Just to point out the complexity of what is done here, I wold like to draw your attention to the fact that the last ColumnGroup introduces a second level tree node. Items are children of products and each product has its own items. This pattern can go down for arbitrary number of levels and it can even fall back to the root of the tree and branches new nodes. In other words, as long as you can define your tree, this ORM will produce it for you. Yet if you need the old tabular format in any case all you need to do is to name no ColumnGroup tag for the Query and it will just return the result set untouched.

I’ve been using this code for a long time and it’s at some pretty stable state right now. Perhaps you want to give it a test drive to find its benefits first hand. But don’t forget the downfalls of it as I mentioned earlier, the result set can tend to get huge if your tree is too deep, too spread or too many child leaves in it. But for ordinary cases where you’ll need a moderate amount of data out of the database, this technique will save you and your CPU’s time a considerable amount.

And that’s all I look for in an ORM. I don’t see any benefit in generating codes or any of the sort, such features tie your hands and give you nothing instead. Well anyhow, that’s all I’ve got on the matter. Sure there are a lot I didn’t actually tell you about Pomegranate framework, perhaps you can use it and find them on your own or maybe I’ve dig into them someday. Thanks for reading.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s