Oct 20, 2011

With the recent massive flood of frameworks, libraries, and toolkits on the market these days, it is easy to forget that underneath it all is the good old, plain and simple PHP with all its kinks, quirks, and huge set of built-in functionality.

PHP has a vast number of extensions which solve all sorts of problems. And if PHP doesn’t have it built in, there is an impressive number of additional extensions, both on PECL and, more and more lately, on GitHub.
There is a good chance that someone else has been in your shoes already, so it is worth looking over the horizon to see if the problem has already been solved.

For some reason the current practice seems to be the “RoR” idiocy, where “RoR developers” barely even know that there is this Ruby some miles down the stack. PHP has hit the same stepping stone already with WordPress, Drupal and even Symfony, and that is a weird and scary thought. Remembering where you came from is important, even for those who specialize in specific products. Looking at how other projects work, comparing notes, work ethics, features and functionality is also very important. Getting a different perspective and more knowledge is how we improve our solutions and work more efficiently. If your specific product doesn’t have native support for something, why not look at a different framework/library/cms/toolkit/.. or even PHP extensions?

As June mentioned earlier, going ‘back home’ and checking out the PHP manual pages is generally a good idea. Things change: manual pages are updated, improved and added, and you come back with a different perspective and other problems to solve. Even though you believe you know all the basics, you still need to practice them, and that includes browsing the manual from time to time, again and again – no matter which project it is.

So what is the best way to stay in touch? To keep up to date with new approaches and offerings? New solutions to the same problems? Get involved!

By far the best way is to get involved with the project you are using, even if it is just silently idling on the mailing lists and reading the subjects. Subscribing to the commit lists is a fantastic way to see precisely what is going on and which direction the project is taking. Who knows, after a while you may spot something the others didn’t, get an idea for a killer feature, or shed light on a perspective the others didn’t think of. After hanging around on the lists for a while you’ll get a feeling for how the project works, and hopefully start chiming in. Give your two cents, and who knows – you might even cook up a patch or two.

Oct 07, 2011

Introduction

Whilst everyone is buzzing about and creating fancy new Symfony2/Doctrine 2 applications, and perhaps even shifting to new frameworks or to no framework at all, a great many of us are still maintaining legacy apps and will be for some time to come. As these apps grow, we occasionally need to look back, scream at our old code and wonder why we didn’t make it more scalable or use neat optimisation tricks back when it was first conceived. The fact is, many of these “tricks” are not necessary at the time for a virgin app, and we need to develop code that is relevant to the task at hand.

That said, being aware of the case studies I will present to you now may help you to optimise old code, and may also make you think twice when you are working with new code – the things I will describe do not take too much time to implement the first time round.

Case study: Working with large resultsets and array_merge

In my sample application, the database contained many different types of organisation stored in different tables, and due to the structure of the data and the criteria for each organisation, there was no easy way to retrieve several organisations at once using pure SQL and joins (well, no convenient way). The solution is relatively simple: one query per organisation type, then combine the results, in this case as we go along:

$membershipClasses = array("Entity1", "Entity2", "Entity3", "Entity4", "Entity5");
$results = array();
foreach ($membershipClasses as $joinedTable)
{
  $query = $this->createQuery()->from("TablePrefix" . $joinedTable . " t");
  // Lots of differing criteria based on which class we were dealing with
  // ...

  $results = array_merge($query->execute(array(), $hydration), $results);
}

This logic was used to generate a report with approximately 50,000 rows, used 1.2GB of internal memory and took around 20 minutes to complete – often failing or bringing the system to a halt.

One optimisation used in this case (many more are surely possible, but let’s focus on one) involves the last line – PHP’s array_merge function. Simply switching this out for a simple array traversal brought the execution time down to under 2 minutes; however, there was no change in memory usage.


$theseResults = $query->execute(array(), $hydration);
foreach ($theseResults as $aResult)
{
  $results[] = $aResult;
}

Why is this so much faster? And why can’t we use += instead?

 

The “problem” with array_merge is that even though it discards numeric keys, it still has to *check* them all first, which is a lot of overhead on 50,000 rows. We can’t use += in this case either, because that would respect the numeric keys, which start from 0 in each resultset, and therefore the results would not “stack” as intended – unless we tell Doctrine to index the results with some unique key.
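To make the difference concrete, here is a small standalone sketch (plain PHP, nothing Doctrine-specific) comparing the three approaches on numerically indexed batches:

<?php
// Two numerically indexed "resultset" batches, as Doctrine would return them.
$batch1 = array("row A", "row B");
$batch2 = array("row C", "row D");

// array_merge renumbers the keys, so everything stacks: 4 elements.
$merged = array_merge($batch1, $batch2);

// + keeps the left-hand value for the colliding keys 0 and 1: only 2 elements,
// and the second batch is silently dropped.
$plused = $batch1 + $batch2;

// Appending in a loop stacks everything, without array_merge's key handling.
$appended = $batch1;
foreach ($batch2 as $row) {
    $appended[] = $row;
}

var_dump(count($merged), count($plused), count($appended)); // 4, 2, 4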

Why it takes *so* much longer is a bit of a mystery to me. I’ve looked at the C code behind it (ext/standard/array.c) and it’s, well, quite frankly, voodoo.

Case study: Limiting results

Something that was “fixed” (read: removed) in Doctrine 2 was the ability to limit results returned from queries, at least the “magic” part of it. The problem with limiting hydrated results is that you need to know exactly how many rows each sub-tree will contain before you can limit the query as a whole. Imagine a person with many addresses, and you want an array limited to 5 people – you can’t just add “LIMIT 5” to the end of the query, because when this is hydrated you will most likely end up with one or two people, the second of whom may not have all their addresses, because you’ve told your database to return 5 *rows* – it has no idea how these rows relate to your model.
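A small, self-contained illustration of that row/entity mismatch (the data and the naive hydration loop below are made up for the example):

<?php
// One person with three addresses becomes three SQL rows, so "LIMIT 5"
// counts rows, not people.
$rows = array(
    array('person_id' => 1, 'name' => 'Alice', 'city' => 'Oslo'),
    array('person_id' => 1, 'name' => 'Alice', 'city' => 'Bergen'),
    array('person_id' => 1, 'name' => 'Alice', 'city' => 'Trondheim'),
    array('person_id' => 2, 'name' => 'Bob',   'city' => 'London'),
    array('person_id' => 2, 'name' => 'Bob',   'city' => 'Paris'), // LIMIT 5 stops here
);

// Naive hydration of those five rows yields only two people, and the second
// may be missing addresses the database never got to return.
$people = array();
foreach ($rows as $row) {
    $people[$row['person_id']]['name']        = $row['name'];
    $people[$row['person_id']]['addresses'][] = $row['city'];
}
echo count($people); // 2, not 5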

In Doctrine 1, adding a limit clause would cause all sorts of magic to happen, with the resulting query ending up as a collection of complex subqueries, each with its own limit clause. The more joins you introduced, and the more levels of join, the more complicated it became. First-level joining is not so bad, but joining several levels deep soon starts to get heavy, and your execution speed will suffer for it. Couple this with a large resultset and you will see smoke drifting from the server room in no time.

So, what’s the solution? Well, this one is not so clear cut – you have to experiment. Sometimes, doing it the “magic” way will work just fine, and is totally acceptable, but when the query becomes “heavy” you have a few options:

Work out your subset first

This is the default Doctrine 2 approach, and it’s quite simple – you use one query to get all the IDs of the subset of top-level elements, then pass these IDs to another query in a “WHERE IN” clause. When you look at the resulting SQL, it can be quite scary, especially if you are dealing with many results – you can easily be saying “WHERE IN (1,2,3,4…..9999999999)”. Most DBMSs will handle this surprisingly well, as long as you are using the primary keys, so experiment with it and it might be the way to go.


if ($limit)
{
  $subQuery = $this->createQuery("foo")
                   ->select("foo.id")
                   // ... the same criteria as the main query
                   ->limit(50);
  $ids      = $subQuery->execute(array(), Doctrine_Core::HYDRATE_SINGLE_SCALAR);
  $query->andWhereIn("cf.id", $ids);
}

Note the use of “single scalar hydration” – we have no need for more than this (to be really picky, we could go straight to PDO here, but for consistency this is fine). In this case, single scalar hydration gives us an array of raw IDs, which is exactly what we need to pass to the main query.

Make your own subquery

A variation on the above is to embed the “ID sucking” query into your main query. Some database engines may prefer this style, but I have not noticed any significant performance gain or loss doing it, so I prefer the above option as it’s cleaner in the PHP code.


$query->andWhere('cf.id IN (SELECT id FROM my_other_table WHERE blah LIMIT 50)');

In a perfect world, this is the kind of thing Doctrine could have done instead of the subquery magic, but there are so many factors to consider here that I can understand why they did not go down that route.

Stick with the magic, but simplify

It is also possible to continue using the magic, but greatly improve performance by simplifying. Remove all those second+ level joins and use separate queries to get hold of the data you need. How often have you added a chain of joins just to get one snippet of data from the last node? Scrap all the joins and make a new query later where you can pass in all the IDs from your main query and get the data you need. Now you don’t need a limit clause, because you are asking for exactly the data you need in the first place.
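As a rough sketch of that idea (the model names, aliases and hydration mode here are assumptions, not taken from the original report code): fetch the limited top-level set first, then fetch the deeper data in one follow-up query keyed by the IDs you already have.

$people = Doctrine_Query::create()
    ->from('Person p')
    ->limit(5)
    ->execute(array(), Doctrine_Core::HYDRATE_ARRAY);

$personIds = array();
foreach ($people as $person) {
    $personIds[] = $person['id'];
}

// One flat follow-up query instead of a multi-level join in the main query.
$addresses = Doctrine_Query::create()
    ->from('Address a')
    ->whereIn('a.person_id', $personIds)
    ->execute(array(), Doctrine_Core::HYDRATE_ARRAY);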

Start with the fastest hydration mode and work upwards

This one I can’t stress enough – an ORM such as Doctrine is there to give you the tools you need when you need them, but it is often the case that applications are built with all the overhead and magic just because it can be done, or because it’s the default behaviour. Stop. Look at what data you *actually* need, and especially ask yourself whether you need objects in your results. Perhaps you are retrieving all your users and listing them with their profile data, but you are hydrating them as objects because you need some custom functions you’ve written; classic examples are getAge() or getFullName(), which are derived from other fields. If this is the only reason you are hydrating objects, consider something like this:


class User
{
  function getAge()
  {
    return self::getAgeFromDOB($this["dob"]);
  }

  function getFullName()
  {
    return self::getFullNameFromUserArray($this);
  }

  public static function getAgeFromDOB($dob)
  {
    $dob = strtotime($dob);

    $year_diff  = date("Y") - date("Y", $dob);
    $month_diff = date("m") - date("m", $dob);
    $day_diff   = date("d") - date("d", $dob);

    if ($month_diff < 0 || ($month_diff == 0 && $day_diff < 0))
    {
      $year_diff--;
    }
    return $year_diff;
  }

  public static function getFullNameFromUserArray($user)
  {
    return $user["first_name"] . " " . $user["last_name"];
  }
}

In this example, we’ve kept the sideways compatibility of the getFullName function with array/object hydration, so it will work whether an object or an array is passed (sanity checking needed of course – see the sketch a little further down). This approach could also be expanded to support the more “raw” hydration methods. Now in our templates, we can just use:

<li><?php echo $user["username"]; ?></li>
<li><?php echo $user["Addresses"][0]["city"]; ?></li>
<li><?php echo User::getFullNameFromUserArray($user); ?></li>
<li><?php echo User::getAgeFromDOB($user["dob"]); ?></li>
<li>...</li>

This can massively speed up your application if you have long lists or are generally dealing with lots of data, especially data that spans several levels.
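Speaking of the sanity checking mentioned above, here is a hedged sketch of how getFullNameFromUserArray might guard its input and also cope with scalar-hydrated column names (the "u_" column prefix is an assumption borrowed from the scalar example below):

// Sketch: a drop-in replacement for the method in the User class above.
class User
{
  public static function getFullNameFromUserArray($user)
  {
    // Works for hydrated arrays and for records implementing ArrayAccess.
    if (is_array($user) || $user instanceof ArrayAccess) {
      if (isset($user["first_name"])) {
        return $user["first_name"] . " " . $user["last_name"];
      }
      // Scalar hydration flattens and prefixes the column names.
      if (isset($user["u_first_name"])) {
        return $user["u_first_name"] . " " . $user["u_last_name"];
      }
    }
    return "";
  }
}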

Let’s also take a second to consider the super-fast hydration methods. If we are really struggling for resources (perhaps working with large downloadable reports) and we can live with slightly less friendly arrays of data, scalar hydration can save the day. Shifting to scalar hydration means we lose the ability to switch straight back to object hydration later, due to the alternate syntax and lack of nesting, but if we are considering scalar hydration in the first place we are probably in a situation where object hydration will never be practical.

<li><?php echo $results["u_username"]; ?></li>
<li><?php echo $results["a_city"]; ?></li>
<li><?php echo User::getFullNameFromUserArray(array("first_name" => $results["u_first_name"], "last_name" => $results["u_last_name"])); ?></li>
<li><?php echo User::getAgeFromDOB($results["u_dob"]); ?></li>
<li>...</li>

I could go on and on but I hope these points have given you some food for thought!

Sep 15, 2011

I had the annoying message about missing modules when I upgraded to Natty (or was it Maverick?) and started Googling it.

I was not alone, but the maintainers seemed not to care, so people were working around it. They tended to try to get the existing phpunit package running by adding PEAR packages like mad..

I did that as well, once, and got bored. Instead I downloaded an old phpunit package and installed it. Then Ubuntu found out there was a newer one and wanted to upgrade to that, which, as we now know, is broken.

So, we download the package and then stop apt from upgrading it.

cd /tmp
wget http://no.archive.ubuntu.com/pool/universe/p/phpunit/phpunit_3.4.13-1_all.deb
sudo dpkg -i phpunit_3.4.13-1_all.deb
echo "phpunit hold" | sudo dpkg --set-selections

Then it will stick. You’ll have an older version, 3.4 instead of 3.5, but at least it works.
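If you later want apt to manage phpunit again (say, once a fixed package lands), the hold can be released the same way:

echo "phpunit install" | sudo dpkg --set-selections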

Sep 13, 2011

While I am a firm believer in TDD [Test Driven Development], we all know that a lot of the time you dump an object or variable to chase a bug down the rabbit hole. Both print_r and var_dump are usable for this, especially after Xdebug made var_dump a lot nicer. However, this was not enough for me; I wanted more power, at first for dealing with depth and recursion, so I built my own debug tool. It has grown quite a bit and it is available on GitHub [1].
It supports these features:

  • Automatic blocking of recursive references (only show an object once)
  • Easy on-the-fly change of depth
  • Blacklist classes, keys and properties
  • Meta info about objects like access level of properties
  • Includes knowledge of where it dies when you debug-die
  • Alternative adapters for outputting to file, json or firephp
  • Inspect global defines
  • Output a trace of “how you got here”
  • Dump an api (method list) of an object

The tool is basically just a Debug class with an adapter (typically Html for browser output). It also comes with a CSS file for styling the output, as well as a “bootstrap” file where you can set up default values, import them from your framework and define convenience methods. The most used of these functions are d() and dd(). The first will debug-dump any number of variables with the default settings. The second will dump and die, with an ob_flush, allowing you to use debug dumping anywhere in your project without worrying about already-started output or headers etc.
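A quick usage sketch based on that description (the d() and dd() names come from the post above; the bootstrap path is an assumption, and exact signatures may differ in the actual al13_debug package):

<?php
require 'al13_debug/bootstrap.php'; // assumed path to the bundled bootstrap file

$user   = array('name' => 'Alice', 'roles' => array('admin', 'editor'));
$config = new stdClass();
$config->env = 'dev';

d($user, $config); // dump any number of variables and keep executing

dd($user);         // dump, flush the output buffer and die right here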
You can try it for yourself, but here is a little screencast that demonstrates the kinds of outputs it shows:

http://www.youtube.com/watch?v=CBWFM_bs6nw

[1] https://github.com/alkemann/AL13/tree/master/al13_debug

Sep 09, 2011

You would be forgiven for thinking that experts never do the boring beginners’ stuff. Surely the football player getting paid X fantazillion gold doubloons a week (yes, it is a figure so large it is measured by the week), surely he, with all that skill and money, does not practise passing? Or receiving the ball? But he does – in fact, that is why he is an expert.

We are all in different places on our way to glory, stardom and general PHP-related insight – but what we have in common on this path is that we have to keep practising the fundamentals. It is never a bad idea to re-read that man page, or to get more background information on things you already know. Since the last time you visited that man page, you have learned new things which will give you more insight or a different perspective this time around. Often I find I have misunderstood some concept, and repetition helps correct a point of view that is about to go askew.

Here is a small example to get you started: did you know break [1] accepts a parameter? Neither did I. Think about it – why would you even look up break? You have used break hundreds of times before, you know how it works and when you should use it. In fact it is so basic you would think it does not have much of an entry – a quick lookup from vim in the PHP docs [2] though – and tada – there it is, white on black: break accepts an optional numeric argument telling it how many nested loops or switch structures to break out of. You know you will be needing it shortly.
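A tiny example, in case you have never had a reason to reach for it:

<?php
foreach (array("a", "b", "c") as $outer) {
    foreach (array(1, 2, 3) as $inner) {
        if ($outer === "b" && $inner === 2) {
            break 2; // leaves the inner AND the outer foreach in one go
        }
        echo $outer . $inner . " ";
    }
}
echo "done";
// Prints: a1 a2 a3 b1 done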

Like that football player spending a late practice session shooting at goal, so should we practise our fundamentals. I’m passing this on to bjori next month – he’ll give you one of his to ponder.

[1]   http://php.net/break
[2] You can have unix man pages for PHP functions integrated in Vim – so you don’t even have to waste bandwidth. Awesome. http://bjori.blogspot.com/2010/01/unix-manual-pages-for-php-functions.html