Recursiveness: CakePHP Set::extract() and JSONPath
September 12, 2008
Posted in PHP

Just ran across this post where Felix shares some wisdom he gained while re-writing CakePHP‘s Set::extract() (code) method to make it faster and add some XPath 2.0 support.

If you need a function to have the highest performance, try to express it non-recursively. It can make a 500% difference.

Now I thought the 500% speed gain (and associated fewer CPU cycles) was a number he grabbed out of the air or it was based on comparisons with the previous Set::extract() method and very specific to that case. It surely would not apply to any algorithm in general as some are much more complex than others.

While it may not apply to any algorithm in general it may apply to similar algorithms. So I benchmarked the new Set::extract() against a library I remembered called JSONPath (code) which also allows you to extract data from nested arrays using a string selector. JSONPath uses recursion just like the old Set::extract() method.

I used the second $users array from here and ran the following two tests:

// Simple
Set::extract('/User/name', $users);
jsonPath($users, "$");
// Complex
Set::extract('/User/Item[id>2]/name', $users);
jsonPath($users, "$..User.Item.[?(@['id']>2)].name");

Both of these tests generate the same output so they are a pretty good comparison for the underlying algorithms. Running them 100 times each gives the following result:


Lo and behold! The speed difference is ~500%!

Now that is a funny answer to my question! NOTE: The two libraries (and old Set::extract()) have different features and cannot be compared like this. The 500% is only applicable for this specific example but take it to show that recursion should definitely be evaluated when looking for speed gain. The cost for less recursion may be less modularity and extensibility, but that is a tradeoff you always need to make.

Now don’t use these figures to choose between Set::extract() and JSONPath. They work very differently internally. JSONPath uses a lot of recursion and jumping around but it also uses a lot of regular expression matching which is notoriously slow in PHP.

JSONPath has a very academic background and illustrates some neat concepts. If someone was to take it and put a practical spin on it with some performance improvements (reducing recursiveness!?) and new features, it could make for an even better library to extract data from arrays.

September 14, 2008

Thanks a lot for this comparison. I actually did look at JSONPath at some point and found the implementation to be very interesting. If I remember correctly JSONPath is also a more complete and powerful implementation than Set::extract, so I don’t think its quite fair to compare the both directly.

About the 500%: That number was actually partially made up. I remembered that at some point I saw a jump in the ~500%ish range when first experimenting with a non-recursive approach. But after adding some more features and bug fixes I think the speed went down a little from that. So while I think that not using recursions is the biggest speed gain Set::extract has over JSONPath, the 500% difference is more of a coincidence and not directly related to that particular aspect of the implementation.

Christoph Dorn
September 14, 2008

The comparison makes for a rough baseline but as you mentioned they cannot really be compered as they have different features.

JSONPath does have more features but it is not very user friendly as the selector syntax can get pretty hard to follow and there seem to be some inconsistencies with what is documented and implemented. I had one heck of a time trying to get a complex selector working for both libraries that produces the same result.

Taking the best implementation ideas from both and merging them to come up with a new library that implements a good set of the XPath 2.0 features would be great. Sticking with the XPath syntax probably makes the most sense as a lot of thought has been put into it and many people know it. There could also be a simplified syntax.

Comments are closed.