About Me

Curriculum Vitae

A brief list of my current skill set

Bloggybits

Fare Ye Well Work Email You Have Served Me Well
Monday, 17th September 2012, 14:36

Cause of death too much spam

Forest Racer - A HTML5 Game in Under 13K
Tuesday, 11th September 2012, 20:46

Including all assets, but only when zipped

Entering a 13k HTML5 Game Competition
Tuesday, 4th September 2012, 16:31

I'm so tempted to have a go at this

Faster Loops and Faster Iterations in Node.js and V8
Wednesday, 29th August 2012, 13:16

Is Object.keys faster than For...In?

And the Fastworks.js framework is Born!
Wednesday, 22nd August 2012, 16:23

Well I'm excited, even if you aren't

Libxmljs Update on CentOS 3.8 throws an SELinux Wobbley Fit
Monday, 20th August 2012, 15:40

The right way to fix this sort of issue

TV Land Doesn't Understand Technology
Friday, 17th August 2012, 17:09

Or maybe it does and thinks we don't?

Yet More Benchmarking - Function Chains vs Object Chains
Wednesday, 15th August 2012, 13:34

Working towards a faster Node.js framework

More Benchmarking in Node.js and V8
Tuesday, 14th August 2012, 12:19

Working out the fastest way to route

MinnaHTML.js Benchmarking for Speed in Node.js
Monday, 13th August 2012, 17:55

Don't believe it, test it

Playing Around With HTML5's Canvas
Friday, 10th August 2012, 16:19

Speccy loading screens in a browser!

Scripting in Node.js AKA How to Watch for Olympic Tickets Using a Script
Tuesday, 7th August 2012, 23:37

Let Node refresh the webpage so you don't have to!

A Javascript Confirm/Alert Replacement jQuery Plugin
Monday, 6th August 2012, 23:06

Much prettier than the horrible alert and confirm dialogs

A Few Node.js Essential Modules
Friday, 3rd August 2012, 14:40

As essential as they can be anyway

Retro Coding Corner: Loading ZX Spectrum Snapshots off Microdrives - Part 2
Tuesday, 31st July 2012, 18:20

It's like juggling 8 balls with one hand behind your back

Projects and Sillyness

MAME Cabinet Diary

How I built my own arcade cabinet

Loading Screen Simulator

I don't miss the ZX Spectrum, I still use it!

The Little Guy Chat Room

It's a Pitfall inspired chat room

GPMad MP3

A fully featured MP3 player what I wrote

GP Space Invaders

My first little emulator

GP32 Development Page

Some info and links about this cute little handheld

Disney Nasties

Uncensored images, you must be 18 to view them

Diary of a Hamster

Learn about how hamsters think, first hand

Utilities

Time Calculator

A simple little online utility for working out how many hours to bill a client

A Few Links

More Benchmarking in Node.js and V8
Tuesday, 14th August 2012, 12:19

I may have mentioned that I didn't like the way heavy frameworks (and the blind captains that sail in them) had begun to infect the Node.js stratosphere, and that I might consider making my own. Well the first stage is to look closely at the core design decisions and then benchmark them to make sure they really are a fast way to do things.

My plan is this, create something as easy to use as Connect, because that really is a nice framework. But to make it as fast as possible, moving the goal posts away from people who are trying to turn Node.js into .NET, and back towards the clean and mean approach which is what it should all be about.

Basic Design

The built in modules of Node.js should be treated as sacrosanct, the internals may change, but the guys who made Node.js know what they are doing far more than I do, so I trust them to continuously improve how things work and make them faster.

Routing should be the starting point for everything, and it should route to multiple stacks of middleware. At the moment, with Connect, on my websites a connection comes in and several things are done across all of them. Thing is, they don't all need everything, so why do it!

If a request for a static item comes in, say as an image or favourite icon, this should be sent to a stack that does nothing more than pipes the image out. A webpage request that requires validation using cookies, this should never even pass through parts of a stack designed for serving static requests, why waste the time? But it should go through a stack which deals with cookies, but not waste time in one that worries about form data.

And likewise, any page where form data is submitted, this should go through a stack which deals with that, probably taking advantage of formidable. Or some alternative if you prefer.

But the basic design philosophy should be routing is king, calculate a route as fast as possible, and then send it to the most efficient stack for that request.

Routing Methods

I can think of a few methods to test URIs for routing purposes, since this is a fundamental feature it needs to be the fastest. The most obvious is a straight string match, slightly more complicated versions can involve regular expressions which could be slow. Possibly not as slow as the latter would be an associative array based method, but my gut feeling is it won't be as quick as a straight string match technique.

However I think it is the job of a framework/library designer to encourage best practise, so by deliberately not supporting something which is bad, should mean I don't turn into Microsoft, though it would certainly be nice to have their money.

So, I'm going to test a number of approaches and see what the damage is. And who knows, if the performance of some are better for certain circumstances than others, perhaps a choice should be offered. One may be ideal for a short list of routes, another may be far better if the list is huge.

Basic Benchmarking Script

When we are setting up, time is cheap, so we can safely assume that creating an array of routes and sorting them is free. There is also no reason why we couldn't offer some manual weighting system to speed it up further, so lets test a sensible system.

Our standard list of possible routes for this test will be as follows:

/service/delete
/service/get
/service/put
/service
/css/
/images/
/script/
/

And we'll test it with the following URIs:

/service/delete/28
/service/put/44
/service/get/a/much/really/long/route
/service/view
/css/default.css
/images/snookie.jpg
/script/jquery.js
/roadtonowhere
/

Our basic benchmarking script is going to look like this:

var runtimes = 1000000;

var routes = [
{ path: "/service/delete", destination: doNothing },
{ path: "/service/get", destination: doNothing },
{ path: "/service/put", destination: doNothing },
{ path: "/service", destination: doNothing },
{ path: "/css/", destination: doNothing },
{ path: "/images/", destination: doNothing },
{ path: "/script/", destination: doNothing },
{ path: "/", destination: doNothing }
];

var uris = [
"/service/delete/28",
"/service/put/44",
"/service/get/a/much/really/long/route",
"/service/view",
"/css/default.css",
"/images/snookie.jpg",
"/script/jquery.js",
"/roadtonowhere",
"/"
];

console.log("Runs: " + runtimes);

for (var u = 0; u < uris.length; u++)
timeRoute(runtimes, uris[u], ROUTING_FUNCTION_WE_WILL_TEST);


function timeRoute(runtimes, strRoute, fncRouter)
{
var dtStart = new Date();

for (var runcount = 0; runcount < runtimes; runcount++)
fncRouter(strRoute);

var dtEnd = new Date();

while (strRoute.length < 20)
strRoute += " ";
console.log("Route: " + strRoute + "Total Time: " + (dtEnd - dtStart) + "ms, Avg Time: " + (dtEnd - dtStart) / runtimes);
}

function doNothing()
{
var x = Math.random();

return x;
}

This should be very easy to follow, and I've tried to keep it as close to how we'll actually do things in the framework by including a destination function in the routing array.

Simple String Match

Here is our most simple case scenario, literally going through and matching strings, firing a call off that represents the correct route.

function testStringMatch(strRoute)
{
for (var p = 0; p < routes.length; p++)
{
if (routes[p].path == strRoute.substr(0, routes[p].path.length))
{
routes[p].destination();
return;
}
}
}

And here are our results, run a few times until they settled into a repeating pattern:

Route: /service/delete/28 Total Time: 128ms, Avg Time: 0.000128
Route: /service/put/44 Total Time: 220ms, Avg Time: 0.00022
Route: /service/get/a/much/really/long/route Total Time: 192ms, Avg Time: 0.000192
Route: /service/view Total Time: 281ms, Avg Time: 0.000281
Route: /css/default.css Total Time: 410ms, Avg Time: 0.00041
Route: /images/snookie.jpg Total Time: 420ms, Avg Time: 0.00042
Route: /script/jquery.js Total Time: 543ms, Avg Time: 0.000543
Route: /roadtonowhere Total Time: 450ms, Avg Time: 0.00045
Route: / Total Time: 255ms, Avg Time: 0.000255

This is as you would expect, the earliest route matches are quicker, and the further down the array it must go the longer it takes. Before you ask, I did try optimising things a tiny bit by storing routes[x].path.length as routes[x].len but it made no difference at all, which shows you how good at times V8 is at optimising.

Regular Expression String Match

Using almost identical matches to the ones above, with the only stipulation that we demand the start of a line, regular expression matching doesn't at first appear as costly as all that.

Here is our new routes array and test function:

var regexroutes = [ 
{ regex: /^\/service\/delete/, destination: doNothing },
{ regex: /^\/service\/get/, destination: doNothing },
{ regex: /^\/service\/put/, destination: doNothing },
{ regex: /^\/service\//, destination: doNothing },
{ regex: /^\/css\//, destination: doNothing },
{ regex: /^\/images\//, destination: doNothing },
{ regex: /^\/script\//, destination: doNothing },
{ regex: /^\//, destination: doNothing }
];

function testRegExStringMatch(strRoute)
{
for (var p = 0; p < regexroutes.length; p++)
{
if (strRoute.match(regexroutes[p].regex))
{
regexroutes[p].destination();
return;
}
}
}

And here are our results:

Runs: 1000000
Route: /service/delete/28 Total Time: 140ms, Avg Time: 0.00014
Route: /service/put/44 Total Time: 341ms, Avg Time: 0.000341
Route: /service/get/a/much/really/long/route Total Time: 215ms, Avg Time: 0.000215
Route: /service/view Total Time: 332ms, Avg Time: 0.000332
Route: /css/default.css Total Time: 375ms, Avg Time: 0.000375
Route: /images/snookie.jpg Total Time: 493ms, Avg Time: 0.000493
Route: /script/jquery.js Total Time: 505ms, Avg Time: 0.000505
Route: /roadtonowhere Total Time: 587ms, Avg Time: 0.000587
Route: / Total Time: 518ms, Avg Time: 0.000518

For simple matching, V8's regex engine is clearly pretty efficient. This isn't to say you couldn't write an awful one that won't make it cry, I'm sure you could. But using sensible ones seems to not be that costly.

Using more complicated RegExs, like the following:

var regexroutes = [ 
{ regex: /^\/service\/delete\/[0-9]+$/, destination: doNothing },
{ regex: /^\/service\/get/, destination: doNothing },
{ regex: /^\/service\/put\/[0-9]+/, destination: doNothing },
{ regex: /^\/service\//, destination: doNothing },
{ regex: /^\/css\/[a-z]+\.css/, destination: doNothing },
{ regex: /^\/images\/[a-z]+\.jpg/, destination: doNothing },
{ regex: /^\/script\//, destination: doNothing },
{ regex: /^\/$/, destination: doNothing }
];

We get results which are still pretty solid:

Runs: 1000000
Route: /service/delete/28 Total Time: 155ms, Avg Time: 0.000155
Route: /service/put/44 Total Time: 266ms, Avg Time: 0.000266
Route: /service/get/a/much/really/long/route Total Time: 226ms, Avg Time: 0.000226
Route: /service/view Total Time: 402ms, Avg Time: 0.000402
Route: /css/default.css Total Time: 378ms, Avg Time: 0.000378
Route: /images/snookie.jpg Total Time: 442ms, Avg Time: 0.000442
Route: /script/jquery.js Total Time: 588ms, Avg Time: 0.000588
Route: /roadtonowhere Total Time: 488ms, Avg Time: 0.000488
Route: / Total Time: 630ms, Avg Time: 0.00063

Whilst for very short routes like the root home page, string matching is fast, there is no doubt that the cost of using regexps for routing is quite low for many cases.

Associative Array Matching

Now for something a bit way out there, I really am not expecting this to be as fast at all, but let's experiment with it anyway and find out.

Here is our new array and test function:

var routesobj = [ 
{ "/service/delete": doNothing },
{ "/service/get": doNothing },
{ "/service/put": doNothing },
{ "/service": doNothing },
{ "/css/": doNothing },
{ "/images/": doNothing },
{ "/script/": doNothing },
{ "/": doNothing }
];

function testAssociativeMatch(strRoute)
{
while (strRoute.length)
{
if (routesobj[strRoute])
{
routesobj[strRoute]();
return;
}

strRoute = strRoute.substring(0, strRoute.lastIndexOf("/"));
}
}

And needless to say, the longer the route the slower it is:

Runs: 1000000
Route: /service/delete/28 Total Time: 1584ms, Avg Time: 0.001584
Route: /service/put/44 Total Time: 1606ms, Avg Time: 0.001606
Route: /service/get/a/much/really/long/route Total Time: 4328ms, Avg Time: 0.004328
Route: /service/view Total Time: 990ms, Avg Time: 0.00099
Route: /css/default.css Total Time: 1005ms, Avg Time: 0.001005
Route: /images/snookie.jpg Total Time: 1106ms, Avg Time: 0.001106
Route: /script/jquery.js Total Time: 1031ms, Avg Time: 0.001031
Route: /roadtonowhere Total Time: 357ms, Avg Time: 0.000357
Route: / Total Time: 334ms, Avg Time: 0.000334

Maybe we can optimise things a bit, but perhaps this way lacks any sort of clean matching process without manipulating strings.

Tree Route Matching

This is another way out idea, that involves an object based tree hierarchy, so a URI like /service/delete/28 becomes obj.service.delete.28. Again really not expecting this to be any good at all, but for the investment time in finding out, it seems silly to not try.

However this is not easy to do, and requires abusing eval(). To make matters worse the most efficient way removes options when it comes to what you can and can't match. Here is my quick attempt at it:

var routestree = {
service: { delete: doNothing, get: doNothing, put: doNothing },
css: doNothing,
images: doNothing,
script: doNothing
};

function testAssociativeMatch(strRoute)
{
strRoute = "routestree" + strRoute.replace(/\//g, ".");

while (strRoute.length)
{
try
{
eval("if(" + strRoute + ") " + strRoute + "();");
}
catch (error)
{
}
strRoute = strRoute.substring(0, strRoute.lastIndexOf("."));
}
}

A nightmare as you can see. We have to catch errors which eats time, and manipulate the string which eats time too. This is more of a lesson in stuff you can do, rather than stuff you should, and it would need all sorts of sanitising of input or you'd be in a world of hurt. But what the hell, let's benchmark it.

Runs: 1000000
Route: /service/delete/28 Total Time: 101297ms, Avg Time: 0.101297
...CTRL-C

As you can see, no way hosey.

Conclusion

Regular expressions are the way to go when it comes to routing. Sure someone could make an awful one, but with a bit of guidance (or more accurately just making sure the examples are clear and efficient) that shouldn't be too much of an issue.

RegExs.... I CHOOSE JOOOOOOOOO!

But at the same time, people might want straight matching, so worth offering that as an option too.

Comments

Add Your Own Comment