2007-08-16

Even more profiling...

Ok, this will be my last post about (this round of) profiling. Honest!

I have now decided which implementation to use, so I thought I should take some time to describe it.

It *is* a bit of a hack, but it allows me to remove the eval calls in the OO implementation. It's not as fast as the flattened, procedural Rete implementation, but it's quite close and, more importantly, I believe it's easier to implement and to add optimizations to.

Here's the profiling output:

using 4000 facts
plain OO impl. Assert took 6.0490000248 s and produced 79735 facts
hack OO impl. Assert took 3.16399979591 s and produced 79735 facts
inline impl. Assert took 2.68400001526 s and produced 79735 facts
As you can see, the difference is about half a second between the hack and the inline version.
using 10000 facts
plain OO impl. Assert took 15.2119998932 s and produced 199735 facts
hack OO impl. Assert took 7.88100004196 s and produced 199735 facts
inline impl. Assert took 7.49099993706 s and produced 199735 facts
Luckily, that difference remains roughly constant regardless of the size of the data set, and the hack keeps taking about 50% of the plain OO version's time.
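The numbers come from nothing fancier than wall-clock timing around the assert loop. If you want to do something similar, a harness along these lines will do; the network object and the assert_fact name are placeholders for whatever interface your network exposes, not my actual API:
>>> import time
>>> def time_asserts(network, facts):
...     """Measure wall-clock time for pushing a list of facts through a
...     Rete network. assert_fact is just a placeholder for whatever the
...     real entry point is called."""
...     start = time.time()
...     for fact in facts:
...         network.assert_fact(fact)
...     return time.time() - start
Each run would use a freshly built network and the same list of facts.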

Now, let's talk about how it's done. In the plain OO version, the Rete Compiler makes an instance of AlphaTestNode like this:
>>> a1 = rete.AlphaTestNode('a.n > 10')
The AlphaTestNode is defined as:
>>> class AlphaTestNode(object):
...     def __init__(self, expr):
...         self.expr = expr
...         self.nexts = []
...
...         self.right_activate = self.activate
...
...     def activate(self, tag, token):
...         if eval(self.expr, token):
...             for next in self.nexts:
...                 next.right_activate(tag, token)
Whenever activate is called on the a1 instance, it evaluates the expression 'a.n > 10' using the token, and all bindings (variable name and value) found in it, as execution scope. In order to remove the expensive eval call we have to "inline" the expression to be tested. A macro (of the Lisp kind) would have been very handy here, but I don't expect those to show up in Python any time soon, so instead we'll have to settle for a work-around. If we remove the class definition and replace it with a function:
>>> import compiler  # needed for compiler.compile below
>>> def make_AlphaTestNode(expr, var):
...     class_definition = \
... """class GeneratedAlphaTestNode(object):
...     def __init__(self):
...         self.nexts = []
...         self.right_activate = self.activate
...
...     def activate(self, tag, token):
...         %s = token['%s']
...         if %s:
...             for next in self.nexts:
...                 next.right_activate(tag, token)""" % (var, var, expr)
...
...     exec(compiler.compile(class_definition, '__generated__', 'exec'))
...     return GeneratedAlphaTestNode()
we can make a node equivalent to the one above (a1) using:
>>> a1 = rete.make_AlphaTestNode('a.n > 10', 'a')
The "extra" parameter ('a') is used to generate a statement to bind the variable (a), found in the token, locally before execution. This is necessary because we cannot specify token as the execution context (as we can when using eval or exec).

The class_definition string will look like this when it's sent to compiler.compile:
>>> class GeneratedAlphaTestNode(object):
...     def __init__(self):
...         self.nexts = []
...         self.right_activate = self.activate
...
...     def activate(self, tag, token):
...         a = token['a']
...         if a.n > 10:
...             for next in self.nexts:
...                 next.right_activate(tag, token)
Executing the code object returned by compiler.compile makes the class GeneratedAlphaTestNode available in the local scope, so all that is left to do is to make an instance and return it.
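As an aside, if relying on exec dropping the class into the local scope feels too magical, the same thing can be done with an explicit namespace dict and the built-in compile. A sketch of that variation (the function name is made up), using the same class_definition string as above:
>>> def make_AlphaTestNode_ns(class_definition):
...     namespace = {}
...     # run the generated source in a namespace of its own instead of
...     # letting exec drop the class into the local scope
...     exec(compile(class_definition, '__generated__', 'exec'), namespace)
...     return namespace['GeneratedAlphaTestNode']()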

Ugly!? Incomprehensible!? Yes, sir! But it shaves a lot of time off of fact evaluation in the Rete Network so I don't care!

Next up is to implement a second pass in the compilation step to remove "unnecessary" nodes. This is actually the main reason I chose this design over the inlined, flattened, procedural implementation. Optimizing traversal of the Rete Network in an OO implementation means traversing it and moving "next pointers" from one node to another, whilst in procedural code I would have to handle it at the level of source code (text), which feels a lot harder (I can't think of an easy way to do it, so if you've got suggestions please let me know).
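I haven't written that pass yet, but since every node keeps its successors in a nexts list, removing a node should boil down to re-pointing its parent at the node's successors. Roughly something like this (a sketch, not working code from the compiler):
>>> def splice_out(parent, node):
...     """Remove node from the network by pointing parent directly at
...     node's successors (both keep them in their nexts lists)."""
...     parent.nexts.remove(node)
...     for next in node.nexts:
...         if next not in parent.nexts:
...             parent.nexts.append(next)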

2 comments:

woolfel said...

Often code has to be ugly or un-pretty to be fast. I also think it's an acceptable trade-off, even if it makes it harder for others to understand the code.

Johan Lindberg said...

Yes, I agree. If it hadn't made much of a difference I probably wouldn't have bothered.

Initially I didn't even think about the fact that eval calls are slow, because they seemed like the *only* way to do it with that design.

And that's what started the whole thing about inlining and such. But I never thought I'd be able to find a solution that would allow me to keep the OO design *and* the faster code, so, all things considered, I'm very happy with how it turned out.