2008-04-11

AST to AST translation in Python

In an attempt to further simplify PyRete's rule compiler I've split it into three parts.

1) Pre-processing (splits a rule into subrules, if necessary),
2) macro-expansion (translates, for example, Exists(CE*) into Not(Not(CE*))) and
3) Rete Network construction (creates Production-, Alpha- and BetaNodes and joins them together).

The complexity of trying to handle all of that in one class just got out of hand. Also, I think this design will make PyRete a lot more flexible and easier to extend with custom constructs (given that they are not violating Python's syntax that is).

Anyway, the idea is that the Rete Network construction (step 3) should only have to deal with constructs that can be translated directly into a node in the Rete Network and that everything else should be handled in either of the two previous steps. This means, among other things, that in step 3 there'll only be Object-Pattern-CEs, Test-CEs and Not-CEs left.

I'll talk about parts of step 2 (the macro-expansion) since I think it's the most interesting part.

Unfortunately there isn't that much written about AST to AST translation in Python. Not very surprising considering that it's not generally the level of abstraction most people decide to interfer with the compilation-chain. Most attempts either cover the whole thing, from front to back, like python4ply and EasyExtend or perform low-level bytecode manipulation as bytecodehacks and psyco. Even the Python docs for the compiler module and its AST visitor class is quite sparse to say the least.

Luckily I've spent more than a year poking around on the AST level of Python code so I'm used to trial and error and having exceptions thrown in my face every now and then. The translation scheme I've implemented is not as smooth as having Common Lisp or Scheme macros but it does the job.

Here's the code for handling the Exists-CE:

>>> class ExistsCE(object):
... @classmethod
... def expand(self, node):
... nodesToVisit = [node]
... while len(nodesToVisit) > 0:
... curr = nodesToVisit.pop(0)
... for child in curr.getChildNodes():
... nodesToVisit.append(child)
... if isinstance(curr, CallFunc) and curr.node.name == "Exists":
... curr.node.name = "Not"
... curr.args = [CallFunc(Name("Not"), curr.args)]
... return node
and here's a sample usage (I've highlighted the interesting parts).
>>> ast = compiler.parse("Exists(Spam(eggs == 1, bacon = 2))")
>>> print ast
Module(None, Stmt([Discard(CallFunc(Name('Exists'), [CallFunc(Name('Spam'), [Com
pare(Name('eggs'), [('==', Const(1))]), Keyword('bacon', Const(2))], None, None)
], None, None))]))
and if we run it through the ExistsCE macro expander we get:
>>> print ExistsCE.expand(ast)
Module(None, Stmt([Discard(CallFunc(Name('Not'), [CallFunc(Name('Not'), [CallFun
c(Name('Spam'), [Compare(Name('eggs'), [('==', Const(1))]), Keyword('bacon', Con
st(2))], None, None)], None, None)], None, None))]))
which is exactly the same AST as if we had written two nested Not calls instead:
>>> print compiler.parse("Not(Not(Spam(eggs == 1, bacon = 2)))")
Module(None, Stmt([Discard(CallFunc(Name('Not'), [CallFunc(Name('Not'), [CallFun
c(Name('Spam'), [Compare(Name('eggs'), [('==', Const(1))]), Keyword('bacon', Con
st(2))], None, None)], None, None)], None, None))]))
I wonder if I can abstract away some of that boiler plate, make it more general and usable. I'll have to tinker some more but it shouldn't be impossible.

Inga kommentarer: