2007-10-21

A small Regular Expression library for Clips

Inspired by the first chapter of Beautiful Code, I've decided to try to implement a small regular expression library in Clips. The code discussed in that chapter is taken from another book, The Practice of Programming, and the match code, written by Rob Pike, is amazingly only about 35 lines of C.

I can say straight away that my code (roughly 200 lines) is *not* very beautiful. But then again, that wasn't the point of the exercise.

Seeing as this is supposed to become a library, I've implemented it a bit differently as well. To begin with, I've written two separate functions: match and search. They behave (roughly) as described in the docs for Python's re module: match only matches from the beginning of the string (roughly equivalent to Rob's matchhere function) and search matches anywhere in the string.
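To make the match/search split concrete, here is a minimal sketch in Python rather than Clips. It is not the code from regexp.clp: it is a port of the general idea behind Rob Pike's matcher (literal characters, ., ^, $ and *), and the helper names _match_here and _match_star, as well as the return shapes, are assumptions modelled on the session output shown further down.

```python
def _match_here(pat, text, i):
    """Return the end index of a match of pat starting at text[i], or None."""
    if not pat:
        return i
    if pat == "$":
        return i if i == len(text) else None
    if len(pat) >= 2 and pat[1] == "*":
        return _match_star(pat[0], pat[2:], text, i)
    if i < len(text) and pat[0] in (".", text[i]):
        return _match_here(pat[1:], text, i + 1)
    return None


def _match_star(c, pat, text, i):
    """Match c* followed by pat, trying the shortest repetition first."""
    while True:
        end = _match_here(pat, text, i)
        if end is not None:
            return end
        if i < len(text) and c in (".", text[i]):
            i += 1
        else:
            return None


def match(pat, text):
    """Anchored at the start: return the matched prefix, or None."""
    if pat.startswith("^"):
        pat = pat[1:]
    end = _match_here(pat, text, 0)
    return None if end is None else text[:end]


def search(pat, text):
    """Match anywhere: return (start, end, matched) using 1-based,
    inclusive positions, or None if nothing matches."""
    if pat.startswith("^"):
        end = _match_here(pat[1:], text, 0)
        return None if end is None else (1, end, text[:end])
    for start in range(len(text) + 1):
        end = _match_here(pat, text, start)
        if end is not None:
            return (start + 1, end, text[start:end])
    return None


text = "Lorem ipsum dolores sit amet"
print(match("do", text))    # None
print(match(".* ", text))   # Lorem
print(search("do", text))   # (13, 14, 'do')
```

Note that * in this sketch stops at the shortest repetition that lets the rest of the pattern match, which is why ".* " yields "Lorem " and not everything up to the last space.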

I've started working on findall, split and sub (substitute), but it's taking slightly longer to finish them than I expected. I've also been *thinking* about implementing:
1) character sets, for example \s, \d and \w
2) match groups, so that you could write "(.+) \1" and have it match repeated character sequences
3) and some more quantifiers: +, ? and possibly also {m[,n]}, {m[,n]}?, *?, +? and ??.

but I don't know. The code is complex enough as it is, and I think it would take me too long. According to the book, it only took Rob about two hours to finish his implementation, and I've already spent a *lot* more time on this than that ;-)
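For reference, this is roughly how those three features behave in Python's re module. It only illustrates the semantics and is not part of regexp.clp; the sample strings are made up for the example.

```python
import re

text = "Lorem ipsum dolores sit amet"

# 1) Character sets: \w matches word characters (\s whitespace, \d digits).
words = re.findall(r"\w+", text)

# 2) Match groups: \1 must repeat exactly what group 1 captured.
repeated = re.search(r"(.+) \1", "say hello hello again")

# 3) Quantifiers: {m,n} bounds the repetition count, and a trailing ?
#    turns a greedy quantifier into a non-greedy one.
year = re.fullmatch(r"\d{2,4}", "2007")
greedy = re.match(r".* ", text).group()
lazy = re.match(r".*? ", text).group()

print(words)              # ['Lorem', 'ipsum', 'dolores', 'sit', 'amet']
print(repeated.group(1))  # hello
print(repr(greedy))       # 'Lorem ipsum dolores sit '
print(repr(lazy))         # 'Lorem '
```

The greedy/non-greedy pair is worth a look: in Python, ".* " grabs as much as it can, and it takes ".*? " to get the short "Lorem " result.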

Here's how it works:

CLIPS> (load "regexp.clp")
:!!!
TRUE
CLIPS> (match "do" "Lorem ipsum dolores sit amet")
nil
CLIPS> (match ".* " "Lorem ipsum dolores sit amet")
"Lorem "
CLIPS> (search "do" "Lorem ipsum dolores sit amet")
(13 14 "do")

The code is available here, and I've also prepared a batch file with unit tests which can serve as an example of how to use the functions. Note: you will also need unittest.clp in order to run it.

If you find any bugs, please let me know. Enjoy.

2 comments:

woolfel said...

maybe you should submit it to CLIPS for inclusion :)

Johan Lindberg said...

Yeah ;-)

Well, I definitely wouldn't say no to a regexp library included in Clips, but I don't think my attempt is up to it.

You're of course welcome to trim it a bit; that might make it fast enough to be useful.