Showing posts with the label Regular Expressions.

2007-11-29

Added some functionality in the Clips regexp library

Today I spent about three hours on a train between Stockholm and Gothenburg. That was nearly enough to complete the functionality in the small regexp library for Clips I wrote a few weeks ago.

I've added findall, split and sub(stitute) functions, which work like so:

CLIPS> (load "regexp.clp")
:!!!!!!
CLIPS> (findall "a" "Today I spent about three hours on a train between Stockholm and Gothenburg")
("4 4 "a"" "15 15 "a"" "36 36 "a"" "40 40 "a"" "62 62 "a"")
CLIPS> (split "a" "Today I spent about three hours on a train between Stockholm and Gothenburg")
("Tod" "y I spent " "bout three hours on " " tr" "in between Stockholm " "nd Gothenburg")
CLIPS> (sub "a" "A" "Today I spent about three hours on a train between Stockholm and Gothenburg")
"TodAy I spent About three hours on A trAin between Stockholm And Gothenburg"
OK, my examples might not be the best, but you get the idea. Both split and sub also take an optional parameter ?max that limits the number of splits or substitutions:
CLIPS> (sub "a" "A" "Today I spent about three hours on a train between Stockholm and Gothenburg" 2)
"TodAy I spent About three hours on a train between Stockholm and Gothenburg"
The library consists of two files: regexp.clp and regexp-test.bat (requires unittest.clp).
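
For comparison, here is a rough sketch of the same calls in Python's re module, whose function names the library borrows. It is not part of regexp.clp. The helper findall_positions is a made-up name that mimics the "start end match" results shown above, assuming positions are 1-based and inclusive, and I'm assuming that ?max corresponds to Python's count and maxsplit arguments.

```python
import re

TEXT = "Today I spent about three hours on a train between Stockholm and Gothenburg"


def findall_positions(pattern, text):
    """Return (start, end, match) triples using 1-based, inclusive positions.

    Hypothetical helper: Python's own re.findall only returns the matched text.
    """
    return [(m.start() + 1, m.end(), m.group()) for m in re.finditer(pattern, text)]


hits = findall_positions("a", TEXT)             # positions of every "a"
parts = re.split("a", TEXT)                     # split on every "a"
replaced_all = re.sub("a", "A", TEXT)           # replace every "a"
replaced_two = re.sub("a", "A", TEXT, count=2)  # replace only the first two
parts_two = re.split("a", TEXT, maxsplit=2)     # split at most twice

print(hits)
print(parts)
print(replaced_all)
print(replaced_two)
print(parts_two)
```

With maxsplit=2 the third element holds the rest of the string unsplit. Whether the Clips split treats ?max the same way is something I have not checked.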

[2007-11-30] Update: I found a bug in the search function which caused a string not to be properly searched (the search ended too early). Apparently my test cases are not very good...

2007-10-21

A small Regular Expression library for Clips

Inspired by the first chapter of Beautiful Code, I've decided to try to implement a small regular expression library in Clips. The code discussed in that chapter is taken from another book, The Practice of Programming, and the match code, written by Rob Pike, is amazingly enough only about 35 lines of C.

I can say straight away that my code (roughly 200 lines) is *not* very beautiful. But then again, that wasn't the point of the exercise.

Seeing as this is supposed to become a library, I've implemented it a bit differently as well. To begin with, I've written two separate functions: match and search. They behave (roughly) as described in the docs for Python's re module: match only matches at the beginning of the string (roughly equivalent to Rob's matchhere function) and search matches anywhere in it.
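
To make the difference between the two concrete, here is a minimal Python sketch of a Pike-style matcher. It is not the Clips code, just the same idea under some assumptions: it only understands literal characters, ".", "*" and a trailing "$" (plus a leading "^" in search), the helper names match_here and match_star are mine, and the return values (the matched text for match, a 1-based start/end/text triple for search) are modelled on the results shown in the examples further down.

```python
def match_here(pattern, text, pos):
    """Try to match pattern at text[pos]; return the end index or None."""
    if not pattern:
        return pos                       # empty pattern matches the empty string
    if len(pattern) >= 2 and pattern[1] == "*":
        return match_star(pattern[0], pattern[2:], text, pos)
    if pattern == "$":
        return pos if pos == len(text) else None
    if pos < len(text) and pattern[0] in (".", text[pos]):
        return match_here(pattern[1:], text, pos + 1)
    return None


def match_star(char, pattern, text, pos):
    """Match zero or more of char, then the rest of the pattern.

    Like Pike's matchstar, this tries the shortest repetition first.
    """
    while True:
        end = match_here(pattern, text, pos)
        if end is not None:
            return end
        if pos < len(text) and char in (".", text[pos]):
            pos += 1                     # consume one more repetition and retry
        else:
            return None


def match(pattern, text):
    """Match only at the beginning of text; return the matched text or None."""
    end = match_here(pattern, text, 0)
    return None if end is None else text[:end]


def search(pattern, text):
    """Match anywhere in text; return (start, end, matched), 1-based inclusive."""
    if pattern.startswith("^"):          # anchored search: only try position 0
        end = match_here(pattern[1:], text, 0)
        return None if end is None else (1, end, text[:end])
    for start in range(len(text) + 1):
        end = match_here(pattern, text, start)
        if end is not None:
            return (start + 1, end, text[start:end])
    return None


print(match("do", "Lorem ipsum dolores sit amet"))
print(match(".* ", "Lorem ipsum dolores sit amet"))
print(search("do", "Lorem ipsum dolores sit amet"))
```

In this sketch search is nothing more than match_here retried at every starting position, which is why the two functions can share all of the real matching logic.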

I've started working on findall, split and sub (substitute), but it's taking slightly longer to finish them than I expected. I've also been *thinking* about implementing:
1) character classes, for example \s, \d and \w
2) match groups, so that you could write "(.+) \1" and have it match repeated character sequences
3) and some more quantifiers: +, ? and possibly also {m[,n]}, {m[,n]}?, *?, +? and ??.

but I don't know. The code is complex enough as it is and I think I'd take too long doing it (according to the book, it only took Rob about two hours to finish his implementation and I've already spent a *lot* more time on this than that ;-)
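
For reference, this is roughly how those three features behave in Python's re module. It only illustrates the behaviour I would be aiming for. None of it exists in regexp.clp, and the sample strings and variable names are made up.

```python
import re

# 1) Character classes: \d (digit), \w (word character), \s (whitespace)
date_parts = re.findall(r"\d+", "2007-11-29")
words = re.split(r"\s+", "three  hours\ton a train")
identifiers = re.findall(r"\w+", "regexp.clp, unittest.clp")

# 2) Match groups: \1 refers back to whatever the first group matched
repeated = re.search(r"(.+) \1", "hello hello world")

# 3) Quantifiers: greedy forms take as much as possible, "?"-suffixed forms as little
greedy_plus = re.match(r"a+", "aaaa").group()
lazy_plus = re.match(r"a+?", "aaaa").group()
bounded = re.match(r"a{2,3}", "aaaa").group()
lazy_bounded = re.match(r"a{2,3}?", "aaaa").group()
optional = re.match(r"colou?r", "color").group()

print(date_parts, words, identifiers)
print(repeated.group(), repeated.group(1))
print(greedy_plus, lazy_plus, bounded, lazy_bounded, optional)
```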

Here's how it works:

CLIPS> (load "regexp.clp")
:!!!
TRUE
CLIPS> (match "do" "Lorem ipsum dolores sit amet")
nil
CLIPS> (match ".* " "Lorem ipsum dolores sit amet")
"Lorem "
CLIPS> (search "do" "Lorem ipsum dolores sit amet")
(13 14 "do")
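
For comparison, here are the same three calls in Python's re module. Two differences are worth noting. Python reports positions as 0-based, end-exclusive spans, so the sketch converts them to the 1-based, inclusive form used above (an assumption based on the output). And Python's * is greedy, whereas the "Lorem " result above suggests that * here stops at the shortest match, like Rob's matchstar does, so the closest Python pattern is the lazy ".*? ".

```python
import re

TEXT = "Lorem ipsum dolores sit amet"

# match is anchored at the start of the string, so "do" is not found
anchored_miss = re.match("do", TEXT)

# Greedy star runs to the last space; lazy star stops at the first one
greedy = re.match(".* ", TEXT).group()
lazy = re.match(".*? ", TEXT).group()

# search scans the whole string; convert the span to 1-based inclusive
found = re.search("do", TEXT)
span_1_based = (found.start() + 1, found.end(), found.group())

print(anchored_miss)
print(repr(greedy))
print(repr(lazy))
print(span_1_based)
```
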
The code is available here, and I've also prepared a batch file with unit tests that can serve as an example of how to use the functions. NOTE! You will also need unittest.clp in order to run it.

If you find any bugs, please let me know. Enjoy.