At 09:45 AM 2/8/2009, you wrote:
>From: "Christian Ziemski"
>
>On 05.02.2009 00:58 in vedit-suggest Ted Green wrote:
>>
>> I have spent the past two weeks writing a regular expression (RE)
>> interpreter in C for one of our anti-spam products. (Yes, Perl and
>> Ruby have built-in regular expressions, but we still had need for an
>> interpreter.) This relates to VEDIT as I want to add full group
>> processing to VEDIT's RE, e.g. things like: (a+b+){3,4}cde
>>
>> While understanding complex RE is hard enough, implementing them
>> leads to insanity. ;-)
>
>I'm reading Jeffrey Friedl's "Mastering Regular Expressions" right now.
>Fascinating!
>... and you are implementing your own regex machine. WOW!
Full regular expressions (RE) are usually implemented as finite-state machines; i.e., a specific RE is compiled into an optimized program which performs that specific match. There is a bit of overhead in compiling, but then the search itself is fast.
The RE in VEDIT are mostly interpreted; however, we prescan the RE and convert classes (e.g. [a-zA-Z]) into lookup tables.
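To illustrate the lookup-table idea, here is a minimal sketch in C. It is not VEDIT's actual code; the names build_class_table and in_class are made up for this example, and it assumes single-byte characters and handles only plain characters and x-y ranges (no negation, no escapes).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not VEDIT's code: expand a class body such as
   "a-zA-Z" (the text between '[' and ']') into a 256-entry table. */
static void build_class_table(const char *body, bool table[256])
{
    assert(body != NULL);
    size_t n = strlen(body);

    memset(table, 0, 256 * sizeof table[0]);

    for (size_t i = 0; i < n; i++) {
        unsigned char lo = (unsigned char)body[i];
        unsigned char hi = lo;

        /* "x-y" is a range only when '-' is followed by another char;
           a trailing '-' is treated as a literal. */
        if (i + 2 < n && body[i + 1] == '-') {
            hi = (unsigned char)body[i + 2];
            i += 2;
        }

        /* Mark every character in lo..hi (nothing if lo > hi). */
        for (unsigned c = lo; c <= hi; c++)
            table[c] = true;
    }
}

/* During the search, a class test is a single array index. */
static bool in_class(const bool table[256], char ch)
{
    return table[(unsigned char)ch];
}
```

The table is built once during the prescan, e.g. build_class_table("a-zA-Z", table), and after that each in_class(table, ch) call costs one array lookup, no matter how many ranges the class contains.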
As you know, VEDIT does not fully implement groups; e.g. you cannot use (...)+, (...){1,6}, etc.
The RE I implemented for our antispam product is also interpreted. Since it might run 1000 RE on a 5K buffer, it would take more CPU time to compile the RE than to interpret them.
Also, I don't need or even want full RE, as they are error-prone, especially in a time-pressured environment.
Ted.