Topic: RegExp (1 of 6), Read 28 times
Conf: Search and Replace
From: Ian Binnie
Date: Sunday, October 22, 2006 08:40 PM

I have been using regexp quite a lot lately, but have some problems.

One example, I am trying to search lines like the following, and replace the 2nd Mary.

... 3 Mary Ann LEWIS b: 26th July, 1870, Commissioners Waters, NSW, Cem. Armidale. Mary ...

This works OK unless there is another match later in the line, in which case it always finds the last one.

I am using:-

Replace("^(\.+\s[1-9]\s\b([A-Z][a-z]+)\b.+)\b\2\b", "\1", REGEXP|NOERR|CONFIRM)

I know that a Unix-style .+ is supposed to match the longest possible string that still allows the rest of the expression to match, but I am not using MAX.

I have read the help many times, and am not sure just what this should match.

 


Topic: Re: RegExp (2 of 6), Read 25 times
Conf: Search and Replace
From: Christian Ziemski
Date: Saturday, October 28, 2006 10:18 AM

On Sun, 22 Oct 2006 20:40:00 -0400, Ian Binnie wrote:

>I have been using regexp quite a lot lately, but have some problems.
>
>One example, I am trying to search lines like the following, and replace the
>2nd Mary.
>[...]

Ian:

I've been looking at your example for some time now, trying to understand
it, but I'm not able to.

The complete line seems to be changed to itself but without the word
"Mary"?!?

If you would explain your plan in a bit more detail, I'll try to help.


Christian

 


Topic: Re: RegExp (3 of 6), Read 21 times
Conf: Search and Replace
From: Ian Binnie
Date: Sunday, October 29, 2006 07:15 PM

On 10/28/2006 10:18:05 AM, Christian Ziemski wrote:
>On Sun, 22 Oct 2006 20:40:00 -0400, Ian
>Binnie wrote:
>
>>I have been using regexp quite a lot lately, but have some problems.
>>
>>One example, I am trying to search lines like the following, and replace the
>>2nd Mary.
>>[...]
>
>Ian:
>
>I've been looking at your example for
>some time now, trying to understand
>it, but I'm not able to.
>
>The complete line seems to be changed to
>itself but without the word
>"Mary"?!?
>
>If you would explain your plan in a bit
>more detail, I'll try to help.
>
Replace("^(\.+\s[1-9]\s\b([A-Z][a-z]+)\b.+)\b\2\b", "\1", REGEXP|NOERR|CONFIRM)

The search pattern breaks down as follows:

^                      beginning of line
(                      start of group 1
\.+\s[1-9]\s           leader, i.e. repeated "." then space, digit, space
\b([A-Z][a-z]+)\b      Name on word boundaries => group 2
.+                     match anything
)                      end of group 1
\b\2\b                 group 2 again, on word boundaries

The command is designed to match a line containing a Name, up to a repeat of the name, and then delete the 2nd name (by replacing the found string with group 1, i.e. everything up to the repeated name).

Unfortunately if the line contains more than 2 copies of the name, it does not match the 1st repeat, but the last.
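
The symptom can be reproduced outside VEDIT with any engine whose ".+" maximizes. The following is only a rough sketch in Python, not VEDIT macro code: Python's re module is a different regexp engine, and the sample line is a made-up variant of the one above with three copies of the name. It shows a maximizing ".+" inside group 1 running on to the last repeat of the name.

```python
import re

# Hypothetical sample line (not from the original data): three copies of "Mary".
line = "... 3 Mary Ann LEWIS b: 1870, Armidale. Mary Smith and Mary Jones"

# Same pattern as in the Replace() command. In Python, ".+" always maximizes.
greedy = re.compile(r"^(\.+\s[1-9]\s\b([A-Z][a-z]+)\b.+)\b\2\b")

m = greedy.search(line)

# Where the matched repeat of the name starts in the line.
repeat_start = m.end() - len(m.group(2))

print("name captured in group 2:", m.group(2))
print("repeat matched at column:", repeat_start)
print("last 'Mary' is at column:", line.rfind("Mary"))
```

Because ".+" grabs as much as it can and only gives characters back until the rest of the pattern fits, the repeat that gets matched is the last one on the line, not the first.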

My question is about the Regexp behaviour, which is not what I would expect.

I am not trying to just solve this problem, I could do this many ways e.g. a loop which copied the found Name into a text register.

 


Topic: Re: RegExp (4 of 6), Read 23 times
Conf: Search and Replace
From: Ted Green
Date: Sunday, October 29, 2006 10:43 PM

At 06:17 PM 10/29/2006, you wrote:
>My question is about the Regexp behaviour, which is not what I would expect.
>
>I am not trying to just solve this problem, I could do this many ways e.g. a loop which copied the found Name into a text register.

I have asked Tom Burt to replicate this, as it would be a bug.

Ted.

 


Topic: Re: RegExp (5 of 6), Read 24 times
Conf: Search and Replace
From: Ted Green
Date: Saturday, November 04, 2006 05:44 PM

At 07:41 PM 10/22/2006, you wrote:
>I have been using regexp quite a lot lately, but have some problems.

We have confirmed and fixed the problem - inside () the ".+" was always maximized. Tom also fixed the search dialog box to allow search strings up to 1024 chars. He also fixed a problem related to search string history.

Tom is currently attempting to implement ".+?" and ".*?" as a minimized search for just that item, overriding the "MAX" option. This is the RE syntax used by Perl.
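
For readers unfamiliar with the Perl-style syntax, here is a minimal sketch of what ".+?" does, written in Python rather than VEDIT macro language. This is an assumption for illustration only: Python's re module happens to use the same ".+?" notation, the sample line is made up, and how VEDIT will behave once this is implemented is not shown here.

```python
import re

# Hypothetical sample line with three copies of the name "Mary".
line = "... 3 Mary Ann LEWIS b: 1870, Armidale. Mary Smith and Mary Jones"

# ".+?" matches as little as possible, so group 1 stops just before
# the FIRST repeat of the name captured in group 2.
lazy = re.compile(r"^(\.+\s[1-9]\s\b([A-Z][a-z]+)\b.+?)\b\2\b")

m = lazy.search(line)
repeat_start = m.end() - len(m.group(2))

# Replace the whole match with group 1, which drops the repeated name.
result = lazy.sub(r"\1", line, count=1)

print("repeat matched at column:", repeat_start)
print(result)
```

With the minimized form, the first repeat of the name is the one removed, and any later copies on the line are left alone.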

Unfortunately, I have been very busy with SpamStopHere and have not had much time for VEDIT recently. The recent spam trick of using randomized images has particularly kept me busy, but we finally have solved it - we had to design and implement three levels of image filtering - HTML pattern matching, template and network analysis, and OCR.

Ted.

 


Topic: Re: RegExp (6 of 6), Read 29 times
Conf: Search and Replace
From: Ian Binnie
Date: Saturday, November 04, 2006 08:19 PM

On 11/4/2006 5:44:48 PM, Ted Green wrote:
>At 07:41 PM 10/22/2006, you
>wrote:
>>I have been using regexp quite a lot lately, but have some problems.
>
>We have confirmed and fixed
>the problem - inside () the
>".+" was always maximized. Tom
>also fixed the search dialog
>box to allow search strings
>up to 1024 chars. He also fixed
>a problem related to search
>string history.
>
>Tom is currently attempting to
>implement ".+?" and ".*?" as a
>minimized search for just that
>item, overriding the "MAX"
>option. This is the RE syntax
>used by Perl.
>
Thanks for looking at this.

It will make RegExp much more useful.