
Topic:	VEDIT 6.20.2 has updated Scribe spelling checker (1 of 33), Read 97 times
Conf:	VEDIT Macro Library
From:	Ted Green
Date:	Thursday, May 26, 2011 06:24 PM

VEDIT 6.20.2 has a much faster Scribe spelling checker.
First I want to thank Scott Lambert again for writing it over many years.

Here are my improvements:

* The wordchk.vdm macro used the pattern matching code "|<" to search for words in the dictionaries. Since the "word" search option was also used, the "|<" was unnecessary and significantly slowed down the search. The search code scans the search string for any pattern codes and if none are found, goes into SIMPLE mode, which is much faster. This doubled the overall speed.

* The wordchk.vdm macro immediately returns success for any 1-character word.

* The getword.vdm macro used two update() commands to show progress. I eliminated one entirely.
The other I replaced with:

if (time_tick != #68) {   // Only perform about 18 updates per second
    #68 = time_tick       // Remember the tick of this update
    update()
}

This prevents the entire screen from updating for each word. That again more than doubled the overall speed.

* I made other very minor speed improvements. Further minor improvements could be made, but the speed is now reasonable and much faster than before.

* I changed the code so that a trailing "'s" is ignored and only the preceding word is checked, so words like "mother's" no longer get flagged.

* I updated the vedit.vdf file with more of the VEDIT commands. I also spell-checked most documentation files and some macros.
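The two word pre-checks in the list above (accepting 1-character words outright and ignoring a trailing "'s") might look roughly like this. This is a minimal Python sketch, not the wordchk.vdm code. The function names are hypothetical, and so is the order of the two checks.

```python
def word_to_check(word):
    """Return the form of `word` to look up, or None if no lookup is needed.

    Hypothetical sketch; the order of the two checks is an assumption.
    """
    if len(word) > 2 and word.endswith("'s"):
        word = word[:-2]              # check "mother" instead of "mother's"
    if len(word) == 1:
        return None                   # 1-character words are accepted outright
    return word


def flag_misspellings(words, lookup):
    """Return the words whose looked-up form is not in the dictionary."""
    flagged = []
    for w in words:
        base = word_to_check(w)
        if base is not None and not lookup(base):
            flagged.append(w)
    return flagged


dictionary = {"mother", "the", "book"}
print(flag_misspellings(["mother's", "a", "teh", "book"], dictionary.__contains__))
# prints ['teh']
```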

These Scribe changes are in the tentative VEDIT 6.20.2 on the website under:

www.vedit.com/download/vpa-prod (and add either .exe or .zip)
(Some security software will block any email that links to an .exe file.)

Ted.

Topic:	VEDIT 6.20.2 has updated Scribe spelling checker (2 of 33), Read 81 times
Conf:	VEDIT Macro Library
From:	Scott Lambert
Date:	Friday, May 27, 2011 09:18 AM

On 5/26/2011 6:24:25 PM, Ted Green wrote:
>VEDIT 6.20.2 has a much faster
>Scribe spelling checker.
>First I want to thank Scott
>Lambert again for writing it
>over many years.
>
>Here are my improvements:
>
>* The wordchk.vdm macro used
>the pattern matching code "|<"
>to search for words in the
>dictionaries. Since the "word"
>search option was also used,
>the "|<" was unnecessary and
>significantly slowed down the
>search. The search code scans
>the search string for any
>pattern codes and if none are
>found, goes into SIMPLE mode,
>which is much faster. This
>doubled the overall speed.
>

The |< was implemented so Peter could have comments in the vdf files. The idea was that Scribe would ignore any word that didn't start in column one, so when you start a comment with \\ or a ;, Scribe would ignore the comment.

I have not seen your changed code yet, but if a vdf file now has comments in it, I think that will give unusual results for the user when Scribe generates its list of replacement words.

I think Scribe can no longer tell the difference between a dictionary word and a comment. Since Scribe assumes only one word per line, if the user misspells a word and the replacement word is in a comment, the user might get a phrase (part of the comment) as one suggestion.
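Scott's concern can be illustrated with a small Python sketch. It uses regular expressions as a stand-in for VEDIT's search. This is an assumption: "\b" approximates the "word" option, and "^" with MULTILINE approximates "|<". The dictionary is a hypothetical one-entry-per-line text containing a ";" comment line.

```python
import re

# Hypothetical dictionary text: one entry per line, plus one comment line.
DICT = "apple\nbanana\n; bananas are yellow\ncherry\n"


def find_anchored(text, word):
    """Whole-word match that must start a line (like "|<" plus the word option)."""
    m = re.search(r"^" + re.escape(word) + r"\b", text, re.MULTILINE)
    return m.start() if m else -1


def find_word_only(text, word):
    """Whole-word match anywhere (the word option alone)."""
    m = re.search(r"\b" + re.escape(word) + r"\b", text)
    return m.start() if m else -1


def entry_at(text, pos):
    """Return the whole line containing `pos`, as a one-entry-per-line reader would."""
    start = text.rfind("\n", 0, pos) + 1
    end = text.find("\n", pos)
    return text[start:] if end < 0 else text[start:end]


pos = find_word_only(DICT, "yellow")
print(find_anchored(DICT, "yellow"))  # -1: the comment line is skipped
print(entry_at(DICT, pos))            # the whole comment line comes back as an "entry"
```

Without the anchor, "yellow" is found inside the comment, and a reader that treats each line as one entry would offer the entire comment line as a suggestion.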

I hope that makes sense; as I said, I have not looked at your code.

Otherwise, your changes all seem like good ideas.

Scott

Topic:	Re: VEDIT 6.20.2 has updated Scribe spelling checker (3 of 33), Read 82 times
Conf:	VEDIT Macro Library
From:	Ted Green
Date:	Friday, May 27, 2011 09:55 AM

----- Original Message -----
> From: "Scott Lambert" ( scottmacros@... ) On 5/26/2011 6:24:25
> The |< was implemented so Peter could have comments in the vdf files.

Scott:

OK, I understand the reason now. After you run my modified Scribe, I think you'll agree that allowing comments that may contain misspelled words is not worth the speed degradation. Correctly spelled comments won't hurt, however.

Ted.

Topic:	VEDIT 6.20.2 has updated Scribe spelling checker (27 of 33), Read 25 times
Conf:	VEDIT Macro Library
From:	Peter Rejto
Date:	Thursday, July 21, 2011 12:40 PM

On 5/27/2011 9:18:13 AM, Scott Lambert wrote:
>On 5/26/2011 6:24:25 PM, Ted Green
>wrote:
>>VEDIT 6.20.2 has a much faster
>>Scribe spelling checker.
>>First I want to thank Scott
>>Lambert again for writing it
>>over many years.
>>
>>Here are my improvements:
>>
>>* The wordchk.vdm macro used
>>the pattern matching code "|<"
>>to search for words in the
>>dictionaries. Since the "word"
>>search option was also used,
>>the "|<" was unnecessary and
>>significantly slowed down the
>>search. The search code scans
>>the search string for any
>>pattern codes and if none are
>>found, goes into SIMPLE mode,
>>which is much faster. This
>>doubled the overall speed.
>>
>
>The |< was implemented so Peter could
>have comments in the vdf files. The idea
>was Scribe would ignore any word that
>didn't start on column one, so when you
>start a comment with \\ or a ;, Scribe
>would ignore the comment.
>
>I have not seen your changed code yet,
>but now if a vdf file has comments in
>it, I am thinking that will give unusual
>results for the user when Scribe
>generates its list of replacement words.
>
>Scribe can no longer I think tell the
>difference between a dictionary word and
>a comment. So if the user misspells a
>word and the replacement word is in the
>comment, Scribe assumes only one word
>per line, so the user might get a phrase
>(part of the comment) as one suggestion.
>
>I hope that made sense, as I say I have
>not looked at your code.
>
>Your changes otherwise all seem good
>ideas.
>
>Scott

Scott,

I was very glad to learn from the Vedit 6.2.1 release notes that Scribe's speed has been increased 30-fold. Congratulations.

Now I realize that I have been slowing Scribe down. Specifically, my suggestion of editing a dictionary file in the style of a macro file was a bad one. Simply put, a .vdf file is not a .vdm file, and one should not try to treat it as one.

So I would like to amend my suggestion. First, I would tag each dictionary file with a code word. Second, I would create a Vedit-style .not file and keep the explanations in that .not file.

For example, I could tag the English.vdf file by starting it with the code word VDFENU. Then, in the vdf.not file, I would explain that this is the original English (US) dictionary file that came with Scribe 4.1.

Maybe a number code would be preferable. With 0001, it would be clear that it is an artificial word and that this is a primary dictionary. Or maybe VDF0001?
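Peter's proposal is a code word as the first line of each .vdf, with descriptions kept in a separate .not file. It might be handled as in the Python sketch below. The "CODE: description" layout of the notes file is purely hypothetical, because the thread does not define one.

```python
def split_dictionary(text):
    """Return (code, words): the first line is taken as the dictionary's code word."""
    lines = text.splitlines()
    if not lines:
        return "", []
    return lines[0], lines[1:]


def parse_notes(text):
    """Parse hypothetical 'CODE: description' lines into a dict."""
    notes = {}
    for line in text.splitlines():
        code, sep, desc = line.partition(":")
        if sep:
            notes[code.strip()] = desc.strip()
    return notes


vdf = "VDF0001\napple\nbanana\n"
notes = parse_notes("VDF0001: original English US dictionary from Scribe 4.1\n")
code, words = split_dictionary(vdf)
print(code, words, notes[code])
```

The lookup code would then search only `words`, so the code word itself is never accepted as a correctly spelled word. An artificial code such as VDF0001 is also unlikely to clash with a real word.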

Now two other suggestions.

#1: I would like to have a capitalized version of my name in the dictionary, like Peter.

#2: I would like to have a help or manual box in the dialog input box that starts Scribe.

Thanks for everything,

-peter

Topic:	VEDIT 6.20.2 has updated Scribe spelling checker (29 of 33), Read 17 times
Conf:	VEDIT Macro Library
From:	Scott Lambert
Date:	Thursday, July 21, 2011 03:55 PM

On 7/21/2011 12:40:11 PM, peter rejto wrote:
>On 5/27/2011 9:18:13 AM, Scott Lambert
>wrote:
>
>I was very glad to learn from the Vedit
>6.2.1 release notes that the Scribe speed
>has been increased 30 fold.
>Congratulations.
>
>Now, I realize that I have been slowing
>Scribe down. Specifically, my suggestion
>of editing
>a dictionary file in the style of a macro
>file was a bad one. Simply a .vdf file
>is not a .vdm
>file and do not try to treat it that
>way.
>
>
>So, I would like to amend my suggestion:
>First, I would code the dictionary
>files.
>Second, I would create a Vedit style
>.not file and edit this .not file.
>
>For example, I could code the
>English.vdf file by starting it with the
>code word
>VDFENU. Then in the vdf.not file I
>would explain that this is the original
>English
>US dictionary file that came with Scribe
>4.1
>
>
>Maybe a number code would be
>preferable. With 0001, it would be clear
>that it is an
>artificial word and it is a primary
>dictionary. Or maybe VDF0001?
>
>
>
>Now two other suggestions.
>
>#1: I would like to have a capitalized
>version of my name in the dictionary,
>like Peter.
>
>#2: I would like to have a help or
>manual box in the Dialog Input Box
>starting Scribe.
>
>
>Thanks for everything,
>
>-peter

Hi Peter,

Will look into your suggestions next week.

Ted should get all the credit for speeding up Scribe; he did all the grunt work. He has basically taken over development of the Scribe that comes with Vedit, and I am totally cool about it. A fast editor needs a fast spell checker. Scribe is certainly benefiting from his greater experience at writing macro code. It is all win/win.

Scott

Topic:	VEDIT 6.20.2 has updated Scribe spelling checker (4 of 33), Read 84 times
Conf:	VEDIT Macro Library
From:	Pauli Lindgren
Date:	Sunday, May 29, 2011 08:06 AM

On 5/26/2011 6:24:25 PM, Ted Green wrote:
>
>* The wordchk.vdm macro used
>the pattern matching code "|<"
>to search for words in the
>dictionaries. Since the "word"
>search option was also used,
>the "|<" was unnecessary and
>significantly slowed down the
>search. The search code scans
>the search string for any
>pattern codes and if none are
>found, goes into SIMPLE mode,
>which is much faster. This
>doubled the overall speed.

Ted, does Vedit use simple mode if the search string contains a pattern such as "|010"?
That kind of pattern can be converted into a single character, so simple search mode should be possible.

I noticed that the Scribe .vdf files have been stored as DOS files (CR-LF). Changing them to Unix files (LF only) would make them some 10% smaller, which should improve speed a bit, too.
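The conversion saves exactly one byte per line, so the saving is roughly 1/(average entry length + 2). For example, if entries averaged about 8 characters, each line would shrink from 10 to 9 bytes, about 10%. The 8-character average is only an illustrative assumption, not a measurement of english.vdf. A minimal Python sketch of the conversion and the saving:

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Convert DOS line endings (CR-LF) to Unix line endings (LF only)."""
    return data.replace(b"\r\n", b"\n")


def size_saving(data: bytes) -> float:
    """Fraction of bytes saved by the conversion."""
    return 1 - len(crlf_to_lf(data)) / len(data)


dos = b"".join(w + b"\r\n" for w in [b"apple", b"banana", b"cherry", b"dictionary"])
print(f"{size_saving(dos):.1%}")  # prints 11.4%
```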

--
Pauli

Topic:	Re: VEDIT 6.20.2 has updated Scribe spelling checker (5 of 33), Read 86 times
Conf:	VEDIT Macro Library
From:	Ted Green
Date:	Sunday, May 29, 2011 09:37 AM

----- Original Message -----
> From: "Pauli Lindgren" ( pauli0212@... ) On 5/26/2011 6:24:25
>
> Ted, does Vedit use simple mode if the search string contains a
> pattern such as "|010"?
> That kind of pattern can be converted into a single character, so
> simple search mode should be possible.

I need to study the code more to be sure, but it appears you might be right: "|010" is converted into an LF character in the search string, and the final search string is later scanned for "|", which by then is gone. Therefore the search should run in SIMPLE mode and be much faster.
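Ted describes a two-stage check: first translate numeric codes such as "|010" into single characters, then look for any remaining "|". That check could be sketched in Python as below. This is not VEDIT's source. The rule that a numeric code is "|" followed by exactly three decimal digits is an assumption inferred from the "|010" example.

```python
import re

# Assumption: a numeric pattern code is "|" followed by three decimal digits.
NUMERIC_CODE = re.compile(r"\|(\d{3})")


def translate_codes(search: str) -> str:
    """Replace each |ddd code with the character whose code is ddd (decimal)."""
    return NUMERIC_CODE.sub(lambda m: chr(int(m.group(1))), search)


def search_mode(search: str) -> str:
    """Pick SIMPLE mode when no pattern code survives the translation."""
    return "SIMPLE" if "|" not in translate_codes(search) else "PATTERN"


print(search_mode("mother"))    # SIMPLE
print(search_mode("abc|010"))   # SIMPLE: |010 became a single LF
print(search_mode("|<mother"))  # PATTERN: |< is a real pattern code
```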

> I noticed that the Scribe .vdf files have been stored as DOS files
> (CR-LF). Changing them to Unix files (LF only) would make them some
> 10% smaller, which should improve speed a bit, too.

Good and simple suggestion!

There are several more places where speed improvements could be made.

1. In wordchk.vdm, under "check for word in the special dictionaries in buffer #75", it appears that each special dictionary is opened for EACH word checked. Just as user.vdf, ignore.vdf and english.vdf are kept open, all the special dictionaries should be kept open too.

2. I have considered opening the english.vdf dictionary into 26 buffers, one for each first letter. I have no idea if this would speed things up much or not.
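Both ideas amount to loading each dictionary once and then searching a smaller piece of it. The first is to keep every special dictionary open instead of reopening it for each word. The second is to split a dictionary by first letter. A hypothetical Python sketch: `loader` stands in for opening a .vdf file, and each per-letter list plays the role of one of the 26 buffers.

```python
from collections import defaultdict


class DictionaryCache:
    """Load each dictionary once and keep it split by first letter (hypothetical)."""

    def __init__(self, loader):
        self._loader = loader  # callable: dictionary name -> list of words
        self._loaded = {}      # name -> {first letter: [words]}
        self.loads = 0         # how many times a file was actually read

    def _buckets(self, name):
        if name not in self._loaded:  # open each dictionary only once
            self.loads += 1
            buckets = defaultdict(list)
            for word in self._loader(name):
                buckets[word[:1]].append(word)
            self._loaded[name] = buckets
        return self._loaded[name]

    def contains(self, name, word):
        # Linear scan of one letter's list, like searching one smaller buffer.
        return word in self._buckets(name).get(word[:1], [])


cache = DictionaryCache(lambda name: ["apple", "banana", "cherry"])  # stand-in loader
print(cache.contains("special.vdf", "banana"), cache.loads)
```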

Scott had the very smart idea of looking up each different word only once in the main/extra dictionaries. Each word is then added to the "ignore" buffer for quicker lookup. However I do notice that the spell checking seems to slow down near the end of the file, perhaps because the "ignore" buffer is getting larger. So perhaps 26 ignore buffers - one for each first letter - would help.

Also adding perhaps the most common 100 words to a special Common.vdf dictionary and scanning it first, before the "ignore" buffer might speed things up too.
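The suggested lookup order is: a small common-words list first, then the "ignore" cache split by first letter, then the main dictionaries, caching each word that is found. It might look like this Python sketch. The function names are hypothetical, and `main_lookup` stands in for the slower search of english.vdf and the other dictionaries.

```python
def make_checker(common, main_lookup):
    """Return (check, stats); check(word) is True if the word is known."""
    ignore = {}                  # first letter -> words already found
    stats = {"main_lookups": 0}

    def check(word):
        if word in common:                        # most frequent words first
            return True
        bucket = ignore.setdefault(word[:1], [])  # one small list per letter
        if word in bucket:
            return True
        stats["main_lookups"] += 1
        if main_lookup(word):
            bucket.append(word)                   # a found word is looked up only once
            return True
        return False

    return check, stats


check, stats = make_checker({"the", "and"}, {"mother", "spelling"}.__contains__)
```

As written, a misspelled word is searched for again each time it appears. Caching failures as well would be a small extension.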

Ted.

Topic:	Re: VEDIT 6.20.2 has updated Scribe spelling checker (7 of 33), Read 81 times, 1 File Attachment
Conf:	VEDIT Macro Library
From:	Scott Lambert
Date:	Sunday, May 29, 2011 12:23 PM

"Also adding perhaps the most common 100 words to a special Common.vdf dictionary and scanning it first, before the "ignore" buffer might speed things up too."

I may already have the VDF file you are looking for. Please see base.vdf attached.

Scott

BASE.VDF (6KB)

Topic:	Re: VEDIT 6.20.2 has updated Scribe spelling checker (12 of 33), Read 78 times
Conf:	VEDIT Macro Library
From:	Ted Green
Date:	Sunday, May 29, 2011 11:37 PM

----- Original Message -----
> From: "Scott Lambert" ( scottmacros@... )
> "Also adding perhaps the most common 100 words to a special Common.vdf
> dictionary and scanning it first, before the "ignore" buffer might
> speed things up too."
>
> I already may have the VDF file you are looking for. Please see
> base.vdf attached.

Good. If no one else gets around to it, I will change the wordchk.vdm code on Tuesday to use it first and see if that helps the speed on moderate-sized files.

Ted.

Topic:	Re: VEDIT 6.20.2 has updated Scribe spelling checker (8 of 33), Read 83 times
Conf:	VEDIT Macro Library
From:	Scott Lambert
Date:	Sunday, May 29, 2011 12:34 PM

"2. I have considered opening the english.vdf dictionary into 26 buffers, one for each first letter. I have no idea if this would speed things up much or not."

What is the "in memory" buffer size? Or put another way, what is the largest file Vedit can edit without auto-buffering?

My thinking is that instead of one file for each letter, we could have letter groups (English-ac.vdf, English-df, etc.), each file being smaller than Vedit's "in memory" buffer size.
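One way to pick the letter groups is to walk the letters in order. Start a new file whenever the next letter would push the current group over the size limit. A Python sketch follows. The limit is a placeholder, since the thread does not state Vedit's in-memory buffer size.

```python
def letter_groups(words, limit):
    """Group consecutive first letters so each group's size stays within `limit` bytes.

    A single letter larger than `limit` still gets a group of its own.
    """
    sizes = {}
    for w in words:
        sizes[w[:1]] = sizes.get(w[:1], 0) + len(w) + 1  # +1 for the line ending
    groups, current, total = [], [], 0
    for letter in sorted(sizes):
        if current and total + sizes[letter] > limit:
            groups.append(current)
            current, total = [], 0
        current.append(letter)
        total += sizes[letter]
    if current:
        groups.append(current)
    return groups


print(letter_groups(["apple", "avocado", "banana", "cherry", "date"], 20))
# prints [['a'], ['b', 'c', 'd']]
```

File names such as English-ac.vdf could then be built from the first and last letter of each group.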

Comments?

Scott

Topic:	Re: VEDIT 6.20.2 has updated Scribe spelling checker (9 of 33), Read 83 times
Conf:	VEDIT Macro Library
From:	Scott Lambert
Date:	Sunday, May 29, 2011 01:39 PM

"My thinking is instead of one file for each letter, we could have letter groups (English-ac.vdf English-df,etc), each file being less then Vedit's "in memory" buffer size."

Another idea: say the word just searched for began with "t"; the file_pos in english.vdf would then be quite high, somewhere in the T's. If the next word began with "h", Scribe should do a reverse search, since searching forward would obviously be a waste of time. In general, if the next word is higher than the current position, Scribe would search forward; if lower, it would search in reverse.

Right now Scribe starts every search from the beginning of the file. This would work with english.vdf and any other .vdf (special dictionary) that is sorted alphabetically. It would not work with user.vdf or ignore.vdf.

Scott

Topic:	Re: VEDIT 6.20.2 has updated Scribe spelling checker (10 of 33), Read 80 times
Conf:	VEDIT Macro Library
From:	Scott Lambert
Date:	Sunday, May 29, 2011 02:39 PM

Ted, in the original message to this thread, you said:

"* The wordchk.vdm macro immediately returns success for any 1-character word"

Why not increase it to 3 or 4 letters? How often does one misspell a 3- or 4-letter word? Many of the words we use are under 5 letters long.

Maybe Scribe needs two types of spell check: a quick check that ignores words of 4 or fewer letters, and a full check that ignores only one-letter words?

Scott

Topic:	Re: VEDIT 6.20.2 has updated Scribe spelling checker (15 of 33), Read 80 times
Conf:	VEDIT Macro Library
From:	Ted Green
Date:	Sunday, May 29, 2011 11:46 PM

----- Original Message -----
> From: "Scott Lambert" ( scottmacros@... )
> Ted, in the original message to this thread, you said:
>
> "* The wordchk.vdm macro immediately returns success for any
> 1-character word"
>
> Why not increase it to 3 or 4 letters? How often does one misspell a 3
> or 4 letter word??? Many of the words we use are under 5 letters in
> length.
>
> Maybe Scribe needs two types of spell check, A quick check which
> ignores words of 4 or less letters, and a full check that only ignores
> words one letter in length?

I do think 2-letter words should be checked.
How about changing the main dictionary so that 2-letter words come first, then 3-letter words, and so on?
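Suppose the main dictionary were ordered by length and then alphabetically. A lookup could then jump straight to the block of words that have the same length. A Python sketch follows. The offsets table is an added assumption; Ted only suggests the ordering.

```python
def sort_by_length(words):
    """Order words by length, then alphabetically within each length."""
    return sorted(words, key=lambda w: (len(w), w))


def length_offsets(sorted_words):
    """Map each word length to the index of its first word."""
    offsets = {}
    for i, w in enumerate(sorted_words):
        offsets.setdefault(len(w), i)
    return offsets


def contains(sorted_words, offsets, word):
    """Scan only the block of words that have the same length as `word`."""
    i = offsets.get(len(word))
    if i is None:
        return False
    while i < len(sorted_words) and len(sorted_words[i]) == len(word):
        if sorted_words[i] == word:
            return True
        i += 1
    return False


words = sort_by_length(["mother", "of", "the", "an", "book"])
offsets = length_offsets(words)
print(contains(words, offsets, "the"), contains(words, offsets, "teh"))  # True False
```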

With all the improvements you have suggested, we should be able to improve the speed even more.

I am planning to build my own "balanced binary tree" routines into VEDIT. This would let you load millions of items into the tree and perform a near instant search on it. Hopefully later this year.

Ted.

Topic:	Re: VEDIT 6.20.2 has updated Scribe spelling checker (17 of 33), Read 65 times
Conf:	VEDIT Macro Library
From:	Pauli Lindgren
Date:	Monday, May 30, 2011 11:03 AM

On 5/29/2011 11:46:18 PM, Ted Green wrote:
>----- Original Message -----
>
>I am planning to build my own
>"balanced binary tree"
>routines into VEDIT. This
>would let you load millions of
>items into the tree and
>perform a near instant search
>on it. Hopefully later this
>year.

Ted,

I don't know how well a balanced binary tree would work with text files. You would need to create an internal structure, since the text file itself cannot be a b-tree.

Earlier you mentioned implementing binary search in Vedit. I think that would be more general-purpose.

I have done some experiments with binary search for utags.vdm.
Even when binary search is implemented in the macro language, it gives a significant speed improvement for larger tags files.
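A binary search over a sorted tags file can be sketched as below. This is Python, not the utags.vdm code. Each line is assumed to start with the symbol followed by a tab. The search finds the first matching line and then collects the following lines with the same symbol, which is one way to handle multiple occurrences. Python compares strings by code point, so this works only if the file was sorted the same way. With a different collation the search can silently miss entries.

```python
def tag_key(line):
    """The symbol is everything before the first tab (assumed tags layout)."""
    return line.split("\t", 1)[0]


def find_tags(lines, symbol):
    """Return every line for `symbol` from `lines`, which must be sorted by key."""
    lo, hi = 0, len(lines)
    while lo < hi:                     # find the first line whose key >= symbol
        mid = (lo + hi) // 2
        if tag_key(lines[mid]) < symbol:
            lo = mid + 1
        else:
            hi = mid
    matches = []
    while lo < len(lines) and tag_key(lines[lo]) == symbol:
        matches.append(lines[lo])      # collect multiple occurrences
        lo += 1
    return matches


lines = ["Alpha\ta.c\t1", "Beta\tb.c\t2", "Beta\tb2.c\t5", "_init\tc.c\t3", "gamma\td.c\t4"]
print(find_tags(lines, "Beta"))
```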

I did some tests with my 28MB tags file (in fast browse mode).
When searching for a symbol which is found near end of the file, I measured the following times:

Search forward: 3.8 sec
Search backward: 6.9 sec (for symbol near BOF)
Binary search: <0.5 sec

However, I have not implemented it in utags.vdm yet, since there are some problems. It appears that the sort order of the Sort_Merge() command (with or without collate) is not the same as that of the Compare() and Match() commands (at least the '_' character does not sort correctly). So I probably need to create a special collate table.

--
Pauli

Topic:	Re: VEDIT 6.20.2 has updated Scribe spelling checker (23 of 33), Read 48 times
Conf:	VEDIT Macro Library
From:	Pauli Lindgren
Date:	Friday, June 03, 2011 07:21 AM

On 5/30/2011 11:03:25 AM, Pauli Lindgren wrote:
>
> I did some tests with my 28MB tags file (in fast browse mode).
> When searching for a symbol which is found near end of
> the file, I measured the following times:
>
> Search forward: 3.8 sec
> Search backward: 6.9 sec (for symbol near BOF)
> Binary search: <0.5 sec

Those times were actually not very representative.
They were measured on a slow network drive, mapped to a ClearCase view server (which is very slow).

When the 28MB tags file was copied to local hard drive, the times were:

Search forward: 0.5 sec
Search backward: 2.8 sec
Binary search: 0 sec

>
> However, I have not implemented it in
> utags.vdm yet since there are some
> problems. It appears that the sort order
> of Sort_Merge() command (with or without
> collate) is not the same as that of
> Compare() and Match() commands (at least
> character '_' does not sort correctly).
> So I probably need to create a special
> collate table.

I remembered this wrong.
The sort order is correct when Sort_Merge() is called with NOCOLLATE option and Compare() or Match() is called with CASE option.

(I still have some problems for example with handling of multiple occurrences.)

BTW, it seems that the CASE option in Sort_Merge() has no effect if the NOCOLLATE option is given.

--
Pauli

Topic:	Re: VEDIT 6.20.2 has updated Scribe spelling checker (21 of 33), Read 62 times
Conf:	VEDIT Macro Library
From:	Ian Binnie
Date:	Monday, May 30, 2011 08:59 PM

On 5/29/2011 11:46:18 PM, Ted Green wrote:
>I am planning to build my own
>"balanced binary tree"
>routines into VEDIT. This
>would let you load millions of
>items into the tree and
>perform a near instant search
>on it. Hopefully later this
>year.

Binary trees are fast for searching, but building them can be slow, particularly when the input data is already sorted. There are also significant overheads; I am not sure how this would work in Vedit, which doesn't have pointers (although I guess you would do this in assembler).
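Ian's point about sorted input is easy to demonstrate. Inserting already-sorted keys one at a time into a plain (unbalanced) binary search tree produces a chain as deep as the list is long. Building the tree from the middle of the sorted list outward keeps it balanced, at about log2(n) levels. The Python sketch below shows both. It is only an illustration, not how VEDIT would implement its tree routines.

```python
class Node:
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def insert(root, key):
    """Plain binary-search-tree insertion with no rebalancing; returns the root."""
    node = Node(key)
    if root is None:
        return node
    cur = root
    while True:
        side = "left" if key < cur.key else "right"
        child = getattr(cur, side)
        if child is None:
            setattr(cur, side, node)
            return root
        cur = child


def build_balanced(sorted_keys, lo=0, hi=None):
    """Build a balanced tree from sorted keys by always rooting at the middle key."""
    if hi is None:
        hi = len(sorted_keys)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    node = Node(sorted_keys[mid])
    node.left = build_balanced(sorted_keys, lo, mid)
    node.right = build_balanced(sorted_keys, mid + 1, hi)
    return node


def height(root):
    """Number of levels, computed iteratively so a long chain is not a problem."""
    best = 0
    stack = [(root, 1)] if root is not None else []
    while stack:
        node, level = stack.pop()
        best = max(best, level)
        for child in (node.left, node.right):
            if child is not None:
                stack.append((child, level + 1))
    return best


keys = [f"w{i:04d}" for i in range(1000)]  # already sorted
chain = None
for k in keys:
    chain = insert(chain, k)
print(height(chain), height(build_balanced(keys)))  # prints 1000 10
```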

For a dictionary lookup I would recommend binary search.
The number of search steps is not much larger, and for medium-sized lists (hundreds of thousands of words) it is nearly as efficient.

I have implemented binary search in lots of different languages (even Excel VBA).

For something like a dictionary lookup (with variable-length keys) you don't even need to count entries; a quick-and-dirty jump based on the file size is good enough.

In practice, where you know the approximate number of entries, a fixed number of binary-search steps followed by a linear search can be quite efficient.
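Ian makes two suggestions: jump by file position instead of counting entries, and stop after a fixed number of binary steps, then finish with a linear scan. They can be combined as in this Python sketch. It assumes the dictionary holds one unique word per line, sorted by code point, with LF endings. The values of `steps` and `min_window` are illustrative tuning choices, not numbers from the thread. In either form the number of steps grows only with log2 of the size; for example, 19 halvings cover 300,000 entries, since 2^19 = 524,288.

```python
def lookup(text, word, steps=12, min_window=64):
    """Return True if `word` is a line of `text` (sorted, one word per line)."""
    lo, hi = 0, len(text)             # lo is always the start of a line
    for _ in range(steps):            # a fixed number of binary steps
        if hi - lo <= min_window:
            break
        nl = text.find("\n", (lo + hi) // 2)
        if nl < 0 or nl + 1 >= hi:    # no line starts after the midpoint in the window
            break
        start = nl + 1
        end = text.find("\n", start)
        entry = text[start:end] if end >= 0 else text[start:]
        if entry == word:
            return True
        if entry < word:
            lo = start                # the word, if present, is at or after `start`
        else:
            hi = start                # the word, if present, is before `start`
    # Finish with a linear scan of the remaining window.
    return word in text[lo:hi].split("\n")


words = [f"w{i:05d}" for i in range(5000)]
text = "\n".join(words) + "\n"
print(lookup(text, "w01234"), lookup(text, "w99999"))  # prints True False
```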

Topic:	Re: VEDIT 6.20.2 has updated Scribe spelling checker (20 of 33), Read 64 times
Conf:	VEDIT Macro Library
From:	Howard Goldstein
Date:	Monday, May 30, 2011 07:18 PM

On 5/29/2011 2:39:56 PM, Scott Lambert wrote:
>
>Ted, in the original message
>to this thread, you said:
>
>"* The wordchk.vdm macro
>immediately returns success
>for any 1-character word"
>
>Why not increase it to 3 or 4
>letters? How often does one
>misspell a 3 or 4 letter
>word??? Many of the words we
>use are under 5 letters in
>length.
>
I think it's absolutely essential to check short words. I can't tell you how often I've seen or typed "teh" instead of "the" for example.

-- Howard

Topic:	Re: VEDIT 6.20.2 has updated Scribe spelling checker (22 of 33), Read 65 times
Conf:	VEDIT Macro Library
From:	Scott Lambert
Date:	Tuesday, May 31, 2011 10:51 AM

On 5/30/2011 7:18:24 PM, Howard Goldstein wrote:
>On 5/29/2011 2:39:56 PM, Scott Lambert
>wrote:
>>
>>Ted, in the original message
>>to this thread, you said:
>>
>>"* The wordchk.vdm macro
>>immediately returns success
>>for any 1-character word"
>>
>>Why not increase it to 3 or 4
>>letters? How often does one
>>misspell a 3 or 4 letter
>>word??? Many of the words we
>>use are under 5 letters in
>>length.
>>
>I think it's absolutely essential to
>check short words. I can't tell you how
>often I've seen or typed "teh" instead
>of "the" for example.
>
>-- Howard

No problem, just thinking out loud.

Scott

Topic:	Re: VEDIT 6.20.2 has updated Scribe spelling checker (14 of 33), Read 83 times
Conf:	VEDIT Macro Library
From:	Ted Green
Date:	Sunday, May 29, 2011 11:41 PM

----- Original Message -----
> From: "Scott Lambert" ( scottmacros@... )
> "My thinking is instead of one file for each letter, we could have
> letter groups (English-ac.vdf English-df,etc), each file being less
> than Vedit's "in memory" buffer size."
>
> Another idea, is say the current word just searched for began with a
> "t", the file_pos in english.vdf would be quite high being somewhere
> in the T's. Now if the next word began with a "h", Scribe should do a
> reverse search as obviously searching forward would be a waste of
> time. If the next word was higher than "H", scribe would search
> forward. If lower than "H", search reverse.

Reverse searches are much slower; therefore the logic should be either a forward search if the next word is "greater" or start from the beginning if the next word is "lesser".
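Ted's rule can be sketched in Python for a sorted, one-word-per-line dictionary. If the next word sorts higher, keep searching forward from where the last search ended; otherwise restart from the top. As an added assumption, the sketch also stops at the first entry that sorts after the word, since in a sorted file the word cannot appear later. That stopping point is where the next search resumes. As Scott notes, this applies only to alphabetically sorted dictionaries, not to user.vdf or ignore.vdf.

```python
class ForwardScanner:
    """Search a sorted, one-word-per-line dictionary without ever scanning backward."""

    def __init__(self, text):
        self.text = text
        self.pos = 0    # start of the line where the previous search stopped
        self.last = ""  # the previous word searched for

    def contains(self, word):
        if word < self.last:       # "lesser": restart from the beginning
            self.pos = 0
        self.last = word
        pos = self.pos
        while pos < len(self.text):
            end = self.text.find("\n", pos)
            if end < 0:
                end = len(self.text)
            entry = self.text[pos:end]
            if entry >= word:      # reached (or passed) the word's place
                self.pos = pos
                return entry == word
            pos = end + 1
        self.pos = len(self.text)  # every entry sorts before `word`
        return False


scanner = ForwardScanner("apple\nbanana\ncherry\ndate\n")
print([scanner.contains(w) for w in ["banana", "cherry", "apple", "blueberry"]])
# prints [True, True, True, False]
```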

Ted.

Topic:	Re: VEDIT 6.20.2 has updated Scribe spelling checker (16 of 33), Read 67 times
Conf:	VEDIT Macro Library
From:	Scott Lambert
Date:	Monday, May 30, 2011 10:54 AM

On 5/29/2011 11:41:05 PM, Ted Green wrote:
>----- Original Message -----
>Reverse searches are much
>slower; therefore the logic
>should be either a forward
>search if the next word is
>"greater" or start from the
>beginning if the next word is
>"lesser".

Another idea is that since english.vdf never changes, one could have a lookup table that has the letter and the starting line# of the words that start with that letter.

It would look something like:

a:1
b:6542
c:12822

and so on.

The assumption is that goto_line works fast regardless of file size (or at least in the range we are talking about), and then Scribe begins its search much nearer to where the word it is looking for is probably located.

The obvious name for the table file would be english.tbl.
This way, if there is ever French or German vdf files, there can be french.tbl & german.tbl files to go with the main dictionary. (Yeah, I know, they will abolish income taxes first...)

Scott

Topic:	Re: VEDIT 6.20.2 has updated Scribe spelling checker (18 of 33), Read 74 times
Conf:	VEDIT Macro Library
From:	Pauli Lindgren
Date:	Monday, May 30, 2011 11:22 AM

On 5/30/2011 10:54:26 AM, Scott Lambert wrote:
>
>Another idea is that since english.vdf
>never changes, one could have a lookup
>table that has the letter and the
>starting line# of the words that start
>with that letter.
>
>Looking something like:
>
>a:1
>b:6542
>c:12822
>
>and so on.
>
>The assumption is that goto_line works
>fast regardless of file size (or at
>least in the range we are talking
>about), and then Scribe begins its
>search much nearer to where the word it
>is looking for is probably located.
>
>The obvious name for the table file
>would be english.tbl.
>This way, if there is ever French or
>German vdf files, there can be
>french.tbl & german.tbl files to go with
>the main dictionary. (Yeah, I know, they
>will abolish income taxes first...)

Or you could just insert the table at the beginning of the vdf file.
Or you could create the table in memory when Scribe starts.

However, line number may not work.
If you are using Fast Browse Mode and the file is large enough, Goto_Line() jumps to an unpredictable location (or to the end of the file), no matter which parameter you give it.
I have run into this problem many times for example when using Compiler Support (and the files are read-only since they are checked in).

It is better to use Goto_Pos(). It is faster, too.
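A first-letter table that stores character positions instead of line numbers could be built once and saved, as Scott suggests, in a small english.tbl-style file. A Python sketch follows. It assumes the dictionary is sorted by code point, so all entries with the same first character are contiguous. The "letter:position" layout simply mirrors Scott's example.

```python
def build_index(text):
    """Map each first character to the position of its first entry."""
    index, pos = {}, 0
    for line in text.split("\n"):
        if line:
            index.setdefault(line[0], pos)
        pos += len(line) + 1  # +1 for the LF
    return index


def to_table(index):
    """Serialize as 'letter:position' lines, like the proposed english.tbl."""
    return "".join(f"{ch}:{pos}\n" for ch, pos in sorted(index.items()))


def from_table(table):
    index = {}
    for line in table.splitlines():
        ch, _, pos = line.rpartition(":")
        index[ch] = int(pos)
    return index


def contains(text, index, word):
    """Start at the word's letter section and stop when that section ends."""
    pos = index.get(word[:1])
    if pos is None:
        return False
    while pos < len(text):
        end = text.find("\n", pos)
        if end < 0:
            end = len(text)
        entry = text[pos:end]
        if entry == word:
            return True
        if entry[:1] != word[:1]:  # left this letter's section
            return False
        pos = end + 1
    return False


text = "apple\navocado\nbanana\ncherry\n"
print(to_table(build_index(text)), end="")
```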

However, splitting the dictionary into multiple buffers would be even faster: you would not need to search the whole dictionary when the word is not found. But of course that would consume a lot of buffers.

--
Pauli

Topic:	Re: VEDIT 6.20.2 has updated Scribe spelling checker (19 of 33), Read 74 times
Conf:	VEDIT Macro Library
From:	Scott Lambert
Date:	Monday, May 30, 2011 04:36 PM

On 5/30/2011 11:22:08 AM, Pauli Lindgren wrote:
>On 5/30/2011 10:54:26 AM, Scott Lambert
>wrote:
>>
>
>Or you could just insert the table at
>the beginning of vdf file.
>Or you could create the table in memory
>when starting Scribe.

Yes, both options would work. However, since english.vdf is a static file, it seems a waste to create the table each time.

>It is better to use Goto_Pos(). It is
>faster, too.

Yes, the idea would also work using file position instead of line numbers.

>
>However, splitting the dictionary into
>multiple buffers would be even faster.
>You do not need to search the whole
>dictionary in case the word is not
>found. But of course that would consume
>a lot of buffers.

Would each subset dictionary require its own buffer? Could one not reuse the same buffer? Vedit, I imagine, is very fast at opening files, even huge ones. And with modern disk caching, I don't see repeated opens and closes degrading performance, at least not at a level a human would perceive.

Scott

Topic:	Re: VEDIT 6.20.2 has updated Scribe spelling checker (28 of 33), Read 20 times
Conf:	VEDIT Macro Library
From:	Peter Rejto
Date:	Thursday, July 21, 2011 03:13 PM

On 5/30/2011 11:22:08 AM, Pauli Lindgren wrote:
>On 5/30/2011 10:54:26 AM, Scott Lambert
>wrote:
>>
>>This way, if there is ever French or
>>German vdf files, there can be
>>french.tbl & german.tbl files to go with
>>the main dictionary. (Yeah, I know, they
>>will abolish income taxes first...)
>Pauli
>

Hi Pauli,

Somehow I missed this message of yours.

So I went ahead on my own and experimented with a Hungarian dictionary that I downloaded from the Winedt.org website. Then I got hung up on some technical difficulties.
The style of this dictionary was similar to Scott's: uncompressed, with one word per line. The only deviation from the .vdf style that I could see was that it contained a couple of high-ASCII characters.

At the same time, I learned quite a bit about foreign dictionaries. In fact, I am convinced that you could install the WinEdt French dictionary into Scribe in five minutes.

Here is the way I see it: the key is in the .pdf file from the person who maintains the French dictionary. I followed another suggestion of yours, that I should save graphics as a .JPG file, and partially succeeded: I could save the relevant page, although all that I needed was the ASCII table. Then I also succeeded in converting it back to a .PDF file. I shall try to upload it.

Since Vedit does have a {Misc, ASCII table} menu command, I tried to look up the characters listed in that table, which I call the Pascal Table. The first character in the Pascal Table was an accented u, so I was close. However, the second character was just "another funny face". In short, I do not know which alphabet the Pascal Table comes from: ANSI, OEM, or possibly one of the LaTeX alphabets?

(I had fun using the Cur_Char() function.)

Thanks for everything,

-peter

Topic:	Re: VEDIT 6.20.2 has updated Scribe spelling checker (30 of 33), Read 22 times
Conf:	VEDIT Macro Library
From:	Peter Rejto
Date:	Thursday, July 21, 2011 05:00 PM

On 7/21/2011 3:13:18 PM, peter rejto wrote:
>On 5/30/2011 11:22:08 AM, Pauli Lindgren
>wrote:
>>On 5/30/2011 10:54:26 AM, Scott Lambert
>>wrote:
>>>
>>>This way, if there is ever French or
>>>German vdf files, there can be
>>>french.tbl & german.tbl files to go with
>>>the main dictionary. (Yeah, I know, they
>>>will abolish income taxes first...)
>>Pauli
>>
>
>Hi Pauli,
>
>Somehow I missed this message of yours.
>
>So, I went ahead on my own and tried to
>experiment with a Hungarian dictionary,
>that I downloaded from the Winedt.org
>website. Then, I got hung up on some
>technical difficulties.
>The style of this dictionary was similar
>to the one of Scott: uncompressed and
>one line per word. The only deviation
>that I could see from the .vdf style was
>that it had a couple of high ASCII
>characters.
>
>
>At the same time, I learned quite a bit
>about foreign dictionaries. In fact, I
>am convinced that you could install the
>WinEdt French dictionary into Scribe in
>five minutes.
>
>
>Here is the way I see it: The key is in
>the .pdf file of the person who
>maintains the French dictionary. This is
>a .pdf file and I followed another
>suggestion of yours, saying that I
>should save graphics as a .JPG file. I
>partially succeeded. That is to say I
>could save the relevant page, although
>all that I needed was the ASCII table.
>Then I also succeeded in converting it
>back to a .PDF file. I shall try to
>upload it.
>
>Since Vedit does have a {Misc, ASCII
>table} menu command, I tried to look up
>the characters mentioned in the table, I
>call it the Pascal Table. The first
>character in the Pascal Table was an
>accented u. So I was close. However the
>second character was just "another funny
>face". In short, I just do not know
>which alphabet the Pascal Table comes
>from: ANSI, OEM, or possibly one of the
>LaTeX alphabets?
>
>(I had fun using the Cur_Char()
>function.)
>
>Thanks for everything,
>
>-peter

Hi again,

I did not succeed in uploading my attachment because it was too big.

In the meantime I did some more experiments and I would like to report on them.

Since these were meant to be private experiments, I did them with the Hungarian dictionary. Before describing the experiments, let me say that I have a hunch that my Vedit installation is not quite right, because:

Yesterday, when I opened the Hungarian dictionary in Vedit, the accents looked garbled. Today they displayed perfectly.

Anyway here are the details.

1. Following the style of the new Vedit 6.21.1 directory structure, I created a new Originals subdirectory in the Scribe directory. Then I copied all the .vdf files into this directory.

2. I renamed the WinEdt Hungarian dictionary to english.vdf.

3. I copied a block of words from english.vdf to a new Vedit buffer.

4. Surprise! The words displayed differently in the new buffer. Specifically, instead of accented letters I saw blocks. (I checked some of the blocks with the Vedit Cur_Char() function. I believe one block gave 225 and another 110.)

5. I ran Scribe from the {Misc, Scribe, spell check} Vedit menu command.

6. The usual Scribe dialog input box appeared, and I chose the first option (Spell check current buffer, I believe).

7. Oops, I forgot to tell you that I had first moved the cursor to the beginning of the buffer.

8. The cursor did move. However, it did not move all the way to the end; it stopped 2 lines short.

I certainly would appreciate a Dashboard enhancement that would tell me whether I am using ANSI or OEM fonts. I think that this is a very subtle issue.
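The blocks in step 4 are consistent with a code-page mismatch: the bytes in the file stay the same, and only the character set the font assumes changes. The Python sketch below is an illustration, not VEDIT code. It uses Python's codec names for two Windows "ANSI" pages (cp1252, cp1250) and two DOS "OEM" pages (cp437, cp852) and shows how byte 225 (one of the Cur_Char() values above) and byte 245 (chosen here only as an example) decode in each. Byte 225 is á in both ANSI pages but ß in both OEM pages; byte 245 is the Hungarian ő only in the Central European ANSI page.

```python
# Python illustration, not VEDIT code: the same byte shows up as different
# characters depending on the code page the display font assumes.
# cp1252 and cp1250 are Windows "ANSI" pages; cp437 and cp852 are DOS "OEM" pages.
CODE_PAGES = {
    "cp1252": "ANSI, Western",
    "cp1250": "ANSI, Central European",
    "cp437": "OEM, US",
    "cp852": "OEM, Central European",
}


def render_byte(value: int) -> dict:
    """Decode one byte (0-255) in each code page.

    Assumes the byte is defined in every listed page; a few bytes in the
    0x80-0x9F range are undefined in cp1250/cp1252 and would raise an error.
    """
    raw = bytes([value])
    return {cp: raw.decode(cp) for cp in CODE_PAGES}


if __name__ == "__main__":
    for value in (225, 245):
        for cp, char in render_byte(value).items():
            print(f"byte {value} in {cp} ({CODE_PAGES[cp]}): {char}")
```

Only four pages are listed for brevity; which page a particular .vdf dictionary was written in is not stated in this thread.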

Thanks for everything,

-peter

Topic:	Re: VEDIT 6.20.2 has updated Scribe spelling checker (31 of 33), Read 21 times
Conf:	VEDIT Macro Library
From:	Pauli Lindgren
Date:	Saturday, July 23, 2011 09:40 AM

On 7/21/2011 5:00:47 PM, peter rejto wrote:
>
>I certainly would appreciate a Dashboard
>enhancement that would tell me whether I
>am using ANSI or OEM fonts. I think that
>this is a very subtle issue.

I have no idea what you are trying to do.

However, you do not need Dashboard to find out whether you are using ANSI or OEM font.
Just select "View" -> "Font", and the dialog box displays the font you are currently using.
If the font name does not reveal this information, look at the "Script" box.
"Western" means it is ANSI font.
"OEM/DOS" means it is OEM font.
There are other possibilities, too, such as "Greek" or "Symbol".

--
Pauli

Topic:	Re: VEDIT 6.20.2 has updated Scribe spelling checker (32 of 33), Read 24 times
Conf:	VEDIT Macro Library
From:	Peter Rejto
Date:	Saturday, July 23, 2011 12:21 PM

On 7/23/2011 9:40:34 AM, Pauli Lindgren wrote:
>On 7/21/2011 5:00:47 PM, peter rejto
>wrote:
>>
>>I certainly would appreciate a Dashboard
>>enhancement that would tell me whether I
>>am using ANSI or OEM fonts. I think that
>>this is a very subtle issue.
>
>I have no idea what you are trying to
>do.
>

>Pauli

Hi,

Let me give you a progress report:

In order to properly display the accents in, say, Hungarian, I need ANSI fonts. So, I wanted to automate this display procedure. After studying the manual I wrote the following macro:

#1=Font_Charset                            // character set of the current font
if (#1==0) { Statline_Message("ANSI") }    // 0 = ANSI ("Western")
if (#1==255) { Statline_Message("OEM") }   // 255 = OEM ("OEM/DOS")
Return

So, I added this macro to my Dashboard. It seems to work.
That is to say, the Statline tells me the name of the installed character set.
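The macro above only distinguishes 0 and 255. The Python sketch below (an illustration, not VEDIT code) extends the same lookup to a few other Windows GDI character-set values, which is one way to also cover the "Greek" and "Symbol" scripts Pauli mentions. It assumes that Font_Charset returns the standard Windows lfCharSet value; that is consistent with the 0 = ANSI and 255 = OEM tests in the macro, but it is not confirmed in this thread.

```python
# Windows GDI character-set identifiers (LOGFONT lfCharSet values).
# Assumption: VEDIT's Font_Charset returns these values, as the 0 and 255
# tests in the macro above suggest.
CHARSET_NAMES = {
    0: "ANSI",          # ANSI_CHARSET: "Western" in the Font dialog
    1: "DEFAULT",       # DEFAULT_CHARSET
    2: "SYMBOL",        # SYMBOL_CHARSET: "Symbol"
    161: "GREEK",       # GREEK_CHARSET: "Greek"
    238: "EASTEUROPE",  # EASTEUROPE_CHARSET: "Central European"
    255: "OEM",         # OEM_CHARSET: "OEM/DOS"
}


def charset_name(value: int) -> str:
    """Return a readable name, falling back to the raw number if unknown."""
    return CHARSET_NAMES.get(value, f"other ({value})")


if __name__ == "__main__":
    for value in (0, 255, 161, 7):
        print(value, charset_name(value))
```

A lookup table keeps the status-line text in one place, so adding another script means adding one entry instead of another if() line.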

Now, I would like to do this via the new command,

Read_Ini(r,"section","parameter","filename").

I tried, but I am not a good typist, etc. In short, I have a hunch that I got hung up on a simple technicality. So, if somebody would help me out, I would really appreciate it.

Of course, my hope is that if I can read the parameters from my Vedit.ini file, then I can also write to it via the new "write" command.
Furthermore, I hope that
Charset=OEM,
is indeed a parameter, so I can set it.
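Read_Ini() is described above as taking a section name, a parameter name and a file name. As a rough model of that section/key lookup (not of VEDIT itself), the Python sketch below reads and rewrites a key in INI text with the standard configparser module. The [Font] section and the Charset key are hypothetical placeholders; whether VEDIT.INI stores the character set this way is exactly the open question in this post.

```python
import configparser
import io

# Hypothetical INI text: [Font] and Charset are placeholders, not names
# taken from a real VEDIT.INI.
SAMPLE_INI = """
[Font]
Charset=OEM
"""


def read_ini(text: str, section: str, key: str, default: str = "") -> str:
    """Return the value of 'key' in 'section', or 'default' if either is missing."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser.get(section, key, fallback=default)


def write_ini(text: str, section: str, key: str, value: str) -> str:
    """Return the INI text with 'key' in 'section' set to 'value'."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names in their original case
    parser.read_string(text)
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, key, value)
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)  # write Key=Value
    return out.getvalue()


if __name__ == "__main__":
    print(read_ini(SAMPLE_INI, "Font", "Charset"))
    print(write_ini(SAMPLE_INI, "Font", "Charset", "ANSI"))
```

Note that configparser does not preserve comments when it rewrites a file, so a real INI file should be backed up before trying anything like this.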

The manual emphasizes that the value of the charset has to be set manually, via the {View, Font} menu command.
(So, with some luck, this is exactly the enhancement I need.)

A big thank you for the other half of your message:

>However, you do not need Dashboard to
>find out whether you are using ANSI or
>OEM font.
>Just select "View" -> "Font", and the
>dialog box displays the font you are
>currently using.
>If the font name does not reveal this
>information, look at the "Script" box.
>"Western" means it is ANSI font.
>"OEM/DOS" means it is OEM font.
>There are other possibilities, too, such
>as "Greek" or "Symbol".
>
>--

I tried, but I could not figure out that the abbreviation in the Vedit box referred to "Western".
(And of course, I would never have been able to figure out that
"Western" means it is an ANSI font.)
I would also like to know how Vedit handles the "Greek" or "Symbol" issue. I tried to find it in the Vedit box, but could not.

Thanks as always,

-peter

Topic:	Re: VEDIT 6.20.2 has updated Scribe spelling checker (13 of 33), Read 83 times
Conf:	VEDIT Macro Library
From:	Ted Green
Date:	Sunday, May 29, 2011 11:39 PM

----- Original Message -----
> From: "Scott Lambert" ( scottmacros@... )
> "2. I have considered opening the english.vdf dictionary into 26
> buffers, one for each first letter. I have no idea if this would speed
> things up much or not."
>
> What is the "in memory" buffer size? Or put another way, what is the
> largest file Vedit can edit without auto-buffering?
>
> My thinking is instead of one file for each letter, we could have
> letter groups (English-ac.vdf, English-df, etc.), each file being less
> than Vedit's "in memory" buffer size.

The default "in memory" size is 128K, but with the new Buf_Switch option you can create buffers of up to 4 megs.

Buf_Switch() The new Buf_Switch(r,VALUE,size) option sets the
allocated memory size of a new buffer to 'size' instead
of the default (which is currently 128 Kbytes). The
allowable range of values is 16K - 4Meg.
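Scott's letter-group idea can be sketched as a packing problem: keep whole first-letter blocks together and start a new file whenever the next block would push the current file past the buffer size. The Python below is a hypothetical illustration; the file names follow Scott's English-ac.vdf pattern, and the default limit uses the 128 KB figure Ted gives. Since Buf_Switch() allows buffers of up to 4 MB, the same function can be called with a larger limit, and a large enough limit simply puts the whole word list into one group.

```python
from itertools import groupby

# Assumption: Ted's stated default buffer size of 128 KB.
DEFAULT_LIMIT = 128 * 1024


def letter_groups(words, limit=DEFAULT_LIMIT):
    """Yield (first_letter, last_letter, words) groups of at most 'limit' bytes.

    Assumes non-empty words, one per line with LF endings. Words sharing a
    first letter are never split, so a single letter larger than 'limit'
    becomes its own oversized group.
    """
    group, size = [], 0
    for _letter, same_letter in groupby(sorted(words), key=lambda w: w[0]):
        block = list(same_letter)
        block_size = sum(len(w.encode("utf-8")) + 1 for w in block)  # +1 for LF
        if group and size + block_size > limit:
            yield group[0][0], group[-1][0], group
            group, size = [], 0
        group.extend(block)
        size += block_size
    if group:
        yield group[0][0], group[-1][0], group


def file_name(first, last):
    """Hypothetical naming scheme following Scott's English-ac.vdf example."""
    return f"english-{first}{last}.vdf"


if __name__ == "__main__":
    sample = ["apple", "ant", "bee", "cat", "dog"]
    for first, last, group in letter_groups(sample, limit=12):
        print(file_name(first, last), group)
```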

In the future I will likely make the default size configurable in some way.

Ted.

Topic:	VEDIT 6.20.2 has updated Scribe spelling checker (6 of 33), Read 85 times
Conf:	VEDIT Macro Library
From:	Scott Lambert
Date:	Sunday, May 29, 2011 12:20 PM

"I noticed that the Scribe .vdf files have been stored as DOS files (CR-LF). Changing them to Unix files (LF only) would make them some 10% smaller, which should improve speed a bit, too."

Would that make it difficult or confusing for the average user to edit the vdf files, such as accidentally adding an incorrect word?

I only seem to edit DOS files, so have no experience with Unix files.

Scott

Topic:	Re: VEDIT 6.20.2 has updated Scribe spelling checker (11 of 33), Read 89 times
Conf:	VEDIT Macro Library
From:	Ted Green
Date:	Sunday, May 29, 2011 11:35 PM

----- Original Message -----
> From: "Scott Lambert" ( scottmacros@... )
> "I noticed that the Scribe .vdf files have been stored as DOS files
> (CR-LF). Changing them to Unix files (LF only) would make them some
> 10% smaller, which should improve speed a bit, too."
>
> Would that make it difficult or confusing for the average user to edit
> the vdf files, such as accidentally adding an incorrect word?
>
> I only seem to edit DOS files, so have no experience with Unix files.

Since VEDIT auto-detects UNIX files, it should make no difference.
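The 10% estimate follows from simple arithmetic: a dictionary line holds one word plus its line ending, so if the average word is about 8 letters (an assumption, not measured here), a CR-LF line takes 10 bytes and an LF line takes 9, a saving of 1/10. The Python sketch below checks that arithmetic and shows one simple way line endings can be detected; it is a generic heuristic, not VEDIT's actual auto-detection logic.

```python
def detect_line_endings(data: bytes) -> str:
    """Classify data as DOS (CR-LF) or UNIX (LF) by counting line endings."""
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf
    if crlf == 0 and lf_only == 0:
        return "none"
    return "DOS (CR-LF)" if crlf >= lf_only else "UNIX (LF)"


def crlf_to_lf(data: bytes) -> bytes:
    """Convert DOS line endings to UNIX ones."""
    return data.replace(b"\r\n", b"\n")


def savings_percent(data: bytes) -> float:
    """Percentage of bytes saved by converting CR-LF to LF."""
    if not data:
        return 0.0
    return 100.0 * (len(data) - len(crlf_to_lf(data))) / len(data)


if __name__ == "__main__":
    dos = b"abcdefgh\r\n" * 5  # five 8-letter words, 10 bytes per line
    print(detect_line_endings(dos), len(dos), "bytes")
    print(detect_line_endings(crlf_to_lf(dos)), len(crlf_to_lf(dos)), "bytes")
    print(f"{savings_percent(dos):.1f}% smaller")
```

Shorter average words give a larger percentage saving and longer words a smaller one, since each line saves exactly one byte.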

Ted.

Topic:	VEDIT 6.20.2 has updated Scribe spelling checker (24 of 33), Read 35 times, 1 File Attachment
Conf:	VEDIT Macro Library
From:	Pauli Lindgren
Date:	Sunday, June 05, 2011 03:38 PM

I found a bug in Scribe that causes significant slow down when checking large files.

I made some speed tests with the latest Scribe.
I have a 130 KB text file that does not have any spelling errors.
Checking the file with Scribe took 6.9 seconds.
After the check, Scribe returns to the main menu, so I pressed Enter to run the check again.
This time the checking took 2 min 14 sec.
That is about 20 times longer than on the first round!

Then I made a bigger file by copying the contents of the file twice into a new file, resulting in a 260 KB file.
When checking this file, the speed of checking slows down dramatically somewhere after the middle of the file, and the total checking time is around 2 min 20 sec.

The reason for the slow-down was the code in spelling.vdm that adds a word to 'ignore this session' buffer if it was found in any dictionary.

The problem is that the word is added even if it was found in 'ignore this session' buffer. So Scribe keeps adding the same words again and again. When the size of the buffer grows above 120 KB, Vedit starts buffering to disk and the execution slows down dramatically.

With my test file, 18441 words had already been added after the first round, and the buffer size was 119629 bytes. After the second round, both values had doubled.

The fix for this problem was quite simple. I changed the code so that a word is added to the 'ignore this session' buffer only if it was found in the main dictionary.

There are changes in two files.
- In wordchk.vdm, the "found" flag (#62) is set to a different value depending on which dictionary the word was found in.
- In spelling.vdm, the word is added to the list only if the dictionary number is >4 (main dictionary or special dictionary).
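The bug and the fix can be modeled in a few lines of Python. This is an illustrative sketch, not the code in SCRIBE_FIX.ZIP: a plain list stands in for the 'ignore this session' buffer (so, like a search through a growing buffer, each lookup gets slower as it grows), and the dictionary numbers 1 and 5 are hypothetical, chosen only so that the main dictionary is > 4 as in the description above.

```python
# Hypothetical dictionary numbers: only "main > 4" matters, as in the fix.
SESSION_LIST = 1
MAIN_DICT = 5


def check_words(words, main_dict, ignore_list, add_only_from_main=True):
    """Spell check 'words'; return how many entries were appended to ignore_list.

    ignore_list is a plain list, so each 'in' test scans it linearly, much
    like searching a buffer that keeps growing.
    add_only_from_main=False reproduces the bug (every found word is appended);
    True applies the fix (append only when found in a dictionary numbered > 4).
    """
    added = 0
    for word in words:
        if word in ignore_list:
            found_in = SESSION_LIST
        elif word in main_dict:
            found_in = MAIN_DICT
        else:
            continue  # not found anywhere: Scribe would flag it as misspelled
        if not add_only_from_main or found_in > 4:
            ignore_list.append(word)
            added += 1
    return added


if __name__ == "__main__":
    main = {"the", "cat", "sat"}
    text = ["the", "cat", "the", "sat"]
    for fixed in (False, True):
        ignore = []
        rounds = [check_words(text, main, ignore, fixed) for _ in range(2)]
        print("fixed" if fixed else "buggy", rounds, len(ignore))
```

In the buggy mode the list doubles on every run over the same text, which matches the doubling Pauli measured; in the fixed mode a second run adds nothing.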

After this change, the checking on my test file took 6.5 seconds on the first round, and 1.7 seconds on the second round. Only 1132 words were added to the ignore list. Checking the 260 KB file took 8.3 seconds.

The enclosed zip file contains the changed files.

--
Pauli

SCRIBE_FIX.ZIP (1KB)

Topic:	Re: VEDIT 6.20.2 has updated Scribe spelling checker (26 of 33), Read 47 times
Conf:	VEDIT Macro Library
From:	Ted Green
Date:	Sunday, June 05, 2011 05:45 PM

----- Original Message -----
> From: "Pauli Lindgren" ( pauli0212@... ) I found a bug in Scribe
> that causes significant slow down when checking large files.

Pauli:

Sorry, but I have corrected that and have not posted the new version yet as Scott and I are finalizing it.

Ted.

Topic:	Scribe problems / suggestions (25 of 33), Read 38 times
Conf:	VEDIT Macro Library
From:	Pauli Lindgren
Date:	Sunday, June 05, 2011 03:57 PM

A few things that I noticed when testing Scribe.

- If you use the "Edit" option to replace a word, and the new word is longer than the original, Scribe continues checking from the middle of the word and then (usually) marks the end part of the word as an error.

- When a word is fixed and you select "Replace All", then Scribe replaces the given string even if it is a part of a longer word.
For example: if the word to fix is "repea" and you change that to "repeat", then any following words "repeat" will be changed to "repeatt".

- When editing the word in the dialog box, it would help if the old word were filled in by default. This would make it easier to fix single-character errors, etc.
Or you might consider using the Visual() command.

- When the spelling check is canceled (e.g. in order to fix something manually), Scribe jumps to the beginning of the file. This makes it difficult to find the position where the manual fix is needed.
I think the cursor should be left where it was when the checking was canceled.

- If there is a string such as 'r', Scribe marks the last two characters (r') as an error. Why does it not skip the ' character?
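The "Replace All" problem above is the difference between a substring replace and a whole-word replace. The Python sketch below (an illustration, not Scribe's code) contrasts the two using the "repea" example; the whole-word version wraps the escaped word in word-boundary anchors so that it does not match inside "repeat".

```python
import re


def replace_all_substring(text: str, old: str, new: str) -> str:
    """Naive Replace All: also changes 'old' inside longer words."""
    return text.replace(old, new)


def replace_all_words(text: str, old: str, new: str) -> str:
    """Replace 'old' only where it stands as a whole word.

    The pattern puts a word-boundary anchor on each side of the escaped word,
    so "repea" does not match the first five letters of "repeat". Passing a
    function as the replacement keeps any backslashes in 'new' literal.
    """
    pattern = r"\b" + re.escape(old) + r"\b"
    return re.sub(pattern, lambda _match: new, text)


if __name__ == "__main__":
    line = "repea this, then repeat it"
    print(replace_all_substring(line, "repea", "repeat"))
    print(replace_all_words(line, "repea", "repeat"))
```

One caveat: a word-boundary anchor only behaves this way when the word starts and ends with a letter or digit, so a "word" that begins or ends with an apostrophe would need a different test.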

--
Pauli

Topic:	Scribe problems / suggestions (33 of 33), Read 6 times
Conf:	VEDIT Macro Library
From:	Peter Rejto
Date:	Thursday, July 28, 2011 02:26 AM

On 6/5/2011 3:57:01 PM, Pauli Lindgren wrote:
>A few things that I noticed
>when testing Scribe.
>
>- If you use the "Edit" option to
>replace a word, and the new word
>is longer than the original,
>Scribe continues checking from
>the middle of the word and then
>(usually) marks the end part
>of the word as an error.
>
>- When a word is fixed and you
>select "Replace All", then
>Scribe replaces the given
>string even if it is a part of
>a longer word.
>For example: if the word to
>fix is "repea" and you change
>that to "repeat", then any
>following words "repeat" will
>be changed to "repeatt".
>
>- When editing the word in the
>dialog box, it would help if
>the old word was filled in the
>dialog by default. This way it
>would be easier to fix some
>single character errors etc.
>Or you might consider using
>Visual() command.
>
>- When spelling check is
>canceled (e.g. in order to
>manually fix something),
>Scribe jumps to the beginning of
>the file. Therefore it is
>difficult to find the position
>where the manual fixing is
>needed.
>I think the cursor should be
>left where it was when the
>checking was canceled.
>
>- If there is a string such as
>'r', Scribe marks the last two
>characters (r') as an error.
>Why does it not skip the '
>character?
>
>--
>Pauli

Hi,

I would like to add some other suggestions:

1.: I would like to have a TeX option.

Incidentally, could I save the TeX words that Scribe finds directly into the tex.vdf file?

2.: If Scribe has no suggestions, I would like to be able to open the dictionary file and copy a word into the dialog box. Possibly I would then edit the word in the dialog box.

3.: Actually, the Scribe suggestion list is, in general, quite generous. I would like to be able to just highlight my choice and let Scribe do the entering for me.

Now a progress report: I had trouble with the Replace All command.

Thanks as always,

-peter