Topic: Sorting CSV files (1 of 14), Read 153 times
Conf: Hints, tips, shortcuts
From: Bob Jensen
Date: Monday, October 20, 2003 01:42 AM

Hi Ted!

As a 10 year Vedit user, I finally bought a new copy of 6.111 to get the sorting feature.

When I export SQL tables as CSV files, they don't always come out with each record in sequential order according to the primary index field. Other tools can't handle large files for sorting the records according to the index. I'm sure Vedit can since it can do most anything else, but I don't know how.

The index field in each record is numbered consecutively and unpadded with zeros, e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 etc., and that's how I want the sort to end up after the table is jumbled in the export process.

How do I get Vedit to sort the file the way I want instead of 1, 10, 11, 12, 13, 2, 3, 4, etc.? IOW, I want the lines to be sorted according to the numeric value of the entire delimited field instead of the default way Vedit does it.

Thanks!

 


Topic: Sorting CSV files (2 of 14), Read 150 times, 1 File Attachment
Conf: Hints, tips, shortcuts
From: Christian Ziemski
Date: Monday, October 20, 2003 03:47 AM


Bob:

That way of sorting isn't VEDIT's special way but the usual way of sorting text (and not numbers).

So a possible solution is to format the numbers in a way that text sorting would work with it.

I wrote a little macro to do that (it's attached):

The macro changes a list like

4,willi
2,karl
1,otto
738,tom
12,peter
10,peter

to

0000000004,willi
0000000002,karl
0000000001,otto
0000000738,tom
0000000012,peter
0000000010,peter


In that format a sort will work. You can import the result directly into Excel or so since the leading zeros are ignored.

Or you can remove the leading zeros again with a
Replace("|<0", "", BEGIN+ALL+NOERR)

Christian

 
NUMCOLPAD.VDM (1KB)

 


Topic: Sorting CSV files (3 of 14), Read 152 times
Conf: Hints, tips, shortcuts
From: Bob Jensen
Date: Monday, October 20, 2003 10:03 AM

Thanks, Christian.

That's the "padding" solution, an analog of which I end up using way too often to sort numeric data. I'd hoped that Vedit had the ability to distinguish numeric from text data like SQLServer and Excel can.

Excel is a mess along the sorting lines too because it sorts numeric data as text even though it does distinguish the two types. Since Excel can only handle 2^16 rows, it simply cannot handle large files of unsorted data.

In addition to the many steps needed to pad numeric columns with 0s, steps are needed to remove the 0s after sorting. A "number" (or numeric data) always begins with a significant digit, and importing padded primary index data into a SQLServer table might be a bit gamey.

Theoretically, distinguishing delimited numeric data fields for sorting shouldn't be hard, and Vedit should be able to do it!

Thanks again!

 


Topic: Re: Sorting CSV files (4 of 14), Read 153 times
Conf: Hints, tips, shortcuts
From: Christian Ziemski
Date: Monday, October 20, 2003 11:36 AM

Bob:

>I'd hoped that Vedit had the ability to distinguish
>numeric from text data like SQLServer and Excel can.

VEDIT can. In the macro language. But not in general.
Don't forget: VEDIT is a *text* editor and not a spreadsheet!

>In addition to the many steps needed to pad numeric columns with 0s, steps are
>needed to remove the 0s after sorting. A "number" (or numeric data) always
>begins with a significant digit, and importing padded primary index data into a
>SQLServer table might be a bit gamey.

A macro like the one from my last posting with two lines added:
- Sort_Merge(...) // sort the entire file
- Replace(...) // Remove the padding again
is too much trouble? That is a 10-liner.

>Theoretically, distinguishing delimited numeric data fields for sorting
>shouldn't be hard, and Vedit should be able to do it!

It is hard. For a text editor.

But let's wait for Ted's opinion. ;-)


Christian

 


Topic: Re: Sorting CSV files (5 of 14), Read 158 times
Conf: Hints, tips, shortcuts
From: Bob Jensen
Date: Monday, October 20, 2003 05:23 PM

Calling Vedit a "text" editor is like saying Bill Gates has an "adequate" income. Vedit is a "byte" processor in that it can do anything with bytes whether the author intended for them to be construed as text, graphics, numbers, or whatever.

I can't really imagine programming without Vedit in the toolbox, but a big opportunity may be wasted if Vedit doesn't have the ability to sort numeric data columns without padding the data first.

Sorting for the convenience of a computer is nice, but I've never met anybody except programmers who find that useful.

Computers and people number sequential data with the first significant byte being a digit. For example, if you're counting the pebbles in a field with only one pass allowed through the data, you'd have to know what the total was before you started so you'd pad the first numbers with enough zeros. The problem comes about when you want to resort the data for some other useful purpose. Included in that is my present case of simply wanting to resort the data as it was originally imported into a SQL table.

In our case, we have 2 C++ programmers who need and use Vedit. However, if Vedit could do simple numeric sorts from the menu bar and resulting dialog, we have many more people who would use and need a copy. I've written thousands of lines of Vedit macros, but I wouldn't dream of suggesting that any of those people use macros because of the learning curve, and more importantly, my time teaching them. A "universal" tool with thousands of uses is sold based on one or two perceived uses. I'm all for Vedit becoming ubiquitous just as I have been since I discovered it 10 years ago.

If Vedit doesn't have the present ability to sort numeric data from the sort dialog, I hope Ted will make it happen as he has with so many other indispensable features in the past.

 


Topic: Re: Sorting CSV files (7 of 14), Read 161 times
Conf: Hints, tips, shortcuts
From: Christian Ziemski
Date: Tuesday, October 21, 2003 03:26 AM

Bob:

On Mon, 20 Oct 2003 17:23:00 -0400, you wrote:

>Calling Vedit a "text" editor is like saying Bill Gates has an "adequate"
>income.

Nice comparison! ;-)

>Vedit is a "byte" processor in that it can do anything with bytes
>whether the author intended for them to be construed as text, graphics,
>numbers, or whatever.

Exactly. On the basis of *bytes*. I simply have set byte=character.

>I can't really imagine programming without Vedit in the toolbox,

Mee too. What do you think why I'm posting so often here?
VEDIT is not just enormous useful but it is fun to solve problems with
it.

>In our case, we have 2 C++ programmers who need and use Vedit. However, if
>Vedit could do simple numeric sorts from the menu bar and resulting dialog, we
>have many more people who would use and need a copy. I've written thousands of
>lines of Vedit macros, but I wouldn't dream of suggesting that any of those
>people use macros because of the learning curve, and more importantly, my time
>teaching them.

What about a self written macro NumSort.VDM included in the {User} or
{Tools} menu in all installations of VEDIT?
Using Dialog_Input_1() could be used to make very flexible dialogs.
No user would realize that he is using a macro.

I have to admit that that is not build in (yet) but could be a
workaround.

Or better: you can modify SORTMERG.VDM which implements
{Block, Edit/translate, Sort lines} to have an additional
checkbox/button "treat first data column as numerical" and so doing
the padding and removing of the padding automatically and invisibly.

I have modified SORTMERG.VDM myself to have more functionality.
There is now a forth button for "get the sort fields automatically".

O.k. modifying standard macros (by copying them to user-mac of
course!) has the possible disadvantage that you have to pay attention
when getting VEDIT updates.


>If Vedit doesn't have the present ability to sort numeric data from the sort
>dialog, I hope Ted will make it happen as he has with so many other
>indispensable features in the past.

Yes, it is fascinating to see what Tom + Ted have done in the last
decades! (I'm using VEDIT for 11 years now.)

Christian

 


Topic: AW: Sorting CSV files (6 of 14), Read 155 times
Conf: Hints, tips, shortcuts
From: Michael Baas
Date: Tuesday, October 21, 2003 02:37 AM

Hi,

I just could not let this pass along without throwing my 0.02$ in ;-)

> A macro like the one from my last posting with two lines added:
> - Sort_Merge(...) // sort the entire file
> - Replace(...) // Remove the padding again
> is too much trouble? That is a 10-liner.

This approach is unuseable when your numeric values contain "desired"
leading 0s. Then you just can't pad 0s and remove all zeroes afterwards.
And yes, this comes from a real-life application: a customer has
product-codes, which sometimes have leading zeros and sometimes not. And
still they want the codes sorted in numerical order. (That's not a
problem in their case, but it would be for VEDIT)

> It is hard. For a text editor.

But like Bob said, VEDIT is much more - isn't it the swiss-army-knife
for files?
And: have you ever checked out www.sortfile.com? :-)

Cheers

Michael

 


Topic: Re: Sorting CSV files (8 of 14), Read 154 times
Conf: Hints, tips, shortcuts
From: Christian Ziemski
Date: Tuesday, October 21, 2003 03:38 AM

Michael:

> I just could not let this pass along without throwing my 0.02$ in ;-)
>
>> A macro like the one from my last posting with two lines added:
>> - Sort_Merge(...) // sort the entire file
>> - Replace(...) // Remove the padding again
>> is too much trouble? That is a 10-liner.
>
>This approach is unuseable when your numeric values contain "desired"
>leading 0s. Then you just can't pad 0s and remove all zeroes afterwards.

There are always exceptions.

I'm not a mathematician, I'm simply trying to find solutions (often
named work-arounds) ;-)

In those cases the padding-macro has to be more intelligent...
We have just to use another blade of the swiss-army-knive.

>And yes, this comes from a real-life application: a customer has
>product-codes, which sometimes have leading zeros and sometimes not.

Wouldn't be the product codes of the same length then???

Anyway. It should be solvable.

>But like Bob said, VEDIT is much more - isn't it the swiss-army-knife
>for files?

Yes, that is known worldwide!

>And: have you ever checked out www.sortfile.com? :-)

:-O I didn't know that one!


Christian

 


Topic: AW: Sorting CSV files (9 of 14), Read 156 times
Conf: Hints, tips, shortcuts
From: Michael Baas
Date: Tuesday, October 21, 2003 03:53 AM

Christian,

> I'm not a mathematician, I'm simply trying to find solutions (often
named work-arounds) ;-)

Neither am I :-)

> In those cases the padding-macro has to be more intelligent...
> We have just to use another blade of the swiss-army-knive.

Well, no doubt it can be done. But I guess it would be a better
investment of time to modify the macro for straight numeric sorting.

> Wouldn't be the product codes of the same length then???

Oh yes. But some have leading blanks and some use zeroes. (I did not
invent that and I do not say it's the right way)

Cheers

Michael

 


Topic: Re: Sorting CSV files (10 of 14), Read 158 times, 1 File Attachment
Conf: Hints, tips, shortcuts
From: Christian Ziemski
Date: Tuesday, October 21, 2003 04:17 AM


>In those cases the padding-macro has to be more intelligent...
>We have just to use another blade of the swiss-army-knive.

And here is a macro to sort a file by the first columns numerical
value without mangling the format of the first field (leading zeros).

It's a quick shot, so comments are welcome.

Christian

 
NUMSORT.VDM (1KB)

 


Topic: AW: Sorting CSV files (11 of 14), Read 158 times
Conf: Hints, tips, shortcuts
From: Michael Baas
Date: Tuesday, October 21, 2003 04:31 AM

> so comments are welcome.

Compliments - short & elegant. Nice one :-)

Next request will probably be to allow negatives and real numbers ;-)

Cheers

Michael

 


Topic: Re: Sorting CSV files (12 of 14), Read 163 times
Conf: Hints, tips, shortcuts
From: Ian Binnie
Date: Tuesday, October 21, 2003 07:38 AM

>And here is a macro to sort a
>file by the first columns
>numerical
>value without mangling the
>format of the first field
>(leading zeros).

Christian

I haven't had a chance to look at your macro yet, but this too will be a special case.

The Vedit sort is effectively a flat file sort (with support for variable length lines).

To turn this into a full blown Sort/Merge package would require a parser to allow sort fields to be identified by columns, commas, tabs etc.

I am sure that this could be done, but do not expect this to come with Vedit.

I note some of the earlier comments. To use any of the commercial soting packages requires some input to define the key definitions. (I fondly remember a C/PM SortMerge, which was very powerful, but am not aware of any replacement. Whenever I want to do some complex sorting these days I knock up a C program to do the parsing & call Quicksort or a proprietary sorting algorithm.)

 


Topic: Re: Sorting CSV files (13 of 14), Read 167 times
Conf: Hints, tips, shortcuts
From: Fritz Heberlein
Date: Tuesday, October 21, 2003 04:47 PM

>I fondly
>remember a C/PM SortMerge,
>which was very powerful, but
>am not aware of any
>replacement.

This (i think) Sort/Merge is still around as a component of Data General's obsolescent AO-VS. I'm still using it time by time: it *is* very powerful, but the scripting is a bit arcane and the line length is limited to 80.

Some company called WildHare claims to have a full replacement for it running on many platforms. See

http://www.wildharecomputers.com/whsort.htm

for details.

Fritz

 


Topic: Re: Sorting CSV files (14 of 14), Read 169 times
Conf: Hints, tips, shortcuts
From: Pauli Lindgren
Date: Wednesday, October 22, 2003 08:05 AM

On 10/20/2003 11:36:03 AM, Christian Ziemski wrote:
>Bob:
>
>>Theoretically, distinguishing delimited numeric data fields for sorting
>>shouldn't be hard, and Vedit should be able to do it!
>
>It is hard. For a text editor.

Most sort tools have option for numerical sort.
It shouln't be too difficult. Basically, it requires equivalent of Num_Eval command somewhere.

One of the first freeware programs I published, back in the eighties, was FSORT. I made it to overcome the 64k file size limit of the DOS sort command. FSORT can sort files of unlimited size (but line length is max 255 characters). It sorts text files, but it does have the option for numerical sort. BTW, it uses Merge Sort algorithm (although I didn't know that when I wrote the program).

You can download FSORT from my web page at
http://koti.mbnet.fi/pkl/dos_sw.htm

--
Pauli