Topic: UTF-8 converter (1 of 12), Read 39 times
Conf: VEDIT Macro Library
From: Christian Ziemski
Date: Friday, July 22, 2005 08:10 AM


Since I'm editing more and more files in UTF-8 format (from UNIX/Linux), I tried to implement a simple(!) converter between ISO-8859 and UTF-8 as a VEDIT macro, so that I can edit those files with VEDIT.

Is anybody interested in this?

If yes: then I would enhance the documentation and post it here.
(If not, I'll hack the code for myself only ;-)


Christian

 


Topic: Re: UTF-8 converter (2 of 12), Read 35 times
Conf: VEDIT Macro Library
From: Ted Green
Date: Friday, July 22, 2005 10:51 AM

At 08:11 AM 7/22/2005, you wrote:
>Since I'm editing more and more files in UTF-8 format (from UNIX/LINUX) I tried to implement a simple(!) converter between ISO-8859 and UTF-8 as VEDIT macro to be able to edit those files with VEDIT.
>
>Is anybody interested in this?

Christian:

I would be very interested in adding a UTF-8 converter to the VEDIT
menus. :-))

Ted.

 


Topic: UTF-8 converter (3 of 12), Read 33 times
Conf: VEDIT Macro Library
From: Ian Binnie
Date: Friday, July 22, 2005 09:20 PM

On 7/22/2005 8:10:41 AM, Christian Ziemski wrote:
>
>Since I'm editing more and
>more files in UTF-8 format
>(from UNIX/LINUX) I tried to
>implement a simple(!)
>converter between ISO-8859 and
>UTF-8 as VEDIT macro to be
>able to edit those files with
>VEDIT.
>
>Is anybody interested in this?

Christian,

I would be interested to see this, although at the moment I don't have a great need to do these conversions.

This was discussed on this board just over a year ago, and I suggested this could most easily be done by the Windows API in Vedit.

Ted seemed interested at the time.

 


Topic: Re: UTF-8 converter (4 of 12), Read 36 times
Conf: VEDIT Macro Library
From: Christian Ziemski
Date: Saturday, July 23, 2005 02:25 AM

On Fri, 22 Jul 2005 21:20:00 -0400, Ian Binnie wrote:

>On 7/22/2005 8:10:41 AM, Christian Ziemski wrote:
>
>> [simple(!) converter between ISO-8859 and UTF-8]
>
>
>This was discussed on this board just over a year ago, and I
>suggested this could most easily be done by the Windows API
>in Vedit.

Yes, I remember: During development of the ANSI-UTF16 translator.

Using the API would be the best, I agree.

But I needed a quick solution, so I wrote a SIMPLE macro.
For now it can only translate German umlauts ;-)

I'm using an expandable table and so it should be easy to fill in more
characters.


Christian

 


Topic: Re: UTF-8 converter (5 of 12), Read 34 times
Conf: VEDIT Macro Library
From: Ted Green
Date: Saturday, July 23, 2005 05:29 PM

At 09:20 PM 7/22/2005, you wrote:

>This was discussed on this board just over a year ago, and I suggested this could most easily be done by the Windows API in Vedit.
>
>Ted seemed interested at the time.

I will have to look into this again then.

Ted.

 


Topic: UTF-8 converter (6 of 12), Read 31 times
Conf: VEDIT Macro Library
From: Christian Ziemski
Date: Monday, July 25, 2005 04:34 AM


I have now finished a first public version of my converter between ISO 8859 and UTF-8.

It's still experimental and under development, but at least it works here.

http://ziemski.privat.t-online.de/vedit/macros/UTF-8.vdm

For now it starts with a main dialog to call the functions:

1.) Check whether the current file is ISO or UTF-8
2.) Translate from ISO to UTF-8
3.) Translate from UTF-8 to ISO
4.) Check and correct the translation table (not yet ready)

The translation table http://ziemski.privat.t-online.de/vedit/macros/UTF-8.dat has to be in the USER_MACRO directory for now. In a later version it will be allowed to be in other directories.

The current table contains only the German Umlauts, but can be easily extended manually.

The translation functions above try to help when they find an unknown character, i.e. one that is not in the table.


In this version the macro only supports 2-byte UTF-8 characters.
And the error checking should be improved...
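
For readers who don't have the macro at hand, the table-driven approach described above can be sketched roughly as follows. This is a Python illustration, not the VEDIT macro itself: the names `TABLE`, `iso_to_utf8`, `utf8_to_iso` and `looks_like_utf8` are made up for this example, the table is assumed to map ISO 8859-1 bytes to 2-byte UTF-8 sequences, and raising an error on an unknown character is a simplification of the macro's "try to help" behaviour.

```python
# Assumed table layout: ISO 8859-1 byte -> 2-byte UTF-8 sequence.
# Only the German umlauts are listed; more entries can be added by hand.
TABLE = {
    0xC4: b"\xC3\x84",  # A umlaut
    0xD6: b"\xC3\x96",  # O umlaut
    0xDC: b"\xC3\x9C",  # U umlaut
    0xE4: b"\xC3\xA4",  # a umlaut
    0xF6: b"\xC3\xB6",  # o umlaut
    0xFC: b"\xC3\xBC",  # u umlaut
}
REVERSE = {utf8: iso for iso, utf8 in TABLE.items()}


def iso_to_utf8(data: bytes) -> bytes:
    """Translate ISO bytes to UTF-8 using the table; ASCII passes through."""
    out = bytearray()
    for i, b in enumerate(data):
        if b < 0x80:
            out.append(b)
        elif b in TABLE:
            out += TABLE[b]
        else:
            raise ValueError(f"byte 0x{b:02X} at offset {i} is not in the table")
    return bytes(out)


def utf8_to_iso(data: bytes) -> bytes:
    """Translate UTF-8 back to ISO; only 2-byte sequences are supported."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            out.append(data[i])
            i += 1
        else:
            pair = data[i:i + 2]
            if pair not in REVERSE:
                raise ValueError(f"sequence {pair!r} at offset {i} is not in the table")
            out.append(REVERSE[pair])
            i += 2
    return bytes(out)


def looks_like_utf8(data: bytes) -> bool:
    """Rough check: True if the data is valid UTF-8 (pure ASCII also passes)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

A lone ISO umlaut byte such as 0xFC is never valid UTF-8, which is why the simple validity test is usually enough to tell the two formats apart for German text.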

Christian

 


Topic: UTF-8 converter (7 of 12), Read 20 times
Conf: VEDIT Macro Library
From: Pauli Lindgren
Date: Friday, December 07, 2007 10:57 AM

On 7/25/2005 4:34:27 AM, Christian Ziemski wrote:
>
>Now I finished a first public
>version of my converter
>between ISO 8859 and UTF-8.
>
>It's still under development
>and experimental, but at least
>working here.

Christian, what is the situation with the UTF-8 conversion macro?

I wonder if it could be combined with the UTF-16 conversion?
I believe the difference would be just how to get the numeric value of the Unicode character. After that, the conversions should be the same.
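
To illustrate that point, here is a small Python sketch (not VEDIT macro code) of how the numeric value could be obtained in each case. The function names are hypothetical, only 2-byte UTF-8 sequences and single UTF-16 code units are handled, and little-endian byte order is assumed for UTF-16.

```python
def codepoint_from_utf8_pair(b1: int, b2: int) -> int:
    """Decode a 2-byte UTF-8 sequence (110xxxxx 10xxxxxx) to a code point."""
    if b1 & 0xE0 != 0xC0 or b2 & 0xC0 != 0x80:
        raise ValueError("not a 2-byte UTF-8 sequence")
    if b1 < 0xC2:
        raise ValueError("overlong encoding")
    return ((b1 & 0x1F) << 6) | (b2 & 0x3F)


def codepoint_from_utf16le_unit(lo: int, hi: int) -> int:
    """Decode one little-endian UTF-16 code unit (surrogate pairs not handled)."""
    value = (hi << 8) | lo
    if 0xD800 <= value <= 0xDFFF:
        raise ValueError("surrogate: a second code unit would be needed")
    return value


# Both encodings of the u umlaut yield the same code point, U+00FC.
assert codepoint_from_utf8_pair(0xC3, 0xBC) == 0xFC
assert codepoint_from_utf16le_unit(0xFC, 0x00) == 0xFC
```

Once the code point is known, the lookup into the 8-bit character set is identical for both source formats.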

One additional feature in this conversion could be that if a character does not exist in the Windows character set, it would optionally be converted into an HTML entity or numeric value such as &#160;. That would be somewhat more readable, and it would work as-is in HTML files even if it is not converted back after editing.
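
As a rough illustration of that idea in Python (not VEDIT macro code): the standard `xmlcharrefreplace` error handler writes a decimal numeric character reference for anything the target code page lacks. Windows-1252 (`cp1252`) is assumed as the Windows character set here, and the function name is made up for this example.

```python
import html


def to_windows_1252(text: str) -> bytes:
    """Encode to Windows-1252, writing &#NNN; for characters it cannot hold."""
    return text.encode("cp1252", errors="xmlcharrefreplace")


sample = "\u00fc \u03a9"  # u umlaut (in cp1252) and Greek capital omega (not in cp1252)

# The umlaut becomes one byte; the omega becomes a numeric reference.
assert to_windows_1252(sample) == b"\xfc &#937;"

# For comparison, a plain lossy conversion turns the omega into a question mark.
assert sample.encode("cp1252", errors="replace") == b"\xfc ?"

# Converting back after editing restores the original character.
assert html.unescape("&#937;") == "\u03a9"
```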

--
Pauli

 


Topic: Re: UTF-8 converter (8 of 12), Read 20 times
Conf: VEDIT Macro Library
From: Christian Ziemski
Date: Friday, December 07, 2007 03:54 PM

Hi Pauli!

On 07.12.2007 17:01 in vedit-macros Pauli Lindgren wrote:
>
> On 7/25/2005 4:34:27 AM, Christian Ziemski wrote:
>> Now I finished a first public version of my converter
>> between ISO 8859 and UTF-8.
>>
>> It's still under development and experimental, but at least
>> working here.
>
> Christian, what is the situation with the UTF-8 conversion macro?

No changes since 2005.

I switched completely to Linux at home.
And at work I'm using VEDIT for Windows things and Linux editors for
Linux things.
So I don't have problems with UTF-8 conversions any more...
Sorry for the bad news.

But feel free to take the macro over and work on it!

Christian

 


Topic: UTF-8 converter (9 of 12), Read 18 times
Conf: VEDIT Macro Library
From: Ian Binnie
Date: Friday, December 07, 2007 05:36 PM

On 7/22/2005 8:10:41 AM, Christian Ziemski wrote:
>
>Since I'm editing more and
>more files in UTF-8 format
>(from UNIX/LINUX) I tried to
>implement a simple(!)
>converter between ISO-8859 and
>UTF-8 as VEDIT macro to be
>able to edit those files with
>VEDIT.
>
>Is anybody interested in this?
>
>If yes: then I would enhance
>the documentation and post it
>here.
>(If not, I'll hack the code
>for myself only ;-)
>
>
>Christian

I have been doing quite a few conversions myself.

It is just a single API call to convert one buffer to another.

A Vedit macro would be much more complex. It doesn't make sense not to do this in Vedit. I believe Ted indicated he was going to implement this when it was last discussed a couple of years ago.

 


Topic: Re: UTF-8 converter (10 of 12), Read 17 times
Conf: VEDIT Macro Library
From: Christian Ziemski
Date: Saturday, December 08, 2007 03:16 AM

On 07.12.2007 23:36 vedit-macros Listmanager wrote:
> From: Ian Binnie
>
> On 7/22/2005 8:10:41 AM, Christian Ziemski wrote:
>> I tried to implement a simple(!) converter between ISO-8859 and
>> UTF-8 as VEDIT macro to be able to edit those files with VEDIT.
>
> I have been doing quite a few conversions myself.
>
> It is just a single API call to convert one buffer to another.
>
> A Vedit macro would be much more complex. It doesn't make sense not to do this in Vedit.
> I believe Ted indicated he was going to implement this when it was
> last discussed a couple of years ago.

Ian:

Yes, I remember that discussion.
So I only worked on it for two days in July 2005, which is when those
messages date from.

Christian

 


Topic: UTF-8 converter (11 of 12), Read 17 times
Conf: VEDIT Macro Library
From: Pauli Lindgren
Date: Tuesday, February 05, 2008 10:54 AM

On 12/7/2007 5:36:02 PM, Ian Binnie wrote:
>
>I have been doing quite a few conversions myself.
>
>It is just a single API call to convert one buffer to another.
>
>A Vedit macro would be much more complex. It doesn't make sense not to do
>this in Vedit. I believe Ted indicated he was going to implement this when it
>was last discussed a couple of years ago.

The problem with using the API for the conversion is that it destroys any characters that are not found in the 8-bit character set. With a Vedit macro, you could convert them into HTML entities, for example.

Anyway, it would be nice to have an API call option in the macro language (similar to Sys_Call() and Sys_Int()). This could be used for the Unicode conversion, but I think there would be many other uses, too. For example, calling the color picker to get a color value for HTML.

By the way, I just tried the macro function Sys_Reg_Address(0) to get the address of T-Reg 0, but Vedit says "Invalid command". Has this command been removed? Or is it in DOS version only? It is still in Help.

Another thing I found about UTF-8: If you copy UTF-8 text from a web browser to the clipboard, and then paste it into Vedit, any non-ANSI characters are replaced with question marks. ANSI characters such as and are displayed correctly. It seems that Windows performs an automatic conversion when pasting from the clipboard. This means that any non-ANSI characters are lost. (Notepad automatically displays Unicode characters when pasted, even if you do not have a Unicode file open.)

An example of multi-language text using UTF-8 can be found on this Wikipedia page:
http://en.wikipedia.org/wiki/Wikipedia:Text_editor_support#Command_line_tools

I wonder if it would be possible to have some kind of "paste special" that would retain the multibyte characters?

Of course it would be best to have real Unicode support.

--
Pauli

 


Topic: UTF-8 converter (12 of 12), Read 22 times
Conf: VEDIT Macro Library
From: Ian Binnie
Date: Tuesday, February 05, 2008 06:38 PM

On 2/5/2008 10:54:01 AM, Pauli Lindgren wrote:

>Another thing I found about UTF-8: If
>you copy UTF-8 text from web browser to
>clipboard, and then paste it to Vedit,
>any non-ANSI characters are replaced
>with question marks. ANSI characters
>such as and are displayed correctly.
>It seems that Windows performs automatic
>conversion when pasting from clipboard.
>This means that any non-ANSI characters
>are lost. (Notepad automatically display
>Unicode characters when pasted, even if
>you do not have an Unicode file open.)

The clipboard commands allow you to specify the format (actually, it is mandatory).

There are many formats, including OEM, ANSI and Unicode (not UTF-8; Unicode text is pasted as UTF-16).
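
As a hedged sketch of what a "paste special" could do with the Unicode clipboard format: on Windows that format (CF_UNICODETEXT) holds null-terminated little-endian UTF-16 text. Assuming the raw bytes have already been fetched from the clipboard by some other means, converting them to UTF-8 for insertion into an 8-bit buffer is straightforward. The Python below is an illustration only, and the function name is made up.

```python
def clipboard_utf16_to_utf8(raw: bytes) -> bytes:
    """Convert raw UTF-16LE clipboard data (null-terminated) to UTF-8 bytes."""
    text = raw.decode("utf-16-le")
    # Keep only the text before the terminating null character, if present.
    text = text.split("\x00", 1)[0]
    return text.encode("utf-8")


# 'a' followed by Greek capital omega, plus the terminating null.
raw = "a\u03a9".encode("utf-16-le") + b"\x00\x00"
assert clipboard_utf16_to_utf8(raw) == b"a\xce\xa9"
```

Pasting the result would keep the multibyte characters intact instead of replacing them with question marks, at the cost of showing them as raw UTF-8 bytes in an ANSI editor.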