
Topic:	Preliminary VEDIT 6.13: BASE64 (1 of 16), Read 27 times
Conf:	VEDIT Macro Library
From:	Christian Ziemski
Date:	Wednesday, September 08, 2004 01:59 PM

I played a bit with the

* New {MISC, More macros, BASE64} decodes Base64 file into text

Here are my findings:

1) It can lead to an endless loop,
for example with small data of 0-10 bytes (or so).

2) It fails if the temporary file is still open.

3) If I choose "No" at the initial dialog, the temporary file isn't
closed.

Christian

Topic:	Re: Preliminary VEDIT 6.13: BASE64 (2 of 16), Read 24 times
Conf:	VEDIT Macro Library
From:	Christian Ziemski
Date:	Wednesday, September 08, 2004 02:37 PM

PS:

>* New {MISC, More macros, BASE64} decodes Base64 file into text

I now tried it with a 500KB data file (vpw.exe in a mail).
It was decoded relatively fast and correctly.
And a nice progress indicator! (It should be overwritten with
"ready" or similar at the end.)

Nice work, John H.!

Christian

Topic:	Re: Preliminary VEDIT 6.13: BASE64 (3 of 16), Read 24 times
Conf:	VEDIT Macro Library
From:	Ted Green
Date:	Wednesday, September 08, 2004 02:42 PM

At 01:59 PM 9/8/2004, you wrote:

>I played a bit with the
>
>* New {MISC, More macros, BASE64} decodes Base64 file into text
>
>
>Here my findings:
>
>1) it can lead into an endless loop,
> for example with small data of 0-10 bytes (or so)

Yes, I am aware that it can loop endlessly. I was hoping to fix
it before releasing it, but I don't understand the algorithm yet.

>2) fails if the temporary file is still opened
>3) If I decide "No" at the initial dialog the temporary file isn't
> closed.

I wasn't aware of those additional problems.

Ted.

Topic:	Re: Preliminary VEDIT 6.13: BASE64 (4 of 16), Read 25 times
Conf:	VEDIT Macro Library
From:	Christian Ziemski
Date:	Wednesday, September 08, 2004 04:24 PM

On Wed, 08 Sep 2004 14:42:00 -0400, Ted Green wrote:

>>* New {MISC, More macros, BASE64} decodes Base64 file into text
>>
>Yes, I am aware that it can loop endlessly. I was hoping to fix
>it before releasing it, but I don't understand the algorithm yet.

I don't know anything about Base64, but it looks as if the data
(at least the last line) has to contain a number of characters
divisible by 4 without remainder. (Found out by playing.)

So I think the loop
while (#108 <= 4) { ... #108++ ...}
in the macro has problems if the data isn't formatted as expected.

(Wild guess!)
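
The guess above can be illustrated outside VEDIT. The following is a minimal Python sketch, not the macro's actual code (the internals of B64-IN.VDM are not shown in this thread); split_quads and ALPHABET are hypothetical names. It shows one way to gather characters into groups of four with a loop that advances one input character per pass, so a short final group ends up as a leftover instead of keeping the loop waiting for a fourth character.

```python
# Hypothetical illustration; not taken from B64-IN.VDM.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def split_quads(text):
    """Collect Base64 characters into groups of four.

    Returns (quads, leftover). Each pass of the loop consumes exactly
    one input character, so the loop ends when the input ends, even if
    the last group has fewer than four characters.
    """
    quads = []
    current = ""
    for ch in text:
        # Line breaks and other foreign characters are simply skipped.
        if ch in ALPHABET or ch == "=":
            current += ch
            if len(current) == 4:
                quads.append(current)
                current = ""
    return quads, current
```

With this structure, "QUJDR" yields one complete group and the leftover "R", which a caller could report as an error.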

Perhaps John H. should jump in here...

Christian

Topic:	Re: Preliminary VEDIT 6.13: BASE64 (5 of 16), Read 25 times
Conf:	VEDIT Macro Library
From:	Ted Green
Date:	Wednesday, September 08, 2004 04:30 PM

At 04:24 PM 9/8/2004, you wrote:
>So I think the loop
> while (#108 <= 4) { ... #108++ ...}
>in the macro has problems if the data isn't formatted as expected.
>
>(Wild guess!)
>
>Perhaps John H. should jump in here...

I noticed that John H. just posted here, so hopefully John and Christian together can figure this out.

I noticed that (senselessly) running the macro on an arbitrary text file caused it to loop endlessly. It should probably check the first 4/8/12/16 chars to confirm that it is a likely base64 file and report an error if not.
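
As a rough illustration of the kind of up-front check suggested above, here is a Python sketch (not VEDIT macro code). The names looks_like_base64 and B64_CHARS and the default sample size of 16 are assumptions made for this example. It only inspects the start of the first non-empty line, so it is a heuristic and not a full validation.

```python
import string

# Characters allowed in Base64 text, including the "=" pad character.
B64_CHARS = set(string.ascii_letters + string.digits + "+/=")


def looks_like_base64(text, sample_size=16):
    """Heuristic: does the first non-empty line start like Base64?

    Only the first `sample_size` characters of that line are checked.
    """
    for line in text.splitlines():
        line = line.strip()
        if line:
            sample = line[:sample_size]
            return all(ch in B64_CHARS for ch in sample)
    return False  # empty input or whitespace only
```

Ordinary prose usually fails this test because of its spaces and punctuation, but a single long word would pass, so a decoder would still need to cope with bad input later on.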

I realize John H. wrote this as a quick-and-dirty macro, but since it is quite useful, I want to add it to VEDIT.

Thanks.

Ted.

Topic:	Preliminary VEDIT 6.13: BASE64 (9 of 16), Read 25 times
Conf:	VEDIT Macro Library
From:	John H
Date:	Wednesday, September 08, 2004 05:27 PM

On Wed, 8 Sep 2004 16:31:02 -0400 GMT, Ted Green wrote:

> I noticed that (senselessly) running the macro on an arbitrary
> text file caused it to loop endlessly. It should probably check
> the first 4/8/12/16 chars to confirm that it is a likely base64
> file and report an error if not.

Yes, I hadn't made any effort to error-trap this.

Running the B64-OUT.VDM on an arbitrary text file is more what I had
in mind. ;-)

> I realize John H. wrote this as a quick-and-dirty macro, but since
> it is quite useful, I want to add it to VEDIT.

Ahem! Quick and dirty wasn't in the plan; I think my lack of fluency
in the macro language is key to the flaws in there. I'd hoped to have
gotten more interest back when I posted about these macros. I might
have been able to cure any ailments long ago.

--
John

Topic:	Preliminary VEDIT 6.13: BASE64 (8 of 16), Read 27 times
Conf:	VEDIT Macro Library
From:	John H
Date:	Wednesday, September 08, 2004 05:19 PM

On Wed, 8 Sep 2004 16:24:51 -0400 GMT, Christian Ziemski wrote:

> On Wed, 08 Sep 2004 14:42:00 -0400, Ted Green wrote:

>>>* New {MISC, More macros, BASE64} decodes Base64 file into text
>>>
>>Yes, I am aware that it can loop endlessly. I was hoping to fix
>>it before releasing it, but I don't understand the algorithm yet.

> I don't know anything about Base64, but it looks as if the data
> (at least the last line) has to contain a number of characters
> divisible by 4 without remainder. (Found out by playing.)

> So I think the loop
> while (#108 <= 4) { ... #108++ ...}
> in the macro has problems if the data isn't formatted as expected.

Base-64 coded output does need to be padded if the input data length
is not exactly divisible by 3. There are also other requirements, such
as the maximum line length of base-64 encoded data and a couple of
other things.

Just for kicks the 'right brain' versions on my web page might be
more clear in how the encode/decode work.

At any rate, you're correct in that the decoding requires properly
formed base-64 coded input, per RFCs. The B64-OUT.VDM as far as I
know from my testing will always produce properly formed output
which should always work with B64-IN.VDM without problems.

I really forget but I think that I included a quick validation of
the input data.

If you could provide me the encoded data that loops the macro I'd be
interested in finding out what is happening. With all the rain I'm
getting here in FL I'm on the computer a lot more nowadays. :-)

--
John

Topic:	Re: Preliminary VEDIT 6.13: BASE64 (11 of 16), Read 28 times
Conf:	VEDIT Macro Library
From:	Ian Binnie
Date:	Wednesday, September 08, 2004 10:04 PM

On 9/8/2004 2:42:07 PM, Ted Green wrote:
>At 01:59 PM 9/8/2004, you wrote:
>
>>I played a bit with the
>>
>>* New {MISC, More macros, BASE64} decodes Base64 file into text
>>
>>
>>Here my findings:
>>
>>1) it can lead into an endless loop,
>> for example with small data of 0-10 bytes (or so)
>
>Yes, I am aware that it can loop endlessly. I was hoping to fix
>it before releasing it, but I don't understand the algorithm yet.
>
>>2) fails if the temporary file is still opened
>>3) If I decide "No" at the initial dialog the temporary file isn't
>> closed.
>
>I wasn't aware of those additional problems.
>
>Ted.
>

I have written a few macros to process MIME files, and at one stage considered Base64 conversion, but there seemed little reason, as there are so many programs (most email clients, WinZIP 9.0) that do this already.
(Not that I wish to discourage others from trying - but wonder where you would get Base64 outside of an email file, and there are many other issues to decode these.)

If I was attempting this I would be looking for "Content-Transfer-Encoding: base64" in the file.

I only briefly looked at the macro, but it is not immediately clear how it works, and could do with a few comments.
The table lookup technique developed by Christian for UTF translation would seem to be a cleaner way to translate.

For the benefit of others I have extracted the salient bits of RFC 2045, which describes the coding.

Base64 Content-Transfer-Encoding

The encoding process represents 24-bit groups of input bits as output strings of 4 encoded characters. Proceeding from left to right, a 24-bit input group is formed by concatenating 3 8bit input groups. These 24 bits are then treated as 4 concatenated 6-bit groups, each of which is translated into a single digit in the base64 alphabet.

Each 6-bit group is used as an index into an array of 64 printable characters.

Table 1: The Base64 Alphabet
0 A
1 B
2 C
...
25 Z
26 a
27 b
...
51 z
52 0
...
61 9
62 +
63 /
(pad) =
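
The grouping described above can be sketched in Python. This is an illustration only: encode_group and ALPHABET are names made up for this example, and the sketch handles just one complete 24-bit group, without padding.

```python
# The 64-character alphabet from Table 1, indexed by 6-bit value.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_group(three_bytes):
    """Encode exactly 3 bytes (24 bits) as 4 Base64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly 3 bytes")
    b0, b1, b2 = three_bytes
    # Concatenate the three 8-bit values into one 24-bit group.
    group = (b0 << 16) | (b1 << 8) | b2
    # Split the group into four 6-bit values, left to right,
    # and use each one as an index into the alphabet.
    return "".join(
        ALPHABET[(group >> shift) & 0x3F] for shift in (18, 12, 6, 0)
    )


print(encode_group(b"Man"))  # prints TWFu
```

For "Man" the 24 bits split into the 6-bit values 19, 22, 5 and 46, which the table maps to T, W, F and u.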

The encoded output stream must be represented in lines of no more than 76 characters each. All line breaks or other characters not found in Table 1 must be ignored by decoding software. In base64 data, characters other than those in Table 1, line breaks, and other white space probably indicate a transmission error, about which a warning message or even a message rejection might be appropriate under some circumstances.
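
A decoder following the "ignore other characters" rule might look like this Python sketch. It is an illustration under stated assumptions, not the macro's logic: decode_lenient is a hypothetical name, and the count of dropped non-whitespace characters is returned only so that a caller could issue the kind of warning the RFC text mentions.

```python
import base64
import re


def decode_lenient(text):
    """Decode Base64 text, ignoring characters outside the alphabet.

    Returns (data, suspicious), where `suspicious` is the number of
    ignored characters that were not whitespace.
    """
    # Keep only alphabet characters and the "=" pad character.
    cleaned = re.sub(r"[^A-Za-z0-9+/=]", "", text)
    # Count foreign characters other than line breaks and white space.
    suspicious = len(re.sub(r"[A-Za-z0-9+/=\s]", "", text))
    # Raises binascii.Error if what remains is not well-formed.
    return base64.b64decode(cleaned, validate=True), suspicious
```

Line breaks are dropped silently, while a stray character such as "!" is dropped and counted.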

Special processing is performed if fewer than 24 bits are available at the end of the data being encoded. A full encoding quantum is always completed at the end of a body. When fewer than 24 input bits are available in an input group, zero bits are added (on the right) to form an integral number of 6-bit groups. Padding at the end of the data is performed using the "=" character.

...

Because it is used only for padding at the end of the data, the occurrence of any "=" characters may be taken as evidence that the end of the data has been reached (without truncation in transit). No such assurance is possible, however, when the number of octets transmitted was a multiple of three and no "=" characters are present.
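
The padding rule can be seen with Python's standard base64 module. This is a small illustration; encode_text is a hypothetical helper name. One leftover input byte gives two "=" characters, two leftover bytes give one, and a multiple of three gives none.

```python
import base64


def encode_text(data):
    """Return the Base64 encoding of `data` as an ASCII string."""
    return base64.b64encode(data).decode("ascii")


for sample in (b"M", b"Ma", b"Man"):
    print(len(sample), "input byte(s) ->", encode_text(sample))
```

The three samples encode to TQ==, TWE= and TWFu, so the encoded length is always a multiple of 4.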

Topic:	Re: Preliminary VEDIT 6.13: BASE64 (12 of 16), Read 30 times
Conf:	VEDIT Macro Library
From:	Ted Green
Date:	Wednesday, September 08, 2004 10:26 PM

At 10:04 PM 9/8/2004, you wrote:
>I have written a few macros to process MIME files, and at one stage considered Base64 conversion, but there seemed little reason, as there are so many programs (most email clients, WinZIP 9.0) that do this already.
>(Not that I wish to discourage others from trying - but wonder where you would get Base64 outside of an email file, and there are many other issues to decode these.)

Thank you for your extensive feedback.
Personally I find the base64 decoding useful since I can now cut-paste a base64 block into VEDIT and convert it to text. I have no idea how many others would find this useful, but obviously at least a few others do.

>For the benefit of others I have extracted the salient bits of RFC 2045, which describes the coding.

Ouch, I didn't know it was part of an RFC.

>Padding at the end of the data is performed using the "=" character.

That is a very useful bit of extra info.

Ted.

Topic:	Re: Preliminary VEDIT 6.13: BASE64 (14 of 16), Read 25 times
Conf:	VEDIT Macro Library
From:	Ian Binnie
Date:	Thursday, September 09, 2004 02:15 AM

On 9/8/2004 10:26:37 PM, Ted Green wrote:

>Personally I find the base64 decoding useful since I can now
>cut-paste a base64 block into VEDIT and convert it to text.

This makes sense - most of the base64 I have is Microsoft Office files.

Text tends to be quoted-printable. I wrote a macro MIMEquoted.vdm which converts this. I am sure I have posted it before, but it is also in:-
http://members.optusnet.com.au/~ibinnie/Ian/Downloads/MIMEtrim.zip

Topic:	Preliminary VEDIT 6.13: BASE64 (13 of 16), Read 33 times
Conf:	VEDIT Macro Library
From:	John H
Date:	Thursday, September 09, 2004 12:08 AM

On Wed, 8 Sep 2004 22:04:38 -0400 GMT, Ian Binnie wrote:

Hi Ian, I know you were addressing Ted here but figured I'd comment.

> I have written a few macros to process MIME files, and at one
> stage considered Base64 conversion, but there seemed little
> reason, as there are so many programs (most email clients, WinZIP
> 9.0) that do this already. (Not that I wish to discourage others
> from trying - but wonder where you would get Base64 outside of an
> email file, and there are many other issues to decode these.)

My main reason for writing the decode macro was to examine base64
email attachments, and to get more Vedit macro language practice. My
mail client displays ASCII only and while I could decode and save
attachments from the MUA I decided to just go ahead and make it a
macro project for VEDIT. Once I got the groundwork laid then it
seemed logical to write an encoding macro.

I sometimes use these as a very simple cipher for my files or text,
totally separate from email related use.

> If I was attempting this I would be looking for
> "Content-Transfer-Encoding: base64" in the file.

I didn't intend it to be locked into needing a MIME envelope to
function, although I have pondered and planned to make the macros
optionally recognize some basic MIME header lines. But as you say,
it's already implemented in mail clients where it's needed most, so
my momentum faded. Also, the in-situ decoding wouldn't have the
luxury of the MIME information.

> I only briefly looked at the macro, but it is not immediately
> clear how it works, and could do with a few comments. The table
> lookup technique developed by Christian for UTF translation would
> seem to be a cleaner way to translate.

I wasn't happy with the lookup table approaches I devised, but have
not looked at the UTF macro either. I still probably have many
misconceptions regarding the macro language.

> The encoded output stream must be represented in lines of no more
> than 76 characters each. All line breaks or other characters not
> found in Table 1 must be ignored by decoding software.

It's been a while, but I think I may have dismissed trying to cover
this portion of the RFC and settled on just working with valid input,
with few exceptions. That also seemed reasonable since there is a
companion encode macro.

--
John

Topic:	Preliminary VEDIT 6.13: BASE64 (6 of 16), Read 25 times
Conf:	VEDIT Macro Library
From:	John H
Date:	Wednesday, September 08, 2004 04:39 PM

On Wed, 8 Sep 2004 13:59:59 -0400 GMT, Christian Ziemski wrote:

> I played a bit with the

> * New {MISC, More macros, BASE64} decodes Base64 file into text

> Here my findings:

> 1) it can lead into an endless loop,
> for example with small data of 0-10 bytes (or so)

Is this with the data selected as block? I have noticed problems in
general if the cursor is at 1,1 when running some of the 'built-in'
macros from time to time for unknown reasons.

> 2) fails if the temporary file is still opened

> 3) If I decide "No" at the initial dialog the temporary file isn't
> closed.

I'll take a look-see. I'm going to be kind of rusty now. In my
attempt to make the b64 macro pair flexible I'm sure I gave more
time to certain areas. Specifically the 5-6 test situations I cycled
through.

--
John

Topic:	Re: Preliminary VEDIT 6.13: BASE64 (7 of 16), Read 25 times
Conf:	VEDIT Macro Library
From:	Christian Ziemski
Date:	Wednesday, September 08, 2004 04:55 PM

On Wed, 08 Sep 2004 16:39:00 -0400, John H wrote:

>On Wed, 8 Sep 2004 13:59:59 -0400 GMT, Christian Ziemski wrote:
>
>> 1) it can lead into an endless loop,
>> for example with small data of 0-10 bytes (or so)
>
>Is this with the data selected as block?

No. I ran it on the whole file (with random data for testing).

>I have noticed problems in general if the cursor is at 1,1
>when running some of the 'built-in'
>macros from time to time for unknown reasons.

Never seen that here.

>I'll take a look-see. I'm going to be kind of rusty now. In my
>attempt to make the b64 macro pair flexible I'm sure I gave more
>time to certain areas. Specifically the 5-6 test situations I cycled
>through.

Apropos flexibility: Since the macro will be introduced with Vedit
6.13 it possibly would make sense to remove the version tests for 6.12
in it. Simply document it as "Requires Vedit 6.13" ... (Ted?)
That would make it a bit less complex.

Christian - finishing for today at 22:55

Topic:	Preliminary VEDIT 6.13: BASE64 (10 of 16), Read 27 times
Conf:	VEDIT Macro Library
From:	John H
Date:	Wednesday, September 08, 2004 05:42 PM

On Wed, 8 Sep 2004 16:55:55 -0400 GMT, Christian Ziemski wrote:

> On Wed, 08 Sep 2004 16:39:00 -0400, John H wrote:

>>On Wed, 8 Sep 2004 13:59:59 -0400 GMT, Christian Ziemski wrote:
>>
>>> 1) it can lead into an endless loop,
>>> for example with small data of 0-10 bytes (or so)
>>
>>Is this with the data selected as block?

> No. I ran it on the whole file (with random data for testing).

Umm, random? Base64 encoded data -must- only contain:

A-Z, a-z, 0-9, "+" and "/", with one or two "=" characters as padding
when needed.

I'm getting the idea that the crux of this is perhaps simply
an error trap to ensure the input is actually base64 encoded..?
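
Such an error trap could be as small as the following Python sketch (an illustration, not VEDIT macro code; is_well_formed_base64 is a hypothetical name). It checks the character set listed above, allows at most two "=" characters at the very end, and requires the character count to be a multiple of 4 once line breaks are removed.

```python
import re

# Alphabet characters followed by at most two "=" pad characters.
_B64_BODY = re.compile(r"[A-Za-z0-9+/]*={0,2}")


def is_well_formed_base64(text):
    """Return True if `text` has the shape of valid Base64 data."""
    compact = "".join(text.split())  # drop line breaks and other whitespace
    return len(compact) % 4 == 0 and _B64_BODY.fullmatch(compact) is not None
```

Note that empty input passes this check, so a caller that wants to reject an empty file would need to test for that separately.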

>>I have noticed problems in general if the cursor is at 1,1
>>when running some of the 'built-in'
>>macros from time to time for unknown reasons.

> Never seen that here.

Happened the other day here, when I was using UNIX->DOS conversion.
I believe my input data was the problem. I think I still have the
text file that broke the macro. The 1,1 cursor position might not be
the main factor; it could be the data at 1,1. I forget. It's been some
time since I had these findings in my head.

--
John

Topic:	Preliminary VEDIT 6.13: BASE64 (15 of 16), Read 21 times
Conf:	VEDIT Macro Library
From:	Christian Ziemski
Date:	Thursday, September 09, 2004 02:25 AM

On 9/8/2004 5:42:03 PM, John H wrote:
>
>Umm, random? Base64 encoded data -must- only contain: ...

Yes, random, because
- I didn't know about base64
- I beta tested it
- I stress tested it ;-)

Example:

This data is decoded:

dddddddddd
xxxx

But this leads to an endless loop:

dddddddddd
xxxxx
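
For comparison, a strict decoder would be expected to reject both samples rather than decode one and hang on the other, since neither has a character count divisible by 4 once the line break is removed (14 and 15 characters). A Python sketch using the standard base64 module, with try_decode as a hypothetical helper name:

```python
import base64
import binascii


def try_decode(text):
    """Return the decoded bytes, or None if `text` is not decodable."""
    try:
        # By default, b64decode discards line breaks before decoding.
        return base64.b64decode(text)
    except binascii.Error:
        return None


for sample in ("dddddddddd\nxxxx", "dddddddddd\nxxxxx", "ZGRkZA=="):
    print(repr(sample), "->", try_decode(sample))
```

Here the first two samples give None, while "ZGRkZA==" (a properly padded encoding) decodes to the four bytes "dddd".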

Christian

Topic:	Preliminary VEDIT 6.13: BASE64 (16 of 16), Read 30 times
Conf:	VEDIT Macro Library
From:	John H
Date:	Thursday, September 09, 2004 09:53 AM

On Thu, 9 Sep 2004 02:25:24 -0400 GMT, Christian Ziemski wrote:

> Yes, random, because
> - I didn't know about base64
> - I beta tested it
> - I stress tested it ;-)

Ah! :-)

I am refamiliarizing myself some with the known quirks. The one that
is most vivid is the block selection with the decode macro.

It can be very easy to get the last hidden newline when manually
selecting a block. This was one of those 'more hassle than it's
worth' items so I considered it a 'TODO' item since it was easy to
just back up my block end position. Of course if the data is padded
it was much simpler to discern where the data ends.

My thoughts were that 99% of the time I was going to be providing
proper, non corrupted encoded data and considered the basic input
integrity checking a user responsibility. EG: feeding binary to
B64-IN.VDM or likewise.

To refer to one quirk of the UNIX->DOS convert macro: it can be
repeated over and over, inserting multiple CRs before the LF. Much
simpler than encode/decode, but it obviously leaves some aspects to
user discretion. Similarly, there are no input data checks to prevent
trying to convert a file that is already in the target format.

Question is where does it get to be more trouble than it's worth
trying to pre qualify the encoded data and the numerous other
possible situations?

Wondering why no chocolate milk is coming out of a purple cow comes
to mind. Everyone knows only purple milk comes from purple cows. ;-)

Anyhow, I am not trying to make excuses but there does, I think,
need to be a boundary somewhere as to how far to 'black box' the
macro considering its use somewhat suggests the user has cursory
knowledge of base64.

I've possibly got another hurricane coming my way in a few days and
will be using the dry (not raining) hours of the day making sure I
am ready, again!

Will drop in the message board and work at the macro during the
evening as time and exhaustion permits.

--
John