On 9/8/2004 2:42:07 PM, Ted Green wrote:
>At 01:59 PM 9/8/2004, you wrote:
>
>>I played a bit with the
>>
>>* New {MISC, More macros, BASE64} decodes Base64 file into text
>>
>>
>>Here my findings:
>>
>>1) it can lead into an endless loop,
>> for example with small data of 0-10 bytes (or so)
>
>Yes, I am aware that it can loop endlessly. I was hoping to fix it
>before releasing it, but I don't understand the algorithm yet.
>
>>2) fails if the temporary file is still opened
>>3) If I decide "No" at the initial dialog the temporary file isn't
>> closed.
>
>I wasn't aware of that additional problem.
>
>Ted.
>
I have written a few macros to process MIME files, and at one stage considered Base64 conversion, but there seemed little reason to, since so many programs (most email clients, WinZIP 9.0) already do this.
(Not that I wish to discourage others from trying, but I wonder where you would get Base64 outside of an email file, and there are many other issues involved in decoding those.)
If I were attempting this, I would look for "Content-Transfer-Encoding: base64" in the file.
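As a rough illustration of that header check (in Python rather than the macro language, and only a sketch), the standard email module can walk a message and pick out the parts that declare base64. The function name base64_parts is hypothetical, and the sketch assumes the file is a complete, well-formed MIME message.

```python
import email

def base64_parts(raw_message: str):
    """Yield (content_type, decoded_bytes) for each part marked as base64.

    Hypothetical helper: assumes raw_message is a complete MIME message.
    """
    msg = email.message_from_string(raw_message)
    for part in msg.walk():
        # Header values are case-insensitive, so normalise before comparing.
        cte = (part.get("Content-Transfer-Encoding") or "").strip().lower()
        if cte == "base64":
            # decode=True applies the declared transfer encoding.
            yield part.get_content_type(), part.get_payload(decode=True)
```

A file with no such header yields nothing, which would be a reasonable point to warn the user instead of decoding blindly.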
I only briefly looked at the macro, but it is not immediately clear how it works; it could do with a few comments.
The table lookup technique developed by Christian for UTF translation would seem to be a cleaner way to translate.
For the benefit of others I have extracted the salient bits of RFC 2045, which describes the coding.
Base64 Content-Transfer-Encoding
The encoding process represents 24-bit groups of input bits as output strings of 4 encoded characters. Proceeding from left to right, a 24-bit input group is formed by concatenating 3 8bit input groups. These 24 bits are then treated as 4 concatenated 6-bit groups, each of which is translated into a single digit in the base64 alphabet.
Each 6-bit group is used as an index into an array of 64 printable characters.
Table 1: The Base64 Alphabet
0 A
1 B
2 C
...
25 Z
26 a
27 b
...
51 z
52 0
...
61 9
62 +
63 /
(pad) =
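To make the 24-bit grouping concrete, here is a minimal Python sketch that encodes exactly one group of three bytes by indexing into the Table 1 alphabet. The names ALPHABET and encode_group are my own, not from the RFC or the macro.

```python
import string

# Table 1: indices 0-25 are A-Z, 26-51 are a-z, 52-61 are 0-9, then + and /.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_group(three_bytes: bytes) -> str:
    """Encode one full 24-bit input group as 4 base64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("a full group is exactly 3 bytes")
    # Concatenate the three 8-bit groups, left to right, into one 24-bit value.
    n = int.from_bytes(three_bytes, "big")
    # Split into four 6-bit groups and use each as an index into the alphabet.
    return "".join(ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0))
```

For example, encode_group(b"Man") gives "TWFu".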
The encoded output stream must be represented in lines of no more than 76 characters each. All line breaks or other characters not found in Table 1 must be ignored by decoding software. In base64 data, characters other than those in Table 1, line breaks, and other white space probably indicate a transmission error, about which a warning message or even a message rejection might be appropriate under some circumstances.
Special processing is performed if fewer than 24 bits are available at the end of the data being encoded. A full encoding quantum is always completed at the end of a body. When fewer than 24 input bits are available in an input group, zero bits are added (on the right) to form an integral number of 6-bit groups. Padding at the end of the data is performed using the "=" character.
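The end-of-data rule can be sketched as follows, again in Python and only as an illustration: a short final group is zero-filled on the right, and the output characters that carry no input bits are replaced by "=". The function name encode is hypothetical, and this sketch does not break the output into lines of 76 characters.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode(data: bytes) -> str:
    """Base64-encode data, padding a short final group with '='."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        pad = 3 - len(chunk)  # 0, 1 or 2 missing bytes in the last group
        # Zero bits are added on the right to complete the 24-bit quantum.
        n = int.from_bytes(chunk + b"\x00" * pad, "big")
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        if pad:
            # Characters made up entirely of filler bits become padding.
            chars[-pad:] = "=" * pad
        out.append("".join(chars))
    return "".join(out)
```

One leftover byte gives two characters plus "==", and two leftover bytes give three characters plus "=".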
...
Because it is used only for padding at the end of the data, the occurrence of any "=" characters may be taken as evidence that the end of the data has been reached (without truncation in transit). No such assurance is possible, however, when the number of octets transmitted was a multiple of three and no "=" characters are present.
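Putting the decoding rules together, a table-lookup decoder might look like the Python sketch below. It skips any character not in Table 1, treats the first "=" as the end of the data, and makes a single pass over the input, so it terminates even on empty or very short data. The names DECODE and decode are my own; whether the macro works this way I cannot say. Leftover bits that do not fill a byte are silently dropped, where a stricter decoder might warn.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
# Reverse lookup table: character -> 6-bit value.
DECODE = {ch: i for i, ch in enumerate(ALPHABET)}

def decode(text: str) -> bytes:
    """Decode base64 text, ignoring characters outside the alphabet."""
    out = bytearray()
    acc = 0    # bit accumulator
    nbits = 0  # number of valid bits currently held in acc
    for ch in text:
        if ch == "=":
            break  # padding marks the end of the data
        value = DECODE.get(ch)
        if value is None:
            continue  # line breaks and other stray characters are skipped
        acc = (acc << 6) | value
        nbits += 6
        if nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
            acc &= (1 << nbits) - 1  # keep only the bits not yet emitted
    return bytes(out)
```

For example, decode("TW\r\nFu") and decode("TWFu") both give b"Man".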