Alaska Software Inc. - Encodes a message header field as per RFC2047
Trinkl Ferenc
Encodes a message header field as per RFC2047
on Wed, 05 Mar 2008 13:39:42 +0100
Hi,

Does anyone have a solution for encoding a message header field (the Subject)
in the ISO-8859-2 character set?
Andreas Gehrs-Pahl
Re: Encodes a message header field as per RFC2047
on Fri, 07 Mar 2008 04:34:57 -0500
Trinkl,

>Does anyone have solution to encoding message header field (subject)
>into 8859-2 character set?

I haven't actually ever done this, but the following should work, assuming 
that the variable 'cText' contains your Subject line that uses ISO-8859-2 
characters.

LOCAL cSubject := "=?ISO-8859-2?B?" + ToBase64(cText) + "?="
LOCAL oEMail   := MIMEMessage():New()
[...]
   oEMail:SetSubject(cSubject)
[...]
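
For anyone who wants to check the expected result outside Xbase++, here is a minimal Python sketch of the same "B" encoded-word construction. The function name encode_word_b is made up for this illustration, and the sketch assumes the whole subject fits into a single encoded-word (RFC 2047 limits each one to 75 characters).

```python
import base64

def encode_word_b(text, charset="iso-8859-2"):
    """Build one RFC 2047 "B" encoded-word (hypothetical helper)."""
    # Convert the text to the bytes of the target character set first,
    # then Base64-encode those bytes.
    raw = text.encode(charset)
    payload = base64.b64encode(raw).decode("ascii")
    return "=?%s?B?%s?=" % (charset.upper(), payload)

if __name__ == "__main__":
    print(encode_word_b("Sz\u00e9p nap"))
```

The result can be compared with what ToBase64() produces for the same ISO-8859-2 bytes.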

If you prefer to use the "Q" encoding -- which by default is a little more 
readable if the encoding is not handled by the receiver -- you could try 
the following, but it might not come out quite right, in case there are 
any special characters contained in "cText" that are treated differently
by any of the forwarding internet servers!

LOCAL cSubject := "=?ISO-8859-2?Q?" + RFC2047_ToQP(cText) + "?="

Function RFC2047_ToQP(cText)
   cText := StrTran(ToQP(cText), "?", "=3F")
   cText := StrTran(cText, "_", "=5F")
return (StrTran(cText, " ", "_"))
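
As a cross-check for the "Q" variant, a minimal Python sketch follows. It does not call ToQP(); the name encode_word_q and the deliberately conservative set of unencoded characters (letters, digits and "!*+-/") are assumptions made for this illustration. Like the function above, it does not split long subjects into several encoded-words.

```python
import string

# Bytes that are left unencoded; everything else becomes "=XX".
SAFE = set((string.ascii_letters + string.digits + "!*+-/").encode("ascii"))

def encode_word_q(text, charset="iso-8859-2"):
    """Build one RFC 2047 "Q" encoded-word (hypothetical helper)."""
    out = []
    for byte in text.encode(charset):
        if byte == 0x20:
            out.append("_")              # space is written as underscore
        elif byte in SAFE:
            out.append(chr(byte))        # safe ASCII stays readable
        else:
            out.append("=%02X" % byte)   # "=", "?", "_" and 8-bit bytes
    return "=?%s?Q?%s?=" % (charset.upper(), "".join(out))

if __name__ == "__main__":
    print(encode_word_q("Sz\u00e9p nap?"))
```

Encoding more characters than strictly necessary is allowed, so the conservative set only costs some readability.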

Let us know if it works,

-- Andreas

---                                                                      ---
  Andreas Gehrs-Pahl              E-Mail: GPahl@CharterMI.net
  415 Gute Street                     or: Andreas@DDPSoftware.com
  Owosso, MI 48867-4410               or: Andreas@Aerospace-History.net
  Tel: (989) 723-9927           Web Site: http://www.Aerospace-History.net
---                                                                      ---
Trinkl Ferenc
Re: Encodes a message header field as per RFC2047
on Fri, 07 Mar 2008 22:51:15 +0100
----- Original Message ----- 
From: "Andreas Gehrs-Pahl" <Andreas@DDPSoftware.com>
Newsgroups: public.xbase++.asinet
Sent: Friday, March 07, 2008 10:34 AM
Subject: Re: Encodes a message header field as per RFC2047


>>Does anyone have solution to encoding message header field (subject)
>>into 8859-2 character set?
>
> I haven't actually ever done this, but the following should work, assuming
> that the variable 'cText' contains your Subject line that uses ISO-8859-2
> characters.
>
> LOCAL cSubject := "=?ISO-8859-2?B?" + ToBase64(cText) + "?="

Thanks! This is very simple and easy to use. I should have known.


> If you prefer to use the "Q" encoding -- which by default is a little more
> readable if the encoding is not handled by the receiver -- you could try
> the following, but it might not come out quite right, in case there are
> any special characters contained in "cText" that are treated differently
> by any of the forwarding internet servers!
>
> LOCAL cSubject := "=?ISO-8859-2?Q?" + RFC2047_ToQP(cText) + "?="

The Subject contains 8-bit characters and may be very long (approx. 100
characters), so encoding it is a complicated task.
I found a free PHP sample:

    /**
     * Encodes a header as per RFC2047
     *
     * @param array $input  The header data to encode
     * @param array $params Extra build parameters
     *
     * @return array Encoded data
     * @access private
     */
    function _encodeHeaders($input, $params = array())
    {

        $build_params = $this->_build_params;
        while (list($key, $value) = each($params)) {
            $build_params[$key] = $value;
        }
        //$hdr_name: Name of the header
        //$hdr_value: Full line of header value.
        //$hdr_value_out: The recombined $hdr_val-atoms, or the encoded string.

        $useIconv = true;
        if (isset($build_params['ignore-iconv'])) {
            $useIconv = !$build_params['ignore-iconv'];
        }
        foreach ($input as $hdr_name => $hdr_value) {
            if (preg_match('#([\x80-\xFF]){1}#', $hdr_value)) {
                if (function_exists('iconv_mime_encode') && $useIconv) {
                    $imePrefs = array();
                    if ($build_params['head_encoding'] == 'base64') {
                        $imePrefs['scheme'] = 'B';
                    } else {
                        $imePrefs['scheme'] = 'Q';
                    }
                    $imePrefs['input-charset']  = $build_params['head_charset'];
                    $imePrefs['output-charset'] = $build_params['head_charset'];
                    $imePrefs['line-length'] = 74;
                    $imePrefs['line-break-chars'] = "\r\n"; //Specified in RFC2047

                    $hdr_value = iconv_mime_encode($hdr_name, $hdr_value, $imePrefs);
                    $hdr_value = preg_replace("#^{$hdr_name}\:\ #", "", $hdr_value);
                } elseif ($build_params['head_encoding'] == 'base64') {
                    //Base64 encoding has been selected.
                    //Base64 encode the entire string
                    $hdr_value = base64_encode($hdr_value);

                    //Generate the header using the specified params and dynamically
                    //determine the maximum length of such strings.
                    //75 is the value specified in the RFC. The first -2 is there so
                    //the later regexp doesn't break any of the translated chars.
                    //The -2 on the first line-regexp is to compensate for the ": "
                    //between the header-name and the header value
                    $prefix = '=?' . $build_params['head_charset'] . '?B?';
                    $suffix = '?=';
                    $maxLength = 75 - strlen($prefix . $suffix) - 2;
                    $maxLength1stLine = $maxLength - strlen($hdr_name) - 2;

                    //We can cut base64 every 4 characters, so the real max
                    //we can get must be rounded down.
                    $maxLength = $maxLength - ($maxLength % 4);
                    $maxLength1stLine = $maxLength1stLine - ($maxLength1stLine % 4);

                    $cutpoint = $maxLength1stLine;
                    $hdr_value_out = $hdr_value;
                    $output = "";
                    while ($hdr_value_out) {
                        //Split translated string at every $maxLength
                        $part = substr($hdr_value_out, 0, $cutpoint);
                        $hdr_value_out = substr($hdr_value_out, $cutpoint);
                        $cutpoint = $maxLength;
                        //RFC 2047 specifies that any split header should be
                        //separated by a CRLF SPACE.
                        if ($output) {
                            $output .=  "\r\n ";
                        }
                        $output .= $prefix . $part . $suffix;
                    }
                    $hdr_value = $output;
                } else {
                    //quoted-printable encoding has been selected

                    //Fix for Bug #10298, Ota Mares <om@viazenetti.de>
                    //Check if there is a double quote at beginning or end of
                    //the string to prevent that an open or closing quote gets
                    //ignored because it is encapsulated by an encoding prefix
                    //or suffix.
                    //Remove the double quote and set the specific prefix or
                    //suffix variable so later we can concat the encoded string
                    //and the double quotes back together to get the intended
                    //string.
                    $quotePrefix = $quoteSuffix = '';
                    if ($hdr_value{0} == '"') {
                        $hdr_value = substr($hdr_value, 1);
                        $quotePrefix = '"';
                    }
                    if ($hdr_value{strlen($hdr_value)-1} == '"') {
                        $hdr_value = substr($hdr_value, 0, -1);
                        $quoteSuffix = '"';
                    }

                    //Generate the header using the specified params and dynamically
                    //determine the maximum length of such strings.
                    //75 is the value specified in the RFC. The -2 is there so
                    //the later regexp doesn't break any of the translated chars.
                    //The -2 on the first line-regexp is to compensate for the ": "
                    //between the header-name and the header value
                    $prefix = '=?' . $build_params['head_charset'] . '?Q?';
                    $suffix = '?=';
                    $maxLength = 75 - strlen($prefix . $suffix) - 2 - 1;
                    $maxLength1stLine = $maxLength - strlen($hdr_name) - 2;
                    $maxLength = $maxLength - 1;

                    //Replace all special characters used by the encoder.
                    $search  = array('=',   '_',   '?',   ' ');
                    $replace = array('=3D', '=5F', '=3F', '_');
                    $hdr_value = str_replace($search, $replace, $hdr_value);

                    //Replace all extended characters (\x80-\xFF) with "=" plus
                    //their two-digit hex codes.
                    $hdr_value = preg_replace('#([\x80-\xFF])#e',
                        '"=" . strtoupper(dechex(ord("\1")))',
                        $hdr_value);

                    //This regexp will break QP-encoded text at every $maxLength
                    //but will not break any encoded letters.
                    $reg1st = "|(.{0,$maxLength1stLine}[^\=][^\=])|";
                    $reg2nd = "|(.{0,$maxLength}[^\=][^\=])|";
                    //Fix for Bug #10298, Ota Mares <om@viazenetti.de>
                    //Concat the double quotes if existent and encoded string together

                    $hdr_value = $quotePrefix . $hdr_value . $quoteSuffix;


                    $hdr_value_out = $hdr_value;
                    if (strlen($hdr_value_out) >= ($maxLength1stLine + strlen($prefix . $suffix))) {
                        //Begin with the regexp for the first line.
                        $reg = $reg1st;
                        $output = "";
                        while ($hdr_value_out) {
                            //Split translated string at every $maxLength
                            //But make sure not to break any translated chars.
                            $found = preg_match($reg, $hdr_value_out, $matches);

                            //After this first line, we need to use a different
                            //regexp for the following lines.
                            $reg = $reg2nd;

                            //Save the found part and encapsulate it in the
                            //prefix & suffix. Then remove the part from the
                            //$hdr_value_out variable.
                            if ($found) {
                                $part = $matches[0];
                                $hdr_value_out = substr($hdr_value_out, strlen($matches[0]));
                            } else {
                                $part = $hdr_value_out;
                                $hdr_value_out = "";
                            }

                            //RFC 2047 specifies that any split header should be
                            //separated by a CRLF SPACE.
                            if ($output) {
                                $output .=  "\r\n ";
                            }
                            $output .= $prefix . $part . $suffix;
                        }
                        $hdr_value_out = $output;
                    } else {
                        $hdr_value_out = $prefix . $hdr_value_out . $suffix;
                    }
                    $hdr_value = $hdr_value_out;
                }
            }
            $input[$hdr_name] = $hdr_value;
        }
        return $input;
    }
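
The splitting logic of the base64 branch above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the name fold_b_encoded is hypothetical, the character set is assumed to be single-byte (so cutting between bytes never cuts a character in half), and the limits used are the ones from RFC 2047 (75 characters per encoded-word, 76 per header line).

```python
import base64

def fold_b_encoded(text, header_name="Subject", charset="iso-8859-2"):
    """Split text into several "B" encoded-words joined by CRLF SPACE."""
    prefix = "=?%s?B?" % charset.upper()
    suffix = "?="
    raw = text.encode(charset)
    # The first line also has to hold "Name: " in front of the encoded-word.
    room = min(75, 76 - len(header_name) - 2)
    words = []
    pos = 0
    while pos < len(raw):
        payload = room - len(prefix) - len(suffix)
        # Base64 turns 3 bytes into 4 characters, so cut on multiples of 3.
        step = (payload // 4) * 3
        if step <= 0:
            raise ValueError("header name too long for one encoded-word")
        chunk = raw[pos:pos + step]
        words.append(prefix + base64.b64encode(chunk).decode("ascii") + suffix)
        pos += step
        room = 75   # continuation lines: one space plus at most 75 characters
    return "\r\n ".join(words)

if __name__ == "__main__":
    print("Subject: " + fold_b_encoded("\u00e9" * 100))
```

With a 100-character subject this produces three encoded-words, each on its own folded line.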
Andreas Gehrs-Pahl
Re: Encodes a message header field as per RFC2047
on Wed, 12 Mar 2008 02:46:58 -0400
Trinkl,

>>LOCAL cSubject := "=?ISO-8859-2?B?" + ToBase64(cText) + "?="

>Thanks! This is very simple and easy to use. I might have known.

You are welcome!

>>LOCAL cSubject := "=?ISO-8859-2?Q?" + RFC2047_ToQP(cText) + "?="

>>Function RFC2047_ToQP(cText)
>>   cText := StrTran(ToQP(cText), "?", "=3F")
>>   cText := StrTran(cText, "_", "=5F")
>>return (StrTran(cText, " ", "_"))
 
>Subject containing 8-bit value characters, and may be very long
>(approx. 100 characters).
>Encoding is complicated task.
>I found a free PHP samle:

Actually, my code does virtually the same as your PHP example. The main 
difference is that the PHP code also handles the Header creation, for 
which we would use "oEMail:SetSubject(cSubject)" etc. instead. And if 
the ASINet Header routines don't handle wrapping of long header lines 
according to RFC-2047, then there is really nothing we can do about it,
no matter which encoding you use, "B" (base64) or "Q" (quoted-printable).

But the RFC2047_ToQP() function should work as well as the PHP example
or the "B" encoding!

-- Andreas

Trinkl Ferenc
Re: Encodes a message header field as per RFC2047
on Wed, 12 Mar 2008 12:09:33 +0100
"Andreas Gehrs-Pahl" <Andreas@DDPSoftware.com> wrote: 

>>>LOCAL cSubject := "=?ISO-8859-2?Q?" + RFC2047_ToQP(cText) + "?="
>
>>>Function RFC2047_ToQP(cText)
>>>   cText := StrTran(ToQP(cText), "?", "=3F")
>>>   cText := StrTran(cText, "_", "=5F")
>>>return (StrTran(cText, " ", "_"))
>

The ToQP() function converts "=" characters to "=3D", which is good. But it
also breaks long strings with a trailing "=" (soft line break), which is bad
for a header field.
ToQP() also does not replace extended characters (\x80-\xFF) with their
"=XX" hex codes.
Thomas Braun
Re: Encodes a message header field as per RFC2047
on Wed, 12 Mar 2008 15:10:07 +0100
Trinkl Ferenc wrote:

> ToQP() function converts "=" characters to "=3D", good. Also breaks long 
> string with "=" character, badly.
> ToQP() does not replace extended characters (\x80-xFF) with their ASCII 
> values.

Try this one... but note that it also breaks long strings with a trailing "="
(soft line break), as quoted-printable per RFC 2045 requires:

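/* CR, LF and TAB were not defined in the posted code; assumed definitions */
#define CR   Chr(13)
#define LF   Chr(10)
#define TAB  Chr(9)

FUNCTION MimeQPEnc( cString )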
LOCAL cHex, cChar
LOCAL cBuf   := SPACE(80)
LOCAL cRet   := ""
LOCAL nI     := 1
LOCAL nLen   := LEN(cString)
LOCAL nIndex := 1
LOCAL lFlag  := .F.

   DO WHILE nI <= nLen

      /* Get next single char */
      cChar := cString[nI]

      /* Check if printable 7-bit ASCII char between Space and CHR(126), except "=" */
      IF ( ASC(cChar) >= 32 .AND. ASC(cChar) < 127 .AND. cChar != "=" ) .OR. ;
         cChar == CR .OR. cChar == TAB
         cBuf[nIndex++] := cChar
      ELSE
         /* Chars below " " and above CHR(127) are encoded as "=" + hex value */
         cHex           := EncodeHex(cChar)
         cBuf[nIndex++] := "="
         cBuf[nIndex++] := cHex[1]
         cBuf[nIndex++] := cHex[2]
      ENDIF

      /* carriage return normally should be followed by LF */
      IF cChar == CR

         nI++
         cChar          := cString[nI]   // Get LF char
         cBuf[nIndex++] := cChar

         lFlag := .T.

      ELSEIF nIndex >= 73  /* Line buffer full, each QP line must not exceed 76 chars */

         cBuf[nIndex++] := "="
         cBuf[nIndex++] := CR
         cBuf[nIndex++] := LF

         lFlag := .T.

      ENDIF

      /* flush buffer to return string */
      IF lFlag
         lFlag  := .F.
         cRet   += SUBSTR(cBuf, 1, nIndex-1 )
         nIndex := 1
      ENDIF

      /* Next char */
      nI++

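   ENDDO

   /* flush whatever is left in the buffer after the last char */
   IF nIndex > 1
      cRet += SUBSTR(cBuf, 1, nIndex-1 )
   ENDIF

RETURN cRet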

FUNCTION EncodeHex(xChar)
LOCAL cRet := "", x
LOCAL cChars := "0123456789ABCDEF"
LOCAL l := 2
LOCAL n

   IF VALTYPE(xChar) = "N"
      n := xChar
   ELSE
      n := ASC(xChar)
   ENDIF

   DO WHILE ( n > 0 )                             // while we got a number...
      x := n%16                                   // get modulus
      n := Int(n/16)                              // remove an exponent
      cRet := cChars[x+1] + cRet                  // add result to return string
   ENDDO
   l := Max(l,Len(cRet))

RETURN PADL(cRet,l,"0")
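
For comparison, the same idea (encode "=" and every non-printable or 8-bit byte as "=XX", and insert a soft line break before a line gets too long) can be sketched in Python. This is a simplified illustration, not a port of the function above: the name qp_encode_line is hypothetical, the input is treated as one logical line without CR/LF handling, and TABs are always encoded.

```python
def qp_encode_line(data: bytes, max_len: int = 76) -> str:
    """Quoted-printable encode one logical line with soft line breaks."""
    lines = []
    line = ""
    last = len(data) - 1
    for i, byte in enumerate(data):
        # Printable ASCII except "=" stays literal; a space stays literal
        # unless it is the very last byte (trailing spaces must be encoded).
        literal = (33 <= byte <= 126 and byte != 0x3D) or \
                  (byte == 0x20 and i != last)
        token = chr(byte) if literal else "=%02X" % byte
        # Keep one column free for the "=" that marks a soft line break.
        if len(line) + len(token) > max_len - 1:
            lines.append(line + "=")
            line = ""
        line += token
    lines.append(line)
    return "\r\n".join(lines)

if __name__ == "__main__":
    print(qp_encode_line("Sz\u00e9p = nap".encode("iso-8859-2")))
```

Note that the soft line breaks are fine for a message body, but inside an RFC 2047 "Q" encoded-word a long subject has to be split into several encoded-words instead.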
Trinkl Ferenc
Re: Encodes a message header field as per RFC2047
on Wed, 12 Mar 2008 16:29:19 +0100
"Thomas Braun" <spam@software-braun.de> wrote:
> Trinkl Ferenc wrote:
>
>> ToQP() function converts "=" characters to "=3D", good. Also breaks long
>> string with "=" character, badly.
>> ToQP() does not replace extended characters (\x80-xFF) with their ASCII
>> values.
>
> Try this one... but note, it also breaks long strings with "=" according to
> RFC2047:
>
> FUNCTION MimeQPEnc( cString )

This sample works fine.
Some mail servers may send a warning when a message header contains extended
8-bit characters. Your function meets the requirements.
Andreas Gehrs-Pahl
Re: Encodes a message header field as per RFC2047
on Thu, 13 Mar 2008 02:52:48 -0400
Trinkl,

>ToQP() function converts "=" characters to "=3D", good. Also breaks long 
>string with "=" character, badly.
>ToQP() does not replace extended characters (\x80-xFF) with their ASCII 
>values.

Actually, quoted-printable requires that those characters are encoded! 
So I just checked the ToQP() function myself, and found out the following
convoluted mess:

In Xbase++ 1.82.306 all characters are encoded correctly when they are in 
a string that is _long enough_, but with some minor exceptions:

* Character strings less than 5 or 6 characters are encoded as an empty 
  string!!! The exact length required depends on the string itself -- if 
  at least one character in the string needs to be encoded, the original 
  string must be at least 6 characters long, and if no character in the 
  string needs to be encoded, the original string must still be at least 
  5 characters long. This seems to be related to the following issue!
* Each encoded string has an unnecessary "=" + CR + LF appended to it 
  (unless it is encoded as an empty string)!!!
* LFs are not encoded but replaced by CR + LF -- see PDR 5361.
* TABs are not encoded at all, but should be encoded as "=09", at least 
  at the end of lines -- see PDR 5362.
* Spaces at the end of lines are not encoded (but should be encoded as 
  "=20") -- also covered by PDR 5362.
* Lines are wrapped correctly after 73 characters by adding "=" + CR + LF.

In Xbase++ 1.90.331, virtually NO characters are encoded at all anymore -- 
with some minor exceptions:

* TABs are always encoded as "=09" (fixes PDR 5362).
* Spaces at the end of lines are correctly encoded as "=20" (also fixes 
  PDR 5362).
* No unnecessary "=" + CR + LF is appended to the encoded strings anymore.
* Equal Signs are still correctly encoded as "=3D".
* No other characters besides TABs, Spaces, and Equal Signs are encoded at
  all anymore, which makes you wonder if this was worth fixing PDR 5362! 
* Lines are still wrapped correctly, but now after just 72 characters.

Xbase++ 1.90.331 will work like this with and without HotFix No. 7 applied!

Additionally, according to PDR 5605, all ASINet utility functions, including 
ToQP(), will leak memory (at least in 1.90.331, but who knows, possibly also 
in 1.82.306 or 1.82.294).

I also found that in both versions of Xbase++, you will get an IDSC if you 
pass a string to ToQP() by reference rather than by value.

So it looks like you are better off with using the code that Thomas posted,
as ASINet went from acceptable to completely useless, with regards to the 
ToQP() function, with the change from Version 1.82 to 1.90! Or you could 
downgrade to Xbase++ 1.82.306 -- but this would actually present many other 
problems, so it isn't really an acceptable solution.

BTW, in Xbase++ 1.82.306, my little function works as expected. 

-- Andreas
