Alaska Software Inc. - Encodes a message header field as per RFC2047

	Author	Topic: Encodes a message header field as per RFC2047
	Trinkl Ferenc	Encodes a message header field as per RFC2047 on Wed, 05 Mar 2008 13:39:42 +0100 Hi, does anyone have a solution for encoding a message header field (subject) in the ISO-8859-2 character set?
	Andreas Gehrs-Pahl	Re: Encodes a message header field as per RFC2047 on Fri, 07 Mar 2008 04:34:57 -0500

Trinkl,

>Does anyone have solution to encoding message header field (subject)
>into 8859-2 character set?

I haven't actually ever done this, but the following should work, assuming that the variable 'cText' contains your Subject line that uses ISO-8859-2 characters.

    LOCAL cSubject := "=?ISO-8859-2?B?" + ToBase64(cText) + "?="
    LOCAL oEMail   := MIMEMessage():New()
    [...]
    oEMail:SetSubject(cSubject)
    [...]

If you prefer to use the "Q" encoding -- which by default is a little more readable if the encoding is not handled by the receiver -- you could try the following, but it might not come out quite right, in case there are any special characters contained in "cText" that are treated differently by any of the forwarding internet servers!

    LOCAL cSubject := "=?ISO-8859-2?Q?" + RFC2047_ToQP(cText) + "?="

    Function RFC2047_ToQP(cText)
       cText := StrTran(ToQP(cText), "?", "=3F")
       cText := StrTran(cText, "_", "=5F")
    return (StrTran(cText, " ", "_"))

Let us know if it works,

-- Andreas

--- ---
Andreas Gehrs-Pahl        E-Mail: GPahl@CharterMI.net
415 Gute Street               or: Andreas@DDPSoftware.com
Owosso, MI 48867-4410         or: Andreas@Aerospace-History.net
Tel: (989) 723-9927     Web Site: http://www.Aerospace-History.net
--- ---
	Trinkl Ferenc	Re: Encodes a message header field as per RFC2047 on Fri, 07 Mar 2008 22:51:15 +0100

----- Original Message -----
From: "Andreas Gehrs-Pahl" <Andreas@DDPSoftware.com>
Newsgroups: public.xbase++.asinet
Sent: Friday, March 07, 2008 10:34 AM
Subject: Re: Encodes a message header field as per RFC2047

>>Does anyone have solution to encoding message header field (subject)
>>into 8859-2 character set?
>
> I haven't actually ever done this, but the following should work, assuming
> that the variable 'cText' contains your Subject line that uses ISO-8859-2
> characters.
>
> LOCAL cSubject := "=?ISO-8859-2?B?" + ToBase64(cText) + "?="

Thanks! This is very simple and easy to use. I should have known.

> If you prefer to use the "Q" encoding -- which by default is a little more
> readable if the encoding is not handled by the receiver -- you could try
> the following, but it might not come out quite right, in case there are
> any special characters contained in "cText" that are treated differently
> by any of the forwarding internet servers!
>
> LOCAL cSubject := "=?ISO-8859-2?Q?" + RFC2047_ToQP(cText) + "?="

The subject contains 8-bit characters and may be very long (approx. 100 characters), so encoding it is a complicated task. I found a free PHP sample:

    /**
     * Encodes a header as per RFC2047
     *
     * @param array $input  The header data to encode
     * @param array $params Extra build parameters
     * @return array Encoded data
     * @access private
     */
    function _encodeHeaders($input, $params = array())
    {
        $build_params = $this->_build_params;
        while (list($key, $value) = each($params)) {
            $build_params[$key] = $value;
        }
        //$hdr_name: Name of the header
        //$hdr_value: Full line of header value.
        //$hdr_value_out: The recombined $hdr_val-atoms, or the encoded string.

        $useIconv = true;
        if (isset($build_params['ignore-iconv'])) {
            $useIconv = !$build_params['ignore-iconv'];
        }
        foreach ($input as $hdr_name => $hdr_value) {
            if (preg_match('#([\x80-\xFF]){1}#', $hdr_value)) {
                if (function_exists('iconv_mime_encode') && $useIconv) {
                    $imePrefs = array();
                    if ($build_params['head_encoding'] == 'base64') {
                        $imePrefs['scheme'] = 'B';
                    } else {
                        $imePrefs['scheme'] = 'Q';
                    }
                    $imePrefs['input-charset']  = $build_params['head_charset'];
                    $imePrefs['output-charset'] = $build_params['head_charset'];
                    $imePrefs['line-length'] = 74;
                    $imePrefs['line-break-chars'] = "\r\n"; //Specified in RFC2047
                    $hdr_value = iconv_mime_encode($hdr_name, $hdr_value, $imePrefs);
                    $hdr_value = preg_replace("#^{$hdr_name}\:\ #", "", $hdr_value);
                } elseif ($build_params['head_encoding'] == 'base64') {
                    //Base64 encoding has been selected.
                    //Base64 encode the entire string
                    $hdr_value = base64_encode($hdr_value);

                    //Generate the header using the specified params and dynamically
                    //determine the maximum length of such strings.
                    //75 is the value specified in the RFC. The first -2 is there so
                    //the later regexp doesn't break any of the translated chars.
                    //The -2 on the first line-regexp is to compensate for the ": "
                    //between the header-name and the header value
                    $prefix = '=?' . $build_params['head_charset'] . '?B?';
                    $suffix = '?=';
                    $maxLength = 75 - strlen($prefix . $suffix) - 2;
                    $maxLength1stLine = $maxLength - strlen($hdr_name) - 2;

                    //We can cut base64 every 4 characters, so the real max we can get must be rounded down.
                    $maxLength = $maxLength - ($maxLength % 4);
                    $maxLength1stLine = $maxLength1stLine - ($maxLength1stLine % 4);

                    $cutpoint = $maxLength1stLine;
                    $hdr_value_out = $hdr_value;
                    $output = "";
                    while ($hdr_value_out) {
                        //Split translated string at every $maxLength
                        $part = substr($hdr_value_out, 0, $cutpoint);
                        $hdr_value_out = substr($hdr_value_out, $cutpoint);
                        $cutpoint = $maxLength;
                        //RFC 2047 specifies that any split header should be separated
                        //by a CRLF SPACE.
                        if ($output) {
                            $output .= "\r\n ";
                        }
                        $output .= $prefix . $part . $suffix;
                    }
                    $hdr_value = $output;
                } else {
                    //quoted-printable encoding has been selected

                    //Fix for Bug #10298, Ota Mares <om@viazenetti.de>
                    //Check if there is a double quote at beginning or end of the string to
                    //prevent that an open or closing quote gets ignored because it is
                    //encapsulated by an encoding prefix or suffix.
                    //Remove the double quote and set the specific prefix or suffix variable
                    //so later we can concat the encoded string and the double quotes back
                    //together to get the intended string.
                    $quotePrefix = $quoteSuffix = '';
                    if ($hdr_value{0} == '"') {
                        $hdr_value = substr($hdr_value, 1);
                        $quotePrefix = '"';
                    }
                    if ($hdr_value{strlen($hdr_value)-1} == '"') {
                        $hdr_value = substr($hdr_value, 0, -1);
                        $quoteSuffix = '"';
                    }

                    //Generate the header using the specified params and dynamically
                    //determine the maximum length of such strings.
                    //75 is the value specified in the RFC. The -2 is there so
                    //the later regexp doesn't break any of the translated chars.
                    //The -2 on the first line-regexp is to compensate for the ": "
                    //between the header-name and the header value
                    $prefix = '=?' . $build_params['head_charset'] . '?Q?';
                    $suffix = '?=';
                    $maxLength = 75 - strlen($prefix . $suffix) - 2 - 1;
                    $maxLength1stLine = $maxLength - strlen($hdr_name) - 2;
                    $maxLength = $maxLength - 1;

                    //Replace all special characters used by the encoder.
                    $search  = array('=', '_', '?', ' ');
                    $replace = array('=3D', '=5F', '=3F', '_');
                    $hdr_value = str_replace($search, $replace, $hdr_value);

                    //Replace all extended characters (\x80-xFF) with their
                    //ASCII values.
                    $hdr_value = preg_replace('#([\x80-\xFF])#e',
                        '"=" . strtoupper(dechex(ord("\1")))',
                        $hdr_value);

                    //This regexp will break QP-encoded text at every $maxLength
                    //but will not break any encoded letters.
                    $reg1st = "|(.{0,$maxLength1stLine}[^\=][^\=])|";
                    $reg2nd = "|(.{0,$maxLength}[^\=][^\=])|";

                    //Fix for Bug #10298, Ota Mares <om@viazenetti.de>
                    //Concat the double quotes if existent and encoded string together
                    $hdr_value = $quotePrefix . $hdr_value . $quoteSuffix;

                    $hdr_value_out = $hdr_value;
                    if (strlen($hdr_value_out) >= ($maxLength1stLine + strlen($prefix . $suffix))) {
                        //Begin with the regexp for the first line.
                        $reg = $reg1st;
                        $output = "";
                        while ($hdr_value_out) {
                            //Split translated string at every $maxLength
                            //But make sure not to break any translated chars.
                            $found = preg_match($reg, $hdr_value_out, $matches);

                            //After this first line, we need to use a different
                            //regexp for the first line.
                            $reg = $reg2nd;

                            //Save the found part and encapsulate it in the
                            //prefix & suffix. Then remove the part from the
                            //$hdr_value_out variable.
                            if ($found) {
                                $part = $matches[0];
                                $hdr_value_out = substr($hdr_value_out, strlen($matches[0]));
                            } else {
                                $part = $hdr_value_out;
                                $hdr_value_out = "";
                            }

                            //RFC 2047 specifies that any split header should be separated
                            //by a CRLF SPACE.
                            if ($output) {
                                $output .= "\r\n ";
                            }
                            $output .= $prefix . $part . $suffix;
                        }
                        $hdr_value_out = $output;
                    } else {
                        $hdr_value_out = $prefix . $hdr_value_out . $suffix;
                    }
                    $hdr_value = $hdr_value_out;
                }
            }
            $input[$hdr_name] = $hdr_value;
        }
        return $input;
    }
	Andreas Gehrs-Pahl	Re: Encodes a message header field as per RFC2047 on Wed, 12 Mar 2008 02:46:58 -0400

Trinkl,

>>LOCAL cSubject := "=?ISO-8859-2?B?" + ToBase64(cText) + "?="

>Thanks! This is very simple and easy to use. I might have known.

You are welcome!

>>LOCAL cSubject := "=?ISO-8859-2?Q?" + RFC2047_ToQP(cText) + "?="

>>Function RFC2047_ToQP(cText)
>>   cText := StrTran(ToQP(cText), "?", "=3F")
>>   cText := StrTran(cText, "_", "=5F")
>>return (StrTran(cText, " ", "_"))

>Subject containing 8-bit value characters, and may be very long
>(approx. 100 characters).
>Encoding is complicated task.
>I found a free PHP samle:

Actually, my code does virtually the same as your PHP example. The main difference is that the PHP code also handles the Header creation, for which we would use "oEMail:SetSubject(cSubject)" etc. instead. And if the ASINet Header routines don't handle wrapping of long header lines according to RFC-2047, then there is really nothing we can do about it, no matter which encoding you use, "B" (base64) or "Q" (quoted-printable).

But the RFC2047_ToQP() function should work as well as the PHP example or the "B" encoding!

-- Andreas
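On the wrapping concern: RFC 2047 limits each encoded-word to 75 characters and lets a long value be written as several encoded-words separated by CRLF + SPACE, which is what the base64 branch of the PHP sample does. A minimal Xbase++ sketch of the same idea follows. The function name RFC2047_EncodeB() and its parameters are hypothetical, it assumes that ToBase64() returns a single unbroken base64 string for such short input, and whether oEMail:SetSubject() passes the embedded CRLF + SPACE through unchanged has not been confirmed in this thread.

```xbase
// Hypothetical helper, not part of ASINet: splits cText into chunks and
// wraps each chunk in its own "B" encoded-word.
// Assumes a single-byte charset such as ISO-8859-2, so a chunk boundary
// can never fall inside a character.
FUNCTION RFC2047_EncodeB(cText, cCharset, nChunk)
   LOCAL cResult := ""
   LOCAL nPos    := 1

   IF cCharset == NIL
      cCharset := "ISO-8859-2"
   ENDIF
   IF nChunk == NIL
      // 30 bytes become 40 base64 characters; with "=?ISO-8859-2?B?" and
      // "?=" that is 57 characters per encoded-word, which leaves room
      // for "Subject: " on the first line.
      nChunk := 30
   ENDIF

   DO WHILE nPos <= Len(cText)
      IF Len(cResult) > 0
         // Continuation line: CRLF followed by one space
         cResult += Chr(13) + Chr(10) + " "
      ENDIF
      cResult += "=?" + cCharset + "?B?" + ;
                 ToBase64(SubStr(cText, nPos, nChunk)) + "?="
      nPos += nChunk
   ENDDO
RETURN cResult
```

Keeping nChunk a multiple of 3 avoids "=" padding in every encoded-word except possibly the last one.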
	Trinkl Ferenc	Re: Encodes a message header field as per RFC2047 on Wed, 12 Mar 2008 12:09:33 +0100

"Andreas Gehrs-Pahl" <Andreas@DDPSoftware.com> wrote: 1tme5bv7qza61.4lkd6d8vi4xl.dlg@40tude.net...

>>>LOCAL cSubject := "=?ISO-8859-2?Q?" + RFC2047_ToQP(cText) + "?="
>
>>>Function RFC2047_ToQP(cText)
>>>   cText := StrTran(ToQP(cText), "?", "=3F")
>>>   cText := StrTran(cText, "_", "=5F")
>>>return (StrTran(cText, " ", "_"))
>

The ToQP() function converts "=" characters to "=3D", which is good. It also breaks long strings with a "=" character, which is bad for a header. And ToQP() does not replace the extended characters (\x80-\xFF) with their "=XX" hex codes.
	Thomas Braun	Re: Encodes a message header field as per RFC2047 on Wed, 12 Mar 2008 15:10:07 +0100

Trinkl Ferenc wrote:

> ToQP() function converts "=" characters to "=3D", good. Also breaks long
> string with "=" character, badly.
> ToQP() does not replace extended characters (\x80-xFF) with their ASCII
> values.

Try this one... but note, it also breaks long strings with "=" according to RFC2047:

    // CR, LF and TAB are not defined in the post as archived;
    // the usual definitions are assumed here.
    #define CR   CHR(13)
    #define LF   CHR(10)
    #define TAB  CHR(9)

    FUNCTION MimeQPEnc( cString )
       LOCAL cHex, cChar
       LOCAL cBuf   := SPACE(80)
       LOCAL cRet   := ""
       LOCAL nI     := 1
       LOCAL nLen   := LEN(cString)
       LOCAL nIndex := 1
       LOCAL lFlag  := .F.

       DO WHILE nI <= nLen
          /* Get next single char */
          cChar := cString[nI]

          /* Check if printable (=ASCII-7) char between Space and CHR(127) */
          IF ( cChar >= " " .AND. ASC(cChar) <= 127 .AND. cChar != "=" .AND. ( cChar != "." .OR. cChar != CR ) ) .OR. ;
             cChar == CR .OR. cChar == TAB
             cBuf[nIndex++] := cChar
          ELSE
             /* Chars below " " and above CHR(127) are encoded as "=" + hex value */
             cHex := EncodeHex(cChar)
             cBuf[nIndex++] := "="
             cBuf[nIndex++] := cHex[1]
             cBuf[nIndex++] := cHex[2]
          ENDIF

          /* carriage return normally should be followed by LF */
          IF cChar == CR
             nI++
             cChar := cString[nI]          // Get LF char
             cBuf[nIndex++] := cChar
             lFlag := .T.
          ELSEIF nIndex >= 73
             /* Line buffer full, each QP line must not exceed 76 chars */
             cBuf[nIndex++] := "="
             cBuf[nIndex++] := CR
             cBuf[nIndex++] := LF
             lFlag := .T.
          ENDIF

          /* flush buffer to return string */
          IF lFlag
             lFlag := .F.
             cRet += SUBSTR(cBuf, 1, nIndex-1 )
             nIndex := 1
          ENDIF

          /* Next char */
          nI++
       ENDDO

       // Flush the remaining partial line. This step is missing in the post
       // as archived; without it the tail of the string would be dropped.
       cRet += SUBSTR(cBuf, 1, nIndex-1 )
    RETURN cRet

    FUNCTION EncodeHex(xChar)
       LOCAL cRet := "", x
       LOCAL cChars := "0123456789ABCDEF"
       LOCAL l := 2
       LOCAL n

       IF VALTYPE(xChar) = "N"
          n := xChar
       ELSE
          n := ASC(xChar)
       ENDIF

       DO WHILE ( n > 0 )                // while we got a number...
          x := n%16                      // get modulus
          n := Int(n/16)                 // remove an exponent
          cRet := cChars[x+1] + cRet     // add result to return string
       ENDDO

       l := Max(l,Len(cRet))
    RETURN PADL(cRet,l,"0")
	Trinkl Ferenc	Re: Encodes a message header field as per RFC2047 on Wed, 12 Mar 2008 16:29:19 +0100

"Thomas Braun" <spam@software-braun.de> wrote:

> Trinkl Ferenc wrote:
>
>> ToQP() function converts "=" characters to "=3D", good. Also breaks long
>> string with "=" character, badly.
>> ToQP() does not replace extended characters (\x80-xFF) with their ASCII
>> values.
>
> Try this one... but note, it also breaks long strings with "=" according
> to
> RFC2047:
>
> FUNCTION MimeQPEnc( cString )

This sample works fine. Some mail servers may send a warning when a message header contains extended 8-bit characters. Your function meets the requirements.
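Since only headers with 8-bit characters cause such warnings, a plain 7-bit subject can be left untouched, much like the preg_match() test for \x80-\xFF in the PHP sample. A small sketch of that check, assuming cText already holds ISO-8859-2 bytes; the function name NeedsEncoding() is hypothetical and not part of ASINet:

```xbase
// Hypothetical helper: returns .T. if cText contains at least one
// character with a code above 127, i.e. one that must not appear
// unencoded in a message header.
FUNCTION NeedsEncoding(cText)
   LOCAL nI

   FOR nI := 1 TO Len(cText)
      IF Asc(SubStr(cText, nI, 1)) > 127
         RETURN .T.
      ENDIF
   NEXT
RETURN .F.
```

It could be combined with the "B" one-liner from earlier in the thread, for example: cSubject := IIf(NeedsEncoding(cText), "=?ISO-8859-2?B?" + ToBase64(cText) + "?=", cText)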
	Andreas Gehrs-Pahl	Re: Encodes a message header field as per RFC2047 on Thu, 13 Mar 2008 02:52:48 -0400

Trinkl,

>ToQP() function converts "=" characters to "=3D", good. Also breaks long
>string with "=" character, badly.
>ToQP() does not replace extended characters (\x80-xFF) with their ASCII
>values.

Actually, quoted-printable requires that those characters are encoded! So I just checked the ToQP() function myself, and found out the following convoluted mess:

In Xbase++ 1.82.306 all characters are encoded correctly when they are in a string that is _long enough_, but with some minor exceptions:

* Character strings less than 5 or 6 characters are encoded as an empty string!!! The exact length required depends on the string itself -- if at least one character in the string needs to be encoded, the original string must be at least 6 characters long, and if no character in the string needs to be encoded, the original string must still be at least 5 characters long. This seems to be related to the following issue!

* Each encoded string has an unnecessary "=" + CR + LF appended to it (unless it is encoded as an empty string)!!!

* LFs are not encoded but replaced by CR + LF -- see PDR 5361.

* TABs are not encoded at all, but should be encoded as "=09", at least at the end of lines -- see PDR 5362.

* Spaces at the end of lines are not encoded (but should be encoded as "=20") -- also covered by PDR 5362.

* Lines are wrapped correctly after 73 characters by adding "=" + CR + LF.

In Xbase++ 1.90.331, virtually NO characters are encoded at all anymore -- with some minor exceptions:

* TABs are always encoded as "=09" (fixes PDR 5362).

* Spaces at the end of lines are correctly encoded as "=20" (also fixes PDR 5362).

* No unnecessary "=" + CR + LF is appended to the encoded strings anymore.

* Equal Signs are still correctly encoded as "=3D".

* No other characters besides TABs, Spaces, and Equal Signs are encoded at all anymore, which makes you wonder if this was worth fixing PDR 5362!

* Lines are still wrapped correctly, but now after just 72 characters.

Xbase++ 1.90.331 will work like this with and without HotFix No. 7 applied!

Additionally, according to PDR 5605, all ASINet Utility functions, including ToQP(), will leak memory (at least in 1.90.331, but who knows, possibly also in 1.82.306 or 1.82.294).

I also found that in both versions of Xbase++, you will get an IDSC if you pass a string to ToQP() by reference rather than by value.

So it looks like you are better off using the code that Thomas posted, as ASINet went from acceptable to completely useless, with regards to the ToQP() function, with the change from Version 1.82 to 1.90! Or you could downgrade to Xbase++ 1.82.306 -- but this would actually present many other problems, so it isn't really an acceptable solution.

BTW, in Xbase++ 1.82.306, my little function works as expected.

-- Andreas
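Given the ToQP() behaviour described above, a header-only "Q" encoder that does not call ToQP() at all may be the simpler route. It also sidesteps the soft line breaks ("=" + CR + LF) of body quoted-printable, which an RFC 2047 encoded-word must not contain. The following is only a sketch under stated assumptions: the function name RFC2047_EncodeQ() is hypothetical, cText is assumed to hold ISO-8859-2 bytes already, and the result is one encoded-word with no splitting of long subjects.

```xbase
// Hypothetical helper, not part of ASINet: encodes cText as a single
// RFC 2047 "Q" encoded-word without using ToQP().
FUNCTION RFC2047_EncodeQ(cText, cCharset)
   LOCAL cHexDigits := "0123456789ABCDEF"
   LOCAL cOut       := ""
   LOCAL nI, nCode, cChar

   IF cCharset == NIL
      cCharset := "ISO-8859-2"
   ENDIF

   FOR nI := 1 TO Len(cText)
      cChar := SubStr(cText, nI, 1)
      nCode := Asc(cChar)
      DO CASE
      CASE nCode == 32
         // A space is written as an underscore in "Q" encoding
         cOut += "_"
      CASE (nCode >= 48 .AND. nCode <= 57) .OR. ;
           (nCode >= 65 .AND. nCode <= 90) .OR. ;
           (nCode >= 97 .AND. nCode <= 122) .OR. ;
           cChar $ "!*+-/"
         // Digits, ASCII letters and a few safe symbols pass through
         cOut += cChar
      OTHERWISE
         // Everything else, including "=", "?", "_" and all 8-bit
         // characters, becomes "=" plus two upper-case hex digits
         cOut += "=" + SubStr(cHexDigits, Int(nCode / 16) + 1, 1) + ;
                       SubStr(cHexDigits, (nCode % 16) + 1, 1)
      ENDCASE
   NEXT
RETURN "=?" + cCharset + "?Q?" + cOut + "?="

// RFC2047_EncodeQ("a b=c") returns "=?ISO-8859-2?Q?a_b=3Dc?="
```

Character codes are compared numerically rather than with string operators so that the result does not depend on the active collation. For subjects that exceed the 75-character limit of an encoded-word, the output would still have to be split into several encoded-words, taking care not to cut inside an "=XX" triplet.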