Author | Topic: Html encode to string | |
---|---|---|
Carlos | Html encode to string on Fri, 17 Apr 2020 11:26:59 +0100 Hi, Is there an Xbase++ function to convert HTML-encoded strings to an Xbase++ OEM/ANSI string, or can someone share one? Example convert: Laboratório de Análises/PT To Laboratório de Análises/PT Currently I use StrTran(), but it is very inefficient because it must be called many times. Many thanks, Carlos | |
Carlos | Re: Html encode to string on Sat, 18 Apr 2020 11:44:44 +0100 FUNCTION HtmlDecode( cStr ) LOCAL n, cDec n := At( "&#", cStr ) WHILE n > 0 cDec := Substr( cStr, n+2, 4 ) /* 33; - 256; */ IF Val( cDec ) > 99 .AND. Substr( cDec, 4, 1 ) == ";" cStr := StrTran( cStr, "&#"+cDec, Chr(Val(cDec)) ) ELSEIF Val( cDec ) > 32 .AND. Substr( cDec, 3, 1 ) == ";" cStr := StrTran( cStr, "&#"+Substr(cDec,1,3), Chr(Val(cDec)) ) ENDIF n := At( "&#", cStr, n+1 ) ENDDO cStr := StrTran( cStr, "&amp;", "e" ) cStr := StrTran( cStr, "&lt;", "<" ) cStr := StrTran( cStr, "&gt;", ">" ) cStr := StrTran( cStr, "&quot;", '"' ) IF At( "&", cStr ) > 0 cStr := StrTran( cStr, "acute;", "" ) cStr := StrTran( cStr, "cedil;", "" ) cStr := StrTran( cStr, "tilde;", "" ) cStr := StrTran( cStr, "circ;", "" ) cStr := StrTran( cStr, "grave;", "" ) cStr := StrTran( cStr, "&", "" ) ENDIF RETURN cStr Maybe the WinINet function below can be used to do the job... BOOLAPI InternetCanonicalizeUrlA( LPCSTR lpszUrl, LPSTR lpszBuffer, LPDWORD lpdwBufferLength, DWORD dwFlags ); | |
Andreas Gehrs-Pahl | Re: Html encode to string on Sun, 19 Apr 2020 22:40:43 -0400 Carlos, >Is there a Xbase++ function to convert html encoded strings to a xbase++ >oem/ansi string, or who can share one? There isn't such a function in Xbase++ (or the Windows API) or any of the add-on products (like Xb2Net, etc.) -- at least as far as I know. There are basically three possible ways HTML text might be escaped: 1) "&<Entity_Code>;" where <Entity_Code> is one of over 250 case sensitive codes, replacing those specific UTF-8 Unicode characters. 2a) "$#<Decimal_Number>;" where <Decimal_Number> is the decimal Unicode code value for any UTF-8 Unicode character. 2b) "$#x<Hexadecimal_Number>;" where <Hexadecimal_Number> is the hexadecimal Unicode code value for any UTF-8 Unicode character. So, first you need to convert all Entity and Numerical Codes to create a valid (UTF-8) Unicode string. Then you need to convert that string to either OEM or ANSI, depending on your environment. Also keep in mind that the two numeric encodings can vary in length and might or might not include leading zeros. I have attached a demo program that does all of that. If you have an older version of Xbase++ (prior to 2.00.437) that doesn't contain the UTF82Char() function, you will need to use OT4Xb for the cUTF8toOEM() and cUTF8toANSI() functions instead. Hope this helps, Andreas Andreas Gehrs-Pahl Absolute Software, LLC phone: (989) 723-9927 email: Andreas@AbsoluteSoftwareLLC.com web: http://www.AbsoluteSoftwareLLC.com [F]: https://www.facebook.com/AbsoluteSoftwareLLC HTML_Decode.zip | |
Andreas Gehrs-Pahl | Re: Html encode to string on Mon, 20 Apr 2020 00:10:31 -0400 I wrote: >2a) "$#<Decimal_Number>;" where <Decimal_Number> is the decimal Unicode code > value for any UTF-8 Unicode character. >2b) "$#x<Hexadecimal_Number>;" where <Hexadecimal_Number> is the hexadecimal > Unicode code value for any UTF-8 Unicode character. Those "$" should have been "&", of course. Andreas | |
Osvaldo Ramirez | Re: Html encode to string on Mon, 20 Apr 2020 09:39:29 -0600 Thanks Andreas, we appreciate your help. Best regards | |
Andreas Gehrs-Pahl | Re: Html encode to string on Tue, 21 Apr 2020 05:13:56 -0400 Osvaldo, >Thanks Andreas, We appreciate you help. I noted an oversight in the code. Please replace the first three lines of the program with the following three corrected lines: *************************************************************************** * #pragma Library("OT4Xb.lib") Use this for Xbase++ prior to 2.00.437 * *************************************************************************** Thanks, Andreas | |
Carlos | Re: Html encode to string on Mon, 20 Apr 2020 10:38:53 +0100 Andreas, Great job, it is exactly what I need. Thank you very much. Carlos | |
Carlos | Re: Html encode to string on Mon, 27 Apr 2020 17:09:09 +0100 Hi Andreas, In some cases, and I can't know which in advance, the returned string may not be HTML-encoded, and both "&" and ";" are then valid characters. Windows-1252 strings can exist alongside HTML-encoded ones in the CSV file returned by another app. Example: Luís & Filhos, Lda. Luís &Filhos,Lda; When a space, Chr(32), follows the "&" it is easy, but that is not guaranteed. Any idea how HTML_UnEscape() can be changed, or is it an impossible task? Thanks, Carlos | |
Andreas Gehrs-Pahl | Re: Html encode to string on Mon, 27 Apr 2020 21:11:11 -0400 Carlos, >Windows-1252 strings can exists simultaneous with html enconded in the >CSV file returned by other app. HTML-encoded text should only exist within <HTML> and </HTML> tags. Otherwise, you can only guess if the text is HTML-encoded or not. >Luís & Filhos, Lda. >Luís &Filhos,Lda; >Any idea how can be changed HTML_UnEscape()... or is an impossible task? It's not an impossible task, but no matter how sophisticated your code is, you might always get false positives or negatives, if you just guess. My suggestion would be to simply change lines 386/387 of the demo to this: LOCAL cText := iif(nFound == 0, "&" + cEscape + ";", saEscape[nFound, 2]) return (UTF8_to_ASCII(cText)) That way your new examples should work just fine. Andreas |
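
The two numeric forms discussed in the thread (decimal "&#NNN;" and hexadecimal "&#xHH;", with or without leading zeros) can be decoded in a single pass instead of repeated StrTran() calls. The following is only a minimal sketch and is not the code from HTML_Decode.zip: the function names DecodeNumericRefs() and RefToCode() are hypothetical, named entities such as "&oacute;" are not handled, and it assumes the application uses the ANSI character set, so that Chr() for codes 160 to 255 yields the matching Latin-1 character. Anything it does not recognize, including a bare "&" or a code outside that range, is passed through unchanged, in the spirit of the fallback suggested for unknown entity codes.

```xbase
// Hypothetical helper: decodes "&#NNN;" and "&#xHH;" references for
// codes 32..126 and 160..255 only (assumes the ANSI character set).
FUNCTION DecodeNumericRefs( cStr )
   LOCAL cOut := ""
   LOCAL nPos := 1
   LOCAL nAmp, nSemi, nCode

   DO WHILE .T.
      nAmp := At( "&#", cStr, nPos )
      IF nAmp == 0
         EXIT
      ENDIF
      // Copy the plain text that precedes the candidate reference
      cOut  += SubStr( cStr, nPos, nAmp - nPos )
      nSemi := At( ";", cStr, nAmp + 2 )
      nCode := -1
      // Only look at short candidates: at most 8 characters before ";"
      IF nSemi > 0 .AND. nSemi - nAmp <= 10
         nCode := RefToCode( SubStr( cStr, nAmp + 2, nSemi - nAmp - 2 ) )
      ENDIF
      IF ( nCode >= 32 .AND. nCode <= 126 ) .OR. ;
         ( nCode >= 160 .AND. nCode <= 255 )
         cOut += Chr( nCode )
         nPos := nSemi + 1
      ELSE
         // Not a reference this sketch can decode: keep "&#" literally
         cOut += "&#"
         nPos := nAmp + 2
      ENDIF
   ENDDO
RETURN cOut + SubStr( cStr, nPos )

// Hypothetical helper: converts "243", "0243" or "xF3" to a number,
// or returns -1 if the text is not a valid decimal/hexadecimal value.
STATIC FUNCTION RefToCode( cBody )
   LOCAL cDigits := "0123456789abcdef"
   LOCAL nBase   := 10
   LOCAL nCode   := 0
   LOCAL i, nDigit

   IF Lower( Left( cBody, 1 ) ) == "x"
      nBase := 16
      cBody := SubStr( cBody, 2 )
   ENDIF
   IF Len( cBody ) == 0
      RETURN -1
   ENDIF
   FOR i := 1 TO Len( cBody )
      nDigit := At( Lower( SubStr( cBody, i, 1 ) ), cDigits ) - 1
      IF nDigit < 0 .OR. nDigit >= nBase
         RETURN -1
      ENDIF
      nCode := nCode * nBase + nDigit
   NEXT
RETURN nCode
```

Under those assumptions, a call such as DecodeNumericRefs( "Laborat&#243;rio de An&#xE1;lises & Filhos" ) should replace the two references with Chr(243) and Chr(225) and leave the bare "&" alone. Because the input is scanned once and the result is built up piece by piece, the cost does not grow with the number of distinct codes, which was the concern with calling StrTran() many times. Code points above 255 would still need a real Unicode conversion step, as described in the thread.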