Alaska Software Inc. - Html encode to string
Carlos Html encode to string
on Fri, 17 Apr 2020 11:26:59 +0100
Hi,

Is there an Xbase++ function to convert HTML-encoded strings to an Xbase++ 
OEM/ANSI string, or can anyone share one?

Example convert:

Laborat&#243;rio de An&#225;lises/PT

To

Laboratório de Análises/PT


Currently I use StrTran(), but it is very inefficient because it must be 
called many times.


Many thanks,
Carlos
Carlos Re: Html encode to string
on Sat, 18 Apr 2020 11:44:44 +0100

FUNCTION HtmlDecode( cStr )
    LOCAL n, cDec
    // Pass 1: decimal escapes with two or three digits
    n := At( "&#", cStr )
    WHILE n > 0
       cDec := Substr( cStr, n+2, 4 )     // "33;" - "256;"
       IF Val( cDec ) > 99 .AND. Substr( cDec, 4, 1 ) == ";"
          cStr := StrTran( cStr, "&#"+cDec, Chr(Val(cDec)) )
       ELSEIF Val( cDec ) > 32  .AND. Substr( cDec, 3, 1 ) == ";"
          cStr := StrTran( cStr, "&#"+Substr(cDec,1,3), Chr(Val(cDec)) )
       ENDIF
       n := At( "&#", cStr, n+1 )
    ENDDO
    // Pass 2: a few named entities ("e" is Portuguese for "and")
    cStr := StrTran( cStr, "&amp;",  "e" )
    cStr := StrTran( cStr, "&lt;",   "<" )
    cStr := StrTran( cStr, "&gt;",   ">" )
    cStr := StrTran( cStr, "&quot;", '"' )
    // Pass 3: strip accent entities, so "&oacute;" becomes "o"
    IF At( "&", cStr ) > 0
       cStr := StrTran( cStr, "acute;", "" )
       cStr := StrTran( cStr, "cedil;", "" )
       cStr := StrTran( cStr, "tilde;", "" )
       cStr := StrTran( cStr, "circ;",  "" )
       cStr := StrTran( cStr, "grave;", "" )
       cStr := StrTran( cStr, "&", "" )
    ENDIF
RETURN cStr


Maybe the WinINet function below can be used to do the job...

BOOLAPI InternetCanonicalizeUrlA(
   LPCSTR  lpszUrl,
   LPSTR   lpszBuffer,
   LPDWORD lpdwBufferLength,
   DWORD   dwFlags
);
Andreas Gehrs-Pahl
Re: Html encode to string
on Sun, 19 Apr 2020 22:40:43 -0400
Carlos,

>Is there a Xbase++ function to convert html encoded strings to a xbase++ 
>oem/ansi string, or who can share one?

There isn't such a function in Xbase++ (or the Windows API) or any of the 
add-on products (like Xb2Net, etc.) -- at least as far as I know.

There are basically three possible ways HTML text might be escaped:

1) "&<Entity_Code>;" where <Entity_Code> is one of over 250 case sensitive 
   codes, replacing those specific UTF-8 Unicode characters.

2a) "$#<Decimal_Number>;" where <Decimal_Number> is the decimal Unicode code 
    value for any UTF-8 Unicode character.

2b) "$#x<Hexadecimal_Number>;" where <Hexadecimal_Number> is the hexadecimal 
    Unicode code value for any UTF-8 Unicode character.

So, first you need to convert all Entity and Numerical Codes to create a 
valid (UTF-8) Unicode string. Then you need to convert that string to either 
OEM or ANSI, depending on your environment. Also keep in mind that the two 
numeric encodings can vary in length and might or might not include 
leading zeros.
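
The two steps described above (escapes to Unicode, then Unicode to OEM or 
ANSI) can be sketched in a few lines. This is only an illustration in Python, 
not the attached Xbase++ demo: html.unescape() is Python's standard-library 
decoder, html_to_codepage() is a made-up helper name, and cp1252 / cp850 are 
assumed here to stand in for "ANSI" and "OEM".

```python
import html

def html_to_codepage(text: str, codepage: str = "cp1252") -> bytes:
    """Decode HTML escapes to Unicode, then encode to a single-byte code page."""
    # Step 1: named, decimal and hexadecimal escapes -> Unicode string
    unicode_text = html.unescape(text)
    # Step 2: Unicode -> target code page; unmappable characters become "?"
    return unicode_text.encode(codepage, errors="replace")

samples = [
    "Laborat&oacute;rio",   # named entity
    "Laborat&#243;rio",     # decimal
    "Laborat&#0243;rio",    # decimal with a leading zero
    "Laborat&#xF3;rio",     # hexadecimal
]

ansi = [html_to_codepage(s, "cp1252") for s in samples]  # assumed ANSI page
oem = [html_to_codepage(s, "cp850") for s in samples]    # assumed OEM page
```

All four spellings decode to the same character, but the resulting byte 
differs between the two code pages, which is why the second step depends on 
the environment.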

I have attached a demo program that does all of that.

If you have an older version of Xbase++ (prior to 2.00.437) that doesn't 
contain the UTF82Char() function, you will need to use OT4Xb for the 
cUTF8toOEM() and cUTF8toANSI() functions instead.

Hope this helps,

Andreas

Andreas Gehrs-Pahl
Absolute Software, LLC

phone: (989) 723-9927
email: Andreas@AbsoluteSoftwareLLC.com
web:   http://www.AbsoluteSoftwareLLC.com
[F]:   https://www.facebook.com/AbsoluteSoftwareLLC

HTML_Decode.zip
Andreas Gehrs-Pahl
Re: Html encode to string
on Mon, 20 Apr 2020 00:10:31 -0400
I wrote:

>2a) "$#<Decimal_Number>;" where <Decimal_Number> is the decimal Unicode code 
>    value for any UTF-8 Unicode character.

>2b) "$#x<Hexadecimal_Number>;" where <Hexadecimal_Number> is the hexadecimal 
>    Unicode code value for any UTF-8 Unicode character.

Those "$" should have been "&", of course.

Andreas

Osvaldo Ramirez Re: Html encode to string
on Mon, 20 Apr 2020 09:39:29 -0600
Thanks Andreas, we appreciate your help.
Best Regards
Andreas Gehrs-Pahl
Re: Html encode to string
on Tue, 21 Apr 2020 05:13:56 -0400
Osvaldo,

>Thanks Andreas, we appreciate your help.

I noticed an oversight in the code. Please replace the first three lines of 
the program with the following three corrected lines:

***************************************************************************
* #pragma Library("OT4Xb.lib")	 Use this for Xbase++ prior to 2.00.437 *
***************************************************************************

Thanks,

Andreas

Carlos Re: Html encode to string
on Mon, 20 Apr 2020 10:38:53 +0100
Andreas,

Great job, it is exactly what I need.

Thank you very much.
Carlos



Carlos Re: Html encode to string
on Mon, 27 Apr 2020 17:09:09 +0100
Hi Andreas,

In some cases, and I can't know which in advance, the returned string may 
not be HTML-encoded, so "&" is a valid character, and ";" too.

Windows-1252 strings can exist alongside HTML-encoded ones in the CSV file 
returned by the other application.

Example:
Luís & Filhos, Lda.
Luís &Filhos,Lda;

When a space, Chr(32), follows the "&" it is easy, but that is not guaranteed.
Any idea how HTML_UnEscape() could be changed, or is it an impossible task?

Thanks,
Carlos





Andreas Gehrs-Pahl
Re: Html encode to string
on Mon, 27 Apr 2020 21:11:11 -0400
Carlos,

>Windows-1252 strings can exist alongside HTML-encoded ones in the CSV file 
>returned by the other application.

HTML-encoded text should only exist within <HTML> and </HTML> tags. 
Otherwise, you can only guess if the text is HTML-encoded or not.

>Luís & Filhos, Lda.
>Luís &Filhos,Lda;

>Any idea how HTML_UnEscape() could be changed, or is it an impossible task?

It's not an impossible task, but no matter how sophisticated your code is, 
you might always get false positives or negatives, if you just guess.

My suggestion would be to simply change lines 386/387 of the demo to this:

LOCAL cText  := iif(nFound == 0, "&" + cEscape + ";", saEscape[nFound, 2])
return (UTF8_to_ASCII(cText))

That way your new examples should work just fine.
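
The idea behind that change (return the original "&...;" text whenever the 
name is not found in the table) can be sketched as follows. This is a Python 
illustration under stated assumptions, not the demo's Xbase++ code: ENTITIES, 
ESCAPE and unescape_known() are made-up names, and the table is deliberately 
tiny where a real one would hold the full entity list.

```python
import re

# Tiny illustrative table (assumption); a real one would list every entity.
ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "quot": '"',
            "iacute": "\u00ed", "oacute": "\u00f3"}

# "&name;", "&#123;" or "&#x7B;"; anything else is not treated as an escape.
ESCAPE = re.compile(r"&(#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")

def unescape_known(text: str) -> str:
    """Decode recognised escapes in one pass; leave everything else as is."""
    def replace(match):
        body = match.group(1)
        if body.startswith("#"):
            digits = body[1:]
            code = int(digits[1:], 16) if digits[0] in "xX" else int(digits)
            # Out-of-range code points are returned unchanged.
            return chr(code) if code <= 0x10FFFF else match.group(0)
        # Unknown names fall through unchanged, "&" and ";" included.
        return ENTITIES.get(body, match.group(0))
    return ESCAPE.sub(replace, text)

plain_1 = unescape_known("Lu\u00eds & Filhos, Lda.")
plain_2 = unescape_known("Lu\u00eds &Filhos,Lda;")
encoded = unescape_known("Lu&iacute;s &amp; Filhos")
```

Both plain Windows-1252 style examples come back untouched, while the encoded 
one is decoded. It is still a guess: plain text that happens to contain a 
valid escape such as "&amp;" would be decoded as well.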

Andreas
