Alaska Software Inc. - Webforms, WAA and character sets
Thomas Braun
Webforms, WAA and character sets
on Fri, 18 Sep 2009 12:41:42 +0200
Hi,

I'm using WAA to run a delegate registration site for large international
conferences. So far there have been no big problems with language-specific
characters or character sets.

But now we are getting more and more registrations from Central Europe and
the Baltics, which means we have a mix of ISO 8859-1 and ISO 8859-2 in the
same database.
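To illustrate why mixing the two charsets in one table is ambiguous: a stored byte carries no information about which code page it came from. A minimal Python sketch (Python rather than Xbase++, purely for illustration) that decodes the same byte under both charsets:

```python
# The same stored byte, interpreted under two different single-byte code pages.
raw = b"\xb3"

latin1 = raw.decode("iso-8859-1")  # Western European (Latin-1)
latin2 = raw.decode("iso-8859-2")  # Central European (Latin-2)

print(repr(latin1))  # '³'  SUPERSCRIPT THREE
print(repr(latin2))  # 'ł'  LATIN SMALL LETTER L WITH STROKE
```

Without knowing which charset a row was written in, a report cannot tell which of the two characters is meant.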

Depending (probably) on browser or OS settings, either wrong characters or
HTML character references like &#322; (ł) end up in the database, which
causes problems when printing invoices, name badges and various other documents.
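When a form page is served as ISO 8859-1 and a user types a character that does not exist in that charset, browsers commonly submit it as an HTML numeric character reference such as `&#322;`. A minimal Python sketch of turning such references back into real characters, assuming the target column is meant to hold ISO 8859-2 (`decode_char_refs` is a hypothetical helper name, not an existing API):

```python
import html


def decode_char_refs(value: str, target: str = "iso-8859-2") -> bytes:
    """Replace HTML character references (e.g. '&#322;') with real characters,
    then encode the result in the target single-byte charset.
    Raises UnicodeEncodeError if a character does not exist in `target`."""
    text = html.unescape(value)
    return text.encode(target)


field = "Wroc&#322;aw"          # as it might arrive from a Latin-1 form
print(html.unescape(field))      # Wrocław
print(decode_char_refs(field))   # b'Wroc\xb3aw'
```

Note that `html.unescape` also decodes named entities such as `&amp;`, so text a user literally typed as "&amp;" would be altered too; whether that is acceptable depends on the application.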

I suppose that unless Xbase++ can handle Unicode and the various
encodings, I'm lost... or does anyone know a different solution?

thx | regards
Thomas
Thomas Braun
Re: Webforms, WAA and character sets
on Thu, 29 Oct 2009 13:11:34 +0100
push

As far as I can tell, I will have to use Unicode/UTF encoding throughout
the whole process, which opens up a bunch of problems I cannot solve with
Xbase++. Sigh.

Thomas
Andreas Herdt
Re: Webforms, WAA and character sets
on Thu, 29 Oct 2009 19:27:26 +0100
Hi Thomas,

It is true that the character strings that are received from the
web server are forwarded to the WAA via the gateway as they are.
Currently there is no transformation at all.

The gateway is a CGI/ISAPI module that uses the corresponding
interfaces. Can you please describe, in your own words, where the
gap in the WWW-Gateway-WAA interaction is? I would assume that
we could provide the information about the Accept-Language of the
request in the Form function via the Html object.

What do you mean by "opens up a bunch of problems"? Where do you
feel that further processing cannot be done, provided that the
Accept-Language of the original HTTP request, as described above,
is known?


   Andreas Herdt
   Alaska Software

--------------------------------------------------------------------

Technical Support:      support@alaska-software.com

News Server:            news.alaska-software.com
Homepage:               http://www.alaska-software.com
WebKnowledgeBase:       http://www.alaska-software.com/kbase.shtm

Fax European Office:    +49 (0) 61 96 - 77 99 99 23
Fax US Office:          +1 (646) 218 1281
--------------------------------------------------------------------
Thomas Braun
Re: Webforms, WAA and character sets
on Fri, 30 Oct 2009 10:24:51 +0100
Andreas Herdt wrote:

> It is true that the character strings that are received from the
> web server are forwarded to the WAA via the gateway as they are.
> Currently there is no transformation at all.

That is perfectly OK with me; anything else would only create new
opportunities for bugs.

> The gateway is a CGI/Isapi module that uses the corresponding
> interfaces. Can you please describe in your words where
> the gap in WWW-Gateway-Waa interaction is. 

There is no gap; the basic problem is that Xbase++ is Unicode-illiterate.

I could use the accept-charset attribute of the form tag to receive UTF-8
encoded characters:

<form action="..." method="post" accept-charset="UTF-8">

But then I will get a lot of new problems (or challenges), since Xbase++
string functions like len(), substr(), at() etc. are not aware of UTF-8
encoded strings (AFAIK) and would give wrong results for characters that
take more than one byte.
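The problem with byte-oriented string functions can be shown with Python `bytes`, which, like a runtime that is not UTF-8 aware, count and slice bytes rather than characters. A minimal sketch; it assumes (as the post says, AFAIK) that the Xbase++ functions work on bytes:

```python
# A UTF-8 form value as raw bytes, which is what a byte-oriented runtime sees.
name = "Łukasz Kowalczyk".encode("utf-8")

print(len(name))                  # 17 bytes ...
print(len(name.decode("utf-8")))  # ... but only 16 characters

# Byte-based slicing can cut a multibyte character in half:
first = name[:1]                  # b'\xc5', the first half of 'Ł'
print(first.decode("utf-8", errors="replace"))  # U+FFFD replacement character
```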

Maybe I could create my own versions of all the string functions to
handle UTF-8 encoded strings correctly... but currently I simply do not have
the time to do this. Besides, it is not really my job; it should be
implemented in the Xbase++ runtime.
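Own versions of the string functions are mostly a matter of recognising UTF-8 continuation bytes (bit pattern `10xxxxxx`). A minimal Python sketch of two such helpers; `utf8_len` and `utf8_substr` are hypothetical names, they assume well-formed UTF-8 input, and they only mimic the 1-based `SubStr()` convention for positive arguments:

```python
def utf8_len(data: bytes) -> int:
    """Count characters in UTF-8 bytes: every byte that is not a
    continuation byte (10xxxxxx) starts a new character."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def utf8_substr(data: bytes, start: int, count: int) -> bytes:
    """Return `count` characters starting at 1-based character `start`,
    without splitting multibyte sequences."""
    if count <= 0:
        return b""
    # Byte offset at which each character begins.
    starts = [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]
    starts.append(len(data))  # sentinel marking the end of the buffer
    first = max(start, 1) - 1  # 0-based index of the first character
    if first >= len(starts) - 1:
        return b""
    last = min(first + count, len(starts) - 1)
    return data[starts[first]:starts[last]]


city = "Łódź".encode("utf-8")
print(len(city), utf8_len(city))                # 7 4
print(utf8_substr(city, 2, 2).decode("utf-8"))  # ód
```

The same byte tests could presumably be ported to Xbase++ with `Asc()` and `SubStr()` in a loop, though that port is not shown here.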

> I would assume that
> we could provide the information about the Accept-Language of the
> request in the Form function via the Html object.

I don't think the browser sends this information...

> What do you mean with "opens up a bunch of problems". Where do you
> feel that further processing can not be done with 

Apart from the problems of working with Unicode described above, converting
to UTF-8 needs additional consideration because of the fixed field lengths in
DBF databases. In the worst case, a UTF-8 string of Latin-1/Latin-2 characters
takes twice as many bytes as it has characters (other scripts need up to four
bytes per character).
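One way to respect a fixed DBF field width is to cut the UTF-8 bytes at the field length and then step back to the start of the last character, so no multibyte sequence is stored half. A minimal Python sketch; `fit_utf8` is a hypothetical helper and `width` stands for the DBF field length in bytes:

```python
def fit_utf8(text: str, width: int) -> bytes:
    """Encode text as UTF-8, at most `width` bytes, never splitting a character."""
    data = text.encode("utf-8")
    if len(data) <= width:
        return data
    end = width
    # data[end] is the first byte that no longer fits. While it is a
    # continuation byte (10xxxxxx), the character it belongs to started
    # earlier, so move the cut back to that character's lead byte.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


print(len("Żółć"), len("Żółć".encode("utf-8")))  # 4 8
print(fit_utf8("Żółć", 5).decode("utf-8"))        # Żó
```

Since DBF character fields are space-padded, the result could then be padded with `result.ljust(width, b" ")`.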

IMHO, all the time spent on the various improvements over the last few
years would have been much better invested in a .NET version of Xbase++,
especially because such a version could have been made compatible with the
Mono project, which would have meant real cross-platform development.

regards
Thomas
Andreas Herdt
Re: Webforms, WAA and character sets
on Fri, 30 Oct 2009 15:50:15 +0100
Hi Thomas,

Did I write Accept-Language? I meant accept-charset in the
form tag, of course. It was a long day yesterday; sorry for
the confusion.

As a matter of fact, accept-charset in the form function does not
guarantee which charset the browser actually sends. I have just
looked into this and observed that some operating system/browser
combinations produce a request whose content is URL-encoded. After
some quick research I found that even if you ask for a Latin-1
encoding, Internet Explorer might send a Windows encoding instead.
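The Internet Explorer observation matters because ISO 8859-2 and windows-1250 do not agree on every byte. A minimal Python sketch using `urllib.parse`, showing that `š` is percent-encoded differently in the two charsets and decodes to an invisible control character if the wrong one is assumed:

```python
from urllib.parse import quote, unquote

# The same letter, percent-encoded as a browser would do it under two charsets.
print(quote("š", encoding="iso-8859-2"))  # %B9
print(quote("š", encoding="cp1250"))      # %9A

# If the server assumes Latin-2 but the browser actually used windows-1250,
# the decoded value is a C1 control character, not 'š'.
wrong = unquote("%9A", encoding="iso-8859-2")
right = unquote("%9A", encoding="cp1250")
print(repr(wrong), repr(right))  # '\x9a' 'š'
```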

Thus I feel that you will have to handle various encodings in
your web application.

I would assume that there is no way other than analyzing the
Content-Type of the HTTP request and doing a transformation
if required. As long as Xbase++ does not support multibyte
character sets, it will be difficult to provide a generic
transformation that is done automatically.
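Reading the charset from the request's Content-Type header could look roughly like this Python sketch, which borrows the standard library's MIME header parser. `request_charset` is a hypothetical helper, and the ISO 8859-1 fallback is an assumption: browsers often omit the charset parameter on form posts, in which case the charset of the page that served the form is the usual best guess.

```python
from email.message import Message


def request_charset(content_type: str, default: str = "iso-8859-1") -> str:
    """Return the charset parameter of a Content-Type header value
    (lower-cased), or `default` when the browser did not send one."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


print(request_charset("application/x-www-form-urlencoded; charset=windows-1250"))
print(request_charset("application/x-www-form-urlencoded"))
```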

To handle the Latin-1/Latin-2 issue correctly in your web
applications, I suggest storing not only the string but also
its encoding.
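Storing the encoding next to the string might look like this minimal Python sketch; `StoredText` and its fields are hypothetical and stand in for, e.g., an extra DBF column holding the charset name:

```python
from dataclasses import dataclass


@dataclass
class StoredText:
    """Hypothetical record layout: the raw 8-bit value plus the name of
    the charset it was received in."""
    raw: bytes
    charset: str  # e.g. "iso-8859-1" or "iso-8859-2"

    def text(self) -> str:
        return self.raw.decode(self.charset)


rows = [
    StoredText(b"M\xfcller", "iso-8859-1"),
    StoredText(b"Wroc\xb3aw", "iso-8859-2"),
]
for row in rows:
    print(row.text())  # Müller, then Wrocław
```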

When you need to print, it should then be sufficient to use a
font with the proper code page to avoid garbage characters.
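One caveat when printing: Windows' Central European fonts normally use code page 1250 (windows-1250), which is close to but not identical with ISO 8859-2; letters such as `Š` and `š` sit at different byte values. If the stored bytes are ISO 8859-2, they may therefore need re-encoding before they are printed with such a font. A minimal Python sketch with a hypothetical mapping table:

```python
# Hypothetical mapping from the stored charset to the Windows code page of
# the font used for printing.
PRINT_CODEPAGE = {
    "iso-8859-1": "cp1252",  # Latin-1 letters keep their byte values here
    "iso-8859-2": "cp1250",  # Central European; some letters move
}


def for_printing(raw: bytes, charset: str) -> bytes:
    """Re-encode a stored value into the code page of the printer font."""
    target = PRINT_CODEPAGE[charset]
    return raw.decode(charset).encode(target)


stored = "Šibenik".encode("iso-8859-2")    # b'\xa9ibenik'
print(for_printing(stored, "iso-8859-2"))  # b'\x8aibenik'
```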

With my best regards,

   Andreas Herdt
   Alaska Software
