[ic] Call for testers

David Christensen david at endpoint.com
Thu Mar 12 12:32:51 UTC 2009


<snip>

> One thing which also annoys me is the internal server error caused by
> non UTF-8 characters:
>
> 127.0.1.1 ZobI6Yf4:127.0.1.1 - [12/March/2009:09:24:20 +0100] ulisses
> /cgi-bin/ic/ulisses/index Runtime error: Malformed UTF-8 character  
> (fatal)
> at /usr/lib/interchange/Vend/Parser.pm line 112.

What is the text on the index page?  I'm assuming it was in some
legacy encoding and that MV_UTF8 was set to 1.  If MV_UTF8 is off,
this is a bug that should be addressed, since breaking legacy encodings
when MV_UTF8 is off is a Bad Thing.  One consequence of setting MV_UTF8
is that Interchange expects all of your pages, etc. to be in the
UTF-8 encoding.  There are a couple of options here, none of which is
really automatic, because there's no reliable way to tell what encoding
the pages were in without being told (or making some educated guesses
based on locale, assuming latin-1, etc.).
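
To illustrate why the encoding can't simply be detected, here is a
minimal Python sketch (not Interchange code; the function names and the
shortlist of encodings are my own assumptions). It shows that the same
bytes fail as UTF-8 yet decode without complaint under several legacy
encodings, each giving a different idea of what the text says.

```python
# Sketch: the original encoding cannot be recovered from the bytes alone.
# Function names and the encoding shortlist are illustrative assumptions.

def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def candidate_decodings(data: bytes,
                        encodings=("latin-1", "cp1252", "iso-8859-15")):
    """Decode the same bytes under several legacy encodings."""
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None  # bytes are not valid in this encoding
    return results


if __name__ == "__main__":
    # Byte 0xA4 is a generic currency sign in latin-1 but a euro sign
    # in iso-8859-15; on its own it is not valid UTF-8.
    sample = b"Preis: 10 \xa4"
    print("valid UTF-8:", is_valid_utf8(sample))
    for name, text in candidate_decodings(sample).items():
        print(name, repr(text))
```

Every candidate succeeds, so only outside knowledge (the locale, or the
person who wrote the page) can say which reading is the intended one.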

I've thought of writing a helper script to analyze a catalog's files
and convert them to UTF-8, or at least write out a parallel catalog
for the purposes of testing.  We could prompt the user for the encoding
or locale, or use file(1) to make a first guess at the encoding, and
then use iconv(1) or Encode(3pm) to make the actual conversion to UTF-8.
It's kind of sticky with upgrades, as we'd also need to ensure that any
variables get converted and that the contents of the database are
upgraded to UTF-8 as well.  Perhaps an automated conversion is a bad
idea, but something which could perform the analysis and report back
would be useful.
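
As a rough idea of what the "analyze and report" part might look like,
here is a small Python sketch. It is not an existing Interchange tool:
the function names, the status labels, and the default assumed encoding
are all hypothetical. It only reads files and never writes to the
catalog, and it does not touch variables or database contents.

```python
import os

# Sketch of a report-only catalog scan. All names here are hypothetical.

def classify(data: bytes, assumed_encoding: str = "latin-1") -> str:
    """Classify file contents as 'ascii', 'utf-8', 'convertible' or 'unknown'."""
    if all(byte < 0x80 for byte in data):
        return "ascii"            # pure ASCII is already valid UTF-8
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    try:
        # Note: latin-1 accepts every byte sequence, so 'unknown' can only
        # occur with a stricter assumed encoding such as cp1252.
        data.decode(assumed_encoding)
        return "convertible"
    except UnicodeDecodeError:
        return "unknown"


def to_utf8(data: bytes, assumed_encoding: str = "latin-1") -> bytes:
    """Re-encode bytes from the assumed legacy encoding into UTF-8."""
    return data.decode(assumed_encoding).encode("utf-8")


def analyze_catalog(root: str, assumed_encoding: str = "latin-1"):
    """Walk a catalog directory and return (path, status) pairs; read-only."""
    report = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            with open(path, "rb") as handle:
                report.append((path, classify(handle.read(), assumed_encoding)))
    return report


if __name__ == "__main__":
    import sys
    for path, status in analyze_catalog(sys.argv[1] if len(sys.argv) > 1 else "."):
        if status not in ("ascii", "utf-8"):
            print(status, path)
```

Run against a catalog directory, this would list only the files that
need attention; to_utf8() shows the conversion step one could apply when
writing out a parallel catalog for testing.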

Regards,

David
--
David Christensen
End Point Corporation
david at endpoint.com
212-929-6923
http://www.endpoint.com/





