[ic] Call for testers

Peter peter at pajamian.dhs.org
Fri Mar 13 09:29:35 UTC 2009

On 03/12/2009 10:48 PM, Jon Jensen wrote:
> This is starting to sound awfully fancy.
> If you can make all this autodetection work well, and without a big 
> performance hit, I suppose it's nice.
> But do we really want to encourage people to have various files in various 
> encodings, never really sure what it is? Do we really want it to be 
> possible for Interchange to autodetect that an HTML header file is in 
> Windows-1252 and convert it to UTF-8, yet its header still says 
> Windows-1252?

IC should be setting the encoding in the header according to the

> I doubt that's the only case where this fancy autodetection stuff could 
> bite us.

Hrmmm, you may be right ...

> I'd prefer to see nothing done to the bytes if UTF-8 support is disabled, 

On this I agree.

> and if it's enabled, see any invalid UTF-8 bytes converted to ? 
> characters. That's simple, nonfatal at runtime, and yet gently encourages 
> developers to get their sources in the proper UTF-8 encoding.

I'm fine with that, and that was the original proposal.  One problem,
though, is that while I thought that the Encode module could do that,
apparently it can only barf on invalid input when decoding Unicode, so
we would have to find another way to locate the invalid chars and
replace them.
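
To make the "invalid bytes become ?" idea concrete, here is a minimal
sketch. It is written in Python purely for illustration, not in Perl and
not against the Encode module, and it is not Interchange code. The handler
name "qmark" and the function decode_lenient are hypothetical names made
up for this example.

```python
import codecs


def _question_mark(err):
    """Error handler: replace each undecodable byte with a literal '?'."""
    if isinstance(err, UnicodeDecodeError):
        # err.start..err.end is the byte range the decoder rejected.
        # Emit one '?' per rejected byte and resume decoding at err.end.
        return ("?" * (err.end - err.start), err.end)
    raise err


# "qmark" is a hypothetical handler name chosen for this sketch.
codecs.register_error("qmark", _question_mark)


def decode_lenient(raw: bytes) -> str:
    """Decode as UTF-8, turning invalid bytes into '?' instead of failing."""
    return raw.decode("utf-8", errors="qmark")


if __name__ == "__main__":
    print(decode_lenient(b"caf\xc3\xa9"))  # valid UTF-8 passes through
    print(decode_lenient(b"caf\xe9"))      # Latin-1 byte is not valid UTF-8
```

Valid UTF-8 passes through untouched, and nothing is fatal at runtime,
which is the behaviour described in the quoted proposal.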

> I think that's still pretty lax, since we're not aborting on invalid 
> characters as e.g. Postgres does. What do you all think?

I'm fine as long as we can find a way to reasonably do that.  From what
I can tell it may actually be harder to do that than it is to implement
a fallback mechanism.
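
For comparison, a fallback mechanism could look roughly like the sketch
below, again in Python for illustration only. It assumes "fallback" means
trying strict UTF-8 first and, if that fails, decoding the same bytes as
a legacy encoding. The function name decode_with_fallback and the choice
of Windows-1252 as the default fallback are assumptions for this example,
not something settled in this thread.

```python
def decode_with_fallback(raw: bytes, fallback: str = "cp1252") -> str:
    """Try strict UTF-8 first; on failure, decode with a legacy encoding."""
    try:
        return raw.decode("utf-8")  # strict: raises on any invalid byte
    except UnicodeDecodeError:
        # Assumption: the source was really written in the fallback encoding.
        # errors="replace" keeps this nonfatal for the few byte values that
        # Windows-1252 leaves undefined.
        return raw.decode(fallback, errors="replace")


if __name__ == "__main__":
    print(decode_with_fallback(b"caf\xc3\xa9"))  # valid UTF-8
    print(decode_with_fallback(b"caf\xe9"))      # Windows-1252 bytes
```

The trade-off is the one raised above: this quietly accepts files in
mixed encodings instead of nudging developers toward proper UTF-8.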

> I think Mike (reasonably) just wants backward compatibility to stop 
> breaking for non-UTF-8 stuff.

Yes, and I fully agree with that.

> I suspect Frederic, Stefan, Kevin, and very 
> few others who are actually using UTF-8 would be better suited to offer 
> opinions on changes in that setup.

I'm happy to hear their input.

