[ic] ic-utf8 readfile/writefile patch

David Christensen david at endpoint.com
Mon Mar 16 06:06:05 UTC 2009

On Mar 15, 2009, at 11:18 PM, Mike Heins wrote:

> Quoting David Christensen (david at endpoint.com):
>> Folks,
>> I've added a patch to the ic-utf8 tree to support encoding/fallback
>> strategy in Vend::File::readfile and writefile.  This is intended to
>> be completely backwards-compatible with both legacy encodings and the
>> current MV_UTF8 scheme while offering the following benefits:
>>  - Explicit override of the encoding of any specific file.  The
>> encoding defaults to none (i.e., raw) when MV_UTF8 is not set, and to
>> utf-8 when MV_UTF8 is set.
>>  - Sensible default fallback to provide maximum information when
>> invalid encoding/decoding sequences are encountered.  (The fallback
>> strategy is how we deal with invalid/incomplete characters.)
>>  - Possible future modifications to [include] to provide access to
>> the encoding and fallback parameters, e.g.: [include file="foo/bar/baz"
>> encoding="cp1252"]
>> I'd appreciate testing of this patch; in particular, this should help
>> with the issue Racke encountered with legacy encodings on the index
>> page with MV_UTF8 set.
> Has anyone thought of performance? Can this be disabled for people who
> don't want to spend processor power on UTF8?

This should have no discernible performance impact in legacy mode;
it's just a few additional if checks.  I know you've had some
performance concerns before.  Were there specific test cases you were
able to isolate where you saw problems?  I'd be glad to work on making
any of these changes as low-impact as possible.
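To make the behavior described above concrete, here is a minimal sketch, written in Python purely for illustration; it is not the actual Vend::File Perl code.  The names readfile, writefile and resolve_encoding and the utf8_mode flag (standing in for MV_UTF8) are hypothetical, and Python's "backslashreplace" error handler is an assumed stand-in for whatever fallback the patch really uses.  It shows the three points from the patch description: an explicit per-file encoding override, a default of raw when the UTF-8 mode is off and utf-8 when it is on, and a fallback that keeps invalid bytes visible as \xNN escapes instead of dropping them.  Note that the raw path costs only one extra conditional, which is the basis of the "few additional if checks" claim.

```python
import os
import tempfile
from typing import Optional, Union


def resolve_encoding(encoding: Optional[str], utf8_mode: bool) -> Optional[str]:
    """Hypothetical helper: an explicit override wins; otherwise use utf-8
    when utf8_mode (standing in for MV_UTF8) is set, else None (raw)."""
    if encoding is not None:
        return encoding
    return "utf-8" if utf8_mode else None


def readfile(path: str, encoding: Optional[str] = None,
             errors: str = "backslashreplace",
             utf8_mode: bool = False) -> Union[bytes, str]:
    """Hypothetical reader: returns raw bytes in legacy mode, else decoded text."""
    with open(path, "rb") as fh:
        data = fh.read()
    enc = resolve_encoding(encoding, utf8_mode)
    if enc is None:
        # Legacy/raw mode: a single extra check and no decoding work.
        return data
    # Fallback: invalid sequences become \xNN escapes rather than raising
    # an error or silently disappearing, so no information is lost.
    return data.decode(enc, errors=errors)


def writefile(path: str, content: Union[bytes, str],
              encoding: Optional[str] = None,
              errors: str = "backslashreplace",
              utf8_mode: bool = False) -> None:
    """Hypothetical writer: raw mode expects bytes; otherwise encodes text."""
    enc = resolve_encoding(encoding, utf8_mode)
    if enc is None:
        data = content  # raw mode: caller supplies bytes unchanged
    else:
        # Characters the target encoding cannot represent become \uNNNN escapes.
        data = content.encode(enc, errors=errors)
    with open(path, "wb") as fh:
        fh.write(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        page = os.path.join(d, "index.html")
        with open(page, "wb") as fh:
            fh.write(b"caf\xe9")  # a legacy cp1252/latin-1 file
        print(readfile(page))                                     # b'caf\xe9'
        print(readfile(page, utf8_mode=True))                     # caf\xe9
        print(readfile(page, encoding="cp1252", utf8_mode=True))  # café
```

The second call mirrors the legacy-page problem: with UTF-8 mode on, a cp1252 file does not decode cleanly, but the fallback keeps the offending byte visible; the third call shows how an explicit per-file override (the proposed encoding="cp1252" on [include]) would resolve it.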


David Christensen
End Point Corporation
david at endpoint.com

More information about the interchange-users mailing list