[ic] vlink display of UTF8 characters

Peter peter at pajamian.dhs.org
Tue Aug 16 06:12:07 UTC 2016


This landed in my spam box and I've only just found it...

On 16/08/16 02:50, David Christensen wrote:
> What charset was the site in originally and what were the other
> settings (Apache’s DefaultCharset, MV_HTTP_CHARSET, MV_UTF8,
> perl/Encode.pm versions, etc)?

The site has been in UTF-8 for years, and we only recently started
having issues with it.

AddDefaultCharset UTF-8

Variable MV_HTTP_CHARSET UTF-8
Variable MV_UTF8         1
DatabaseDefault PG_ENABLE_UTF8 1
DatabaseDefault GDBM_ENABLE_UTF8 1

The database is set to UTF-8 in PostgreSQL.

# /usr/local/perl/bin/perl --version

This is perl 5, version 12, subversion 1 (v5.12.1) built for x86_64-linux

# /usr/local/perl/bin/perl -MEncode -le 'print $Encode::VERSION'
2.39

> I wonder if this was properly configured in the IC side, as I’d just
> expect the vlink to pass-thru the octets regardless of encoding.  In
> any case, this doesn’t feel correct to me, so I’d like to see what
> other information we can gather.

I've debugged it right up to the point where IC writes the data out to
the socket.  If I log from the end of Vend::Server::respond(), which is
exactly where Interchange writes to the socket, I get perfectly
formatted UTF-8 data in the log.  I also checked $$body with
Encode::is_utf8() at that same point and it returns true, so the string
is definitely flagged as UTF-8 there.

I then manually fetched a page through vlink (setting the environment
variables to simulate a page request) and it spewed out invalid UTF-8
characters.  I swapped in vlink.pl and still got the invalid characters.
After I added the binmode line, it returned properly formatted UTF-8.
I then tested through Apache with my normal browser, using the
swapped-in vlink.pl, and the browser is happy with the output as well.

> C-wise, you’d have to write your own equivalent to the PerlIO layer
> to encode input data as UTF8, which is another reason I think this is
> just misconfigured, not fundamentally broken at this layer.  We’ve
> had quite a few sites use the IC UTF-8 layer without ever having to
> resort to vlink modifications.

Yes, I just wonder how many of them are still using vlink.  Haven't most
people moved on to mod_perl2 by now?


Peter



More information about the interchange-users mailing list