[ic] vlink display of UTF8 characters
peter at pajamian.dhs.org
Tue Aug 16 06:12:07 UTC 2016
This landed in my spam box and I only just found it.
On 16/08/16 02:50, David Christensen wrote:
> What charset was the site in originally and what were the other
> settings (Apache’s DefaultCharset, MV_HTTP_CHARSET, MV_UTF8,
> perl/Encode.pm versions, etc)?
The site has been in UTF-8 for years, and we only recently started
having issues with it.
Variable MV_HTTP_CHARSET UTF-8
Variable MV_UTF8 1
DatabaseDefault PG_ENABLE_UTF8 1
DatabaseDefault GDBM_ENABLE_UTF8 1
Database is set to UTF-8 in postgresql.
# /usr/local/perl/bin/perl --version
This is perl 5, version 12, subversion 1 (v5.12.1) built for x86_64-linux
# /usr/local/perl/bin/perl -MEncode -le 'print $Encode::VERSION'
> I wonder if this was properly configured in the IC side, as I’d just
> expect the vlink to pass-thru the octets regardless of encoding. In
> any case, this doesn’t feel correct to me, so I’d like to see what
> other information we can gather.
I've debugged it right up to the point where IC writes the data out to
the socket. I can log from the end of Vend::Server::respond() (which is
right at the point where Interchange writes to the socket) and I get
perfectly formatted UTF-8 data in the log. I also checked $$body with
Encode::is_utf8() at that same point and it comes back true, so it
certainly has UTF-8 data at that point.
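For anyone who wants to try the same kind of check outside Interchange,
here is a minimal standalone sketch. It is not the Vend::Server code: the
sample string, the variable names and the describe() helper are made up
for illustration. It only shows what Encode::is_utf8() reports for a
decoded character string compared with the encoded octets.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample data: "café" as raw UTF-8 octets (5 bytes).
my $octets = "caf\xc3\xa9";

# Decoding gives a character string; Perl turns its internal UTF8 flag on.
my $chars = decode('UTF-8', $octets);

# Encoding gives octets again, which is what goes over a socket.
my $wire = encode('UTF-8', $chars);

# Illustrative helper: print the flag state and length of a string.
sub describe {
    my ($label, $s) = @_;
    printf "%-7s utf8_flag=%d length=%d\n",
        $label, (Encode::is_utf8($s) ? 1 : 0), length($s);
}

describe('octets', $octets);
describe('chars',  $chars);
describe('wire',   $wire);
```

In this sketch the decoded string has the flag on and a length of 4
characters, while the raw and re-encoded forms are 5 octets with the
flag off.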
I then manually fetched a page through vlink (by setting the
environment variables to simulate a page fetch) and it spewed out
invalid UTF-8 characters. I swapped in vlink.pl and still got the
invalid characters. I added the binmode line and it came back with
properly formatted UTF-8. Then I tested through Apache with my normal
browser, still using the swapped-in vlink.pl, and the output displays
correctly there as well.
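To illustrate the kind of difference a binmode call can make, here is a
small standalone sketch. It is not the actual vlink.pl code: the
bytes_written() helper, the sample string and the choice of the
':encoding(UTF-8)' layer are all assumptions for the example. It prints
the same character string to an in-memory filehandle with and without
the layer and shows the resulting bytes in hex.

```perl
use strict;
use warnings;

# Hypothetical helper: print $text to an in-memory filehandle,
# optionally applying a PerlIO layer first, and return the bytes written.
sub bytes_written {
    my ($text, $layer) = @_;
    my $buf = '';
    open(my $fh, '>', \$buf) or die "open failed: $!";
    binmode($fh, $layer) if defined $layer;
    print {$fh} $text;
    close($fh) or die "close failed: $!";
    return $buf;
}

# Character string containing U+00E9 (e with acute accent).
my $text = "caf\x{e9}";

my $raw     = bytes_written($text);                      # no layer
my $encoded = bytes_written($text, ':encoding(UTF-8)');  # with layer

printf "no layer:   %s\n", unpack('H*', $raw);
printf "with layer: %s\n", unpack('H*', $encoded);
```

Without the layer the accented character goes out as the single byte
e9, which is not valid UTF-8 on its own. With the layer it goes out as
the two-byte sequence c3 a9.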
> C-wise, you’d have to write your own equivalent to the PerlIO layer
> to encode input data as UTF8, which is another reason I think this is
> just misconfigured, not fundamentally broken at this layer. We’ve
> had quite a few sites use the IC UTF-8 layer without ever having to
> resort to vlink modifications.
Yes, I just wonder how many of them are still using vlink. Haven't most
people moved on to mod_perl2 by now?