[ic] vlink display of UTF8 characters

Peter peter at pajamian.dhs.org
Tue Aug 16 08:34:24 UTC 2016


On 16/08/16 10:28, Peter wrote:
> Further testing shows that this breaks binary file uploads.  I am
> testing how it works to move the binmode line to after get_entity().
> 
> I think that reading the HTTP headers for the charset and setting the
> binmode accordingly may be the ultimate solution here to make sure that
> it doesn't clash with file downloads either.

As it turns out, I just didn't move the binmode line down far enough.
That said, I came up with this which seems to work well for me:

--- src/vlink.pl	2010-08-20 17:29:56.000000000 -0700
+++ bin/vlink.pl	2016-08-16 01:22:20.000000000 -0700
@@ -162,9 +162,14 @@
 print SOCK send_entity();
 print SOCK "end\n";

-
+my $utf8;
 while(<SOCK>) {
-	print;
+    $utf8 ||= /^Content-Type: .+; charset=UTF-?8\s*$/i;
+    print;
+    if (!/\S/) {
+	binmode(SOCK, ':utf8') if $utf8 > 0;
+	$utf8 = -1;
+    }
 }

 close (SOCK)								or die "close: $!\n";


... so basically, it sets binmode to utf8 only for the data, not for the
HTTP headers, and only if it sees a charset explicitly set to utf8 in
the headers.
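
For anyone who wants to try the idea outside of vlink.pl, here is a
minimal standalone sketch of the same logic. It is an illustration under
assumptions, not the patch itself: read_response() is a hypothetical
helper name, and an in-memory filehandle with a made-up response stands
in for the real socket.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of vlink.pl): reads an HTTP-style response
# from $fh and returns (\@header_lines, $body). Header lines are read as
# raw bytes; the handle is switched to :utf8 only after the blank line that
# ends the headers, and only if a UTF-8 charset was declared.
sub read_response {
    my ($fh) = @_;
    my (@headers, $utf8, $in_body);
    my $body = '';
    while (my $line = <$fh>) {
        if ($in_body) {
            $body .= $line;
            next;
        }
        push @headers, $line;
        # Same pattern as in the patch: charset must be explicitly UTF-8.
        $utf8 ||= $line =~ /^Content-Type: .+; charset=UTF-?8\s*$/i;
        if ($line !~ /\S/) {
            # Blank line: headers are done, everything after is body data.
            binmode($fh, ':utf8') if $utf8;
            $in_body = 1;
        }
    }
    return (\@headers, $body);
}

# Made-up response for illustration: "caf" followed by the two-byte UTF-8
# encoding of e-acute.
my $raw = "Content-Type: text/html; charset=UTF-8\r\n\r\ncaf\xC3\xA9\n";
open(my $fh, '<', \$raw) or die "open: $!\n";
my ($headers, $body) = read_response($fh);
close($fh) or die "close: $!\n";

printf "read %d header lines, body is %d characters long\n",
    scalar(@$headers), length($body);
```

If the Content-Type line is dropped from $raw, the body should come back
as undecoded bytes, which is the behaviour binary uploads and downloads
need.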

I'm definitely not going to commit this without a fair bit of further
review, and I still would like to see patches for vlink.c and tlink.c.
I have no idea how those would handle it.


Peter


