[ic] Filters with UTF-8 body

Peter peter at pajamian.dhs.org
Thu Mar 12 10:17:25 UTC 2009


On 03/12/2009 03:04 AM, Stefan Hornburg wrote:
> Hello,
> 
> I noticed the following error with UTF8 content from the database:
> 
> [item-filter 100.][item-field long_description][/item-filter]
> 
> The [item-filter] doesn't get interpolated as long as long_description
> contains UTF-8 characters.
> 
> Peter Ajamian suggested that the following code in Interpolate.pm
> causes the problem:
> 
>  '_filter'               => qr($T{_filter}\s+($Some)\]($Some)),
>  my $Some = '[\000-\377]*?';

More specifically, $Some, $All, $XSome and $XAll only match 8-bit
characters in the range \000-\377, so they fail on any character above
\377.  I'm not positive about this, but I think that changing them to
the following will work:
my $All = '(?:(?s).*)';
my $Some = '(?:(?s).*?)';
my $XAll = qr{(?:(?s).*)};
my $XSome = qr{(?:(?s).*?)};


With the /s switch turned on (here through the inline (?s)), the .
matches any character, including \n.  Using . for the match lets Perl
worry about whether the string is UTF-8, instead of us having to specify
the correct range.
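
A quick stand-alone way to check the idea outside of Interchange is
sketched below.  The [filter] pattern is a simplified, hypothetical
stand-in for the real '_filter' regex in Interpolate.pm, and the helper
name matches() and the test strings are made up for illustration only.

```perl
use strict;
use warnings;

# Old and proposed definitions of $Some, as quoted in this thread.
my $old_some = '[\000-\377]*?';
my $new_some = '(?:(?s).*?)';

# Hypothetical, simplified stand-in for the '_filter' pattern:
# returns 1 if the whole tag pair is recognized, 0 otherwise.
sub matches {
    my ($some, $text) = @_;
    my $re = qr{\[filter\s+($some)\]($some)\[/filter\]};
    return $text =~ $re ? 1 : 0;
}

# One body with only 8-bit characters, one with a character above \377
# (U+20AC, the euro sign).  Both bodies contain a newline.
my $ascii = "[filter 100.]plain text\nsecond line[/filter]";
my $wide  = "[filter 100.]price: 5 \x{20AC}\nsecond line[/filter]";

for my $case (['old', $old_some], ['new', $new_some]) {
    my ($label, $some) = @$case;
    printf "%s \$Some, 8-bit body: %s\n", $label,
        matches($some, $ascii) ? 'match' : 'no match';
    printf "%s \$Some, wide body:  %s\n", $label,
        matches($some, $wide) ? 'match' : 'no match';
}
```

In this sketch the old definition should only fail on the body that
contains the wide character, while the new one should match both
bodies, newline included.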

Peter


