[ic] Filters with UTF-8 body

Jon Jensen jon at endpoint.com
Thu Mar 12 16:35:40 UTC 2009


On Thu, 12 Mar 2009, Peter wrote:

>>> Peter Ajamian suggested that the following code in Interpolate.pm
>>> causes the problem:
>>>
>>>  '_filter'               => qr($T{_filter}\s+($Some)\]($Some)),
>>>  my $Some = '[\000-\377]*?';
>>
>> More specifically $Some, $All, $XSome and $XAll will only parse 8 bit
>> characters in the range \000-\377.  Not positive about this, but I think
>> that changing them to the following will work:
>> my $All = '(?:(?s).*)';
>> my $Some = '(?:(?s).*?)';
>> my $XAll = qr{(?:(?s).*)};
>> my $XSome = qr{(?:(?s).*?)};
>
> On further reflection this would probably work just as well and is less
> complex looking:
> my $All = '[.\n]*';
> my $Some = '[.\n]*?';
> my $XAll = qr{[.\n]*};
> my $XSome = qr{[.\n]*?};

It sure seems safer to use (?s). instead of a character class: inside
[...] the dot is a literal ".", so [.\n] would match only dots and
newlines. I wouldn't worry about readability here -- the whole reason
these variables exist is to abstract out regex fragments that are painful
to read. :) But I think the (?s). form is easier to read anyway.
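
For anyone who wants to check the three candidate forms of $Some side by
side, here is a minimal sketch. The sample body, the [x]...[/x] tags, and
the spans_body helper are made up for illustration and are not Interchange
code; the body is written with \x{...} escapes so the script does not
depend on the source file's encoding.

```perl
use strict;
use warnings;

# Candidate definitions of $Some discussed in this thread.
my %some = (
    byte_class => '[\000-\377]*?',  # original: code points 0-255 only
    dot_class  => '[.\n]*?',        # '.' is a literal dot inside [...]
    dotall     => '(?:(?s).*?)',    # (?s) lets '.' match newlines too
);

# Hypothetical tag body: contains a newline and U+20AC (euro sign),
# which lies above the \377 (U+00FF) limit of the original class.
my $body = "na\x{EF}ve\n\x{20AC}uro";
my $text = '[x]' . $body . '[/x]';

# Return 1 if the pattern can span the whole body between the two
# hypothetical tags, 0 otherwise.
sub spans_body {
    my ($some) = @_;
    return $text =~ m{\[x\]($some)\[/x\]} ? 1 : 0;
}

for my $name (sort keys %some) {
    printf "%-10s %s\n", $name,
        spans_body($some{$name}) ? 'matches' : 'fails';
}
```

With this sample body only the (?s). form matches: the byte-range class
stops at the euro sign, and the [.\n] class stops at the first letter.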

Jon

-- 
Jon Jensen
End Point Corporation
http://www.endpoint.com/



More information about the interchange-users mailing list