[ic] Filters with UTF-8 body

Peter peter at pajamian.dhs.org
Thu Mar 12 19:10:04 UTC 2009


On 03/12/2009 06:12 AM, David Christensen wrote:
> On Mar 12, 2009, at 5:31 AM, Peter wrote:
> 
>> On 03/12/2009 03:17 AM, Peter wrote:
>>> On 03/12/2009 03:04 AM, Stefan Hornburg wrote:
>>>> Peter Ajamian suggested that the following code in Interpolate.pm
>>>> causes the problem:
>>>>
>>>> '_filter'               => qr($T{_filter}\s+($Some)\]($Some)),
>>>> my $Some = '[\000-\377]*?';
>>> More specifically, $Some, $All, $XSome and $XAll will only parse 8-bit
>>> characters in the range \000-\377.  Not positive about this, but I think
>>> that changing them to the following will work:
>>> my $All = '(?:(?s).*)';
>>> my $Some = '(?:(?s).*?)';
>>> my $XAll = qr{(?:(?s).*)};
>>> my $XSome = qr{(?:(?s).*?)};
>>
>> On further reflection, this would probably work just as well and is
>> less complex looking:
>> my $All = '[.\n]*';
>> my $Some = '[.\n]*?';
>> my $XAll = qr{[.\n]*};
>> my $XSome = qr{[.\n]*?};
> 
> 
> Heh, one problem:
> 
> $ perl -e 'print "matches!" if "foo" =~ /[.\n]/'
> $ perl -e 'print "matches!" if "foo" =~ /(.|[\n])/'
> matches!

Strange.  So we may as well go with the (?s) solution, as Jon says.  Here
are a few more tests:
peter@peter-desktop:~$ perl -le 'print $1 if "foo\nbar" =~ /((?:(?s).*))/'
foo
bar
peter@peter-desktop:~$ perl -le 'print $1 if "foo\nbar" =~ /((?:.|\n)*)/'
foo
bar
peter@peter-desktop:~$ perl -le 'print $1 if "foo\nbar" =~ /(.*)/'
foo
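
For what it's worth, the [.\n] result is expected: inside a bracketed
character class, "." is a literal period rather than "any character", so
[.\n] can only match a dot or a newline.  The sketch below checks that, and
also compares the old byte-range $Some against the (?s) version on a string
containing a character above \377.  The variable names ($old_some,
$new_some, $body) and the test string are made up for illustration and are
not taken from Interpolate.pm.

```perl
use strict;
use warnings;

# Inside a bracketed character class, '.' is a literal period, not a wildcard.
my $class_hits_foo = ("foo" =~ /[.\n]/) ? 1 : 0;   # "foo" has no '.' or newline
my $class_hits_dot = ("a.b" =~ /[.\n]/) ? 1 : 0;   # "a.b" contains a literal '.'

# Old byte-range definition vs. the (?s) candidate from this thread.
# (Hypothetical variable names, used only for this comparison.)
my $old_some = '[\000-\377]*?';
my $new_some = '(?:(?s).*?)';

# Illustrative body: one character above \377 (U+263A) and an embedded newline.
my $body = "caf\x{263A}\nbar";

# Anchor both ends so each pattern has to consume the whole string.
my $old_ok = ($body =~ /\A$old_some\z/) ? 1 : 0;
my $new_ok = ($body =~ /\A$new_some\z/) ? 1 : 0;

print "[.\\n] vs foo: $class_hits_foo\n";
print "[.\\n] vs a.b: $class_hits_dot\n";
print "old Some:     $old_ok\n";
print "new Some:     $new_ok\n";
```

The old pattern should fail on the wide character while the (?s) version
accepts the whole string, newline included.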


Peter




More information about the interchange-users mailing list