[ic] Call for testers

Wed Apr 29 16:58:23 UTC 2009

Quoting David Christensen (david at endpoint.com):
> 
> On Apr 29, 2009, at 10:23 AM, Mike Heins wrote:
> 
> > Quoting David Christensen (david at endpoint.com):
> >>
> >> On Apr 29, 2009, at 9:33 AM, Mike Heins wrote:
> >>
> >>> Quoting David Christensen (david at endpoint.com):
> >>>>
> >>>> On Apr 29, 2009, at 8:18 AM, Jon Jensen wrote:
> >>>>
> >>>>> On Wed, 29 Apr 2009, Stefan Hornburg (Racke) wrote:
> >>>>>
> >>>>>> The most recent change bails out with:
> >>>>>>
> >>>>>> Unrecognized/unsupported MV_HTTP_CHARSET: 'utf-8'.
> >>>>>> ulisses config error: Unrecognized/unsupported MV_HTTP_CHARSET:
> >>>>>> 'utf-8'.
> >>>>>>
> >>>>>> Any idea why?
> >>>>>
> >>>>> Wild guess, but have you tried "utf8" instead of "utf-8"? They're
> >>>>> not the same in Perl. But if you were using "utf-8" before and it
> >>>>> worked, I don't know.
> >>>>
> >>>>
> >>>> No, this definitely is a regression.  I suspect this may be due to
> >>>> the Global::UTF8 variable logic introduced when I merged upstream
> >>>> CVS, but I'll have to hunt it down to be sure.  Both utf-8 and utf8
> >>>> are acceptable here; one gets resolved to "strict" utf8, but both
> >>>> are valid.  (Any aliasable encoding works here; this message is what
> >>>> appears when we can't resolve the alias.  An artifact of
> >>>> require/import vs. use, perhaps?)
> >>>
> >>> Yes, I think it is having problems because of the Encode::PERLQQ and
> >>> other quasi-constant subroutines. This is a bit maddening, because
> >>> it appears there is no way to have a conditional namespace and use
> >>> those types of methods.
> >>
> >> Perhaps a string-eval "use Encode" would be in order?
> >
> > I have never done that. If it works and gives us all the namespace
> > equivalents, I am all for it.
> >
> >>
> >>> If Encode didn't pollute regexes, it would be fine. Is there some
> >>> trigger for that, something like the old &sawampersand?
> >>
> >> Can you explain the regexp-polluting behavior?  I'm not sure I
> >> understand.
> >
> > The regex polluting behavior is attaching Encode behavior to /i
> > and other modifiers that might determine case-sensitivity (which you
> > have to UTF8-ify, of course). This is what kills Safe.
> 
> I solved the Safe issue entirely in one of my commits to the ic-utf8
> repo; basically I created a wrapper class, Vend::Safe, replaced all
> instances of new Safe in the codebase with new Vend::Safe, and put
> common initialization behavior in the wrapper class.  So is the Safe
> *breakage* the issue, or are performance issues with utf8 the
> concern?

The Safe breakage is the issue for me, as well as any significant
performance penalty we pay when UTF8 is not being used. I think I have
fixed the latter for the most part.
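
The wrapper-class approach described above can be sketched roughly as
follows. This is only an illustration under stated assumptions:
My::SafeWrapper is a hypothetical name standing in for Vend::Safe, and
the initialization step is a placeholder, since the actual contents of
Vend::Safe are not shown in this thread.

```perl
use strict;
use warnings;

package My::SafeWrapper;    # hypothetical stand-in for Vend::Safe
use Safe;
our @ISA = ('Safe');

sub new {
    my ($class, @args) = @_;

    # Safe->new blesses into the class it is called on, so the object
    # returned here is already a My::SafeWrapper.
    my $self = $class->SUPER::new(@args);

    # Common initialization goes here, in one place, instead of after
    # every "new Safe" in the codebase. Placeholder only: this just
    # re-applies Safe's default opcode mask.
    $self->permit_only(':default');

    return $self;
}

package main;

# Callers switch from Safe->new to the wrapper's constructor.
my $safe   = My::SafeWrapper->new;
my $result = $safe->reval('2 + 3');
print "result: $result\n";
```

The point of the design is that any workaround needed for every
compartment lives in one constructor rather than at each call site.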

> 
> > In older 5.4 and earlier versions of Perl, there was a behavior
> > associated with the $& variable (results of the previous match). It
> > set a variable called sawampersand, which would greatly slow down
> > certain types of regex substitutions.
> 
> Okay, I know what you're talking about here, basically the global  
> penalty for any regex using $&.

If it was only regexes using $&, no one would ever have gotten excited.
That global penalty affected *all* regexes using any backreference,
which was the problem. Just like the current problem affects all regexes
no matter whether UTF8 is being used or not at the time. 
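
For reference, the string-eval idea and the alias lookup discussed
earlier in the thread can be sketched together as below. This is a
minimal sketch, not the Interchange code: $want_utf8 is a hypothetical
stand-in for the Global::UTF8 setting, and resolve_charset is an
illustrative helper name. Encode::resolve_alias, Encode::encode and
Encode::PERLQQ are real Encode functions; they are called fully
qualified and with parentheses so the code compiles under strict even
though Encode is not loaded at compile time.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the Global::UTF8 setting.
my $want_utf8 = 1;

# A string eval defers "use Encode" to run time, so Encode is only
# loaded when UTF-8 support is actually requested.
my $have_encode = $want_utf8 && eval q{use Encode; 1};

# Resolve a charset name to Encode's canonical name, or die with the
# same kind of message quoted at the top of this thread.
sub resolve_charset {
    my ($name) = @_;
    my $canonical = Encode::resolve_alias($name)
        or die "Unrecognized/unsupported MV_HTTP_CHARSET: '$name'.\n";
    return $canonical;
}

my $bytes = '';
if ($have_encode) {
    print "$_ => ", resolve_charset($_), "\n" for qw(utf-8 utf8);

    # Quasi-constants such as PERLQQ are ordinary subs, so an explicit
    # call with parentheses works without a compile-time import.
    # encode() may modify its source when a check value is given, so
    # pass a variable rather than a literal.
    my $text = "caf\x{e9}";
    $bytes = Encode::encode('ascii', $text, Encode::PERLQQ());
    print "$bytes\n";
}
```

With a stock Encode, "utf-8" resolves to "utf-8-strict" and "utf8"
resolves to "utf8", which matches the "strict" distinction mentioned
above.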

-- 
Mike Heins
Perusion -- Expert Interchange Consulting    http://www.perusion.com/
phone +1.765.328.4479  <mike at perusion.com>

When the only tool you have is a hammer, all your problems tend to look
like nails.  -- Abraham Maslow