[ic] removing new line characters

Kevin Walsh kevin at cursor.biz
Fri Sep 26 22:22:28 EDT 2003


Paul Jordan [paul at gishnetwork.com] wrote:
> > I need to strip out the new line characters from a chunk of HTML.  Can
> > anyone hook me up with a little Perl for that?
> > 
> > - Grant
> You can make a filter:
> 
> CodeDef nonl Filter
> CodeDef nonl Routine <<EOR
> sub {
>    my $val = shift;
>    $val =~ s/\n//g;
>    $val;
> }
> EOR
> 
> [filter op="nonl"] junk with newlines [/filter]
> 
That may produce unexpected results.  Take the following block of
text as an example:

<p>
This is
a
test
</p>

The above filter will turn that into "<p>This isatest</p>", with the
words run together, which may or may not be what's wanted, depending
upon your requirements.
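
To see that for yourself outside of Interchange, here is a minimal
standalone sketch.  The sub body is the one from the quoted filter;
the variable names ($nonl, $html, $out) are only for illustration:

```perl
use strict;
use warnings;

# Same body as the quoted "nonl" filter: deletes every newline outright.
my $nonl = sub {
    my $val = shift;
    $val =~ s/\n//g;
    $val;
};

# The example block of text, one word or tag per line.
my $html = "<p>\nThis is\na\ntest\n</p>\n";

my $out = $nonl->($html);
print "$out\n";    # prints: <p>This isatest</p>
```

The words on adjacent lines are joined with nothing between them,
because the newline was the only separator.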

I suggest the following, which will replace spans of CR and LF
characters with a single space:

    sub {
        my $val = shift;
        $val =~ s/[\r\n]+/ /g;
        $val;
    }

You may want to further enhance that by replacing multiple spaces
with a single space, so you'd end up with this:

    sub {
        my $val = shift;
        $val =~ s/[\r\n]+/ /g;
        $val =~ s/ {2,}/ /g;
        $val;
    }

Of course, that can be simplified to the following, if you don't mind
converting tabs (and any other whitespace) into single spaces:

    sub {
        my $val = shift;
        $val =~ s/\s+/ /g;
        $val;
    }
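
If it helps to compare the three versions side by side, the following
is a rough standalone sketch that runs each one over the same input.
The sample string and the names ($join_lines, $squeeze_spaces,
$squeeze_all) are made up for the example and are not part of
Interchange:

```perl
use strict;
use warnings;

# Version 1: spans of CR/LF become a single space.
my $join_lines = sub {
    my $val = shift;
    $val =~ s/[\r\n]+/ /g;
    $val;
};

# Version 2: as above, then runs of two or more spaces become one.
my $squeeze_spaces = sub {
    my $val = shift;
    $val =~ s/[\r\n]+/ /g;
    $val =~ s/ {2,}/ /g;
    $val;
};

# Version 3: any run of whitespace (tabs included) becomes one space.
my $squeeze_all = sub {
    my $val = shift;
    $val =~ s/\s+/ /g;
    $val;
};

# Sample input with CRLF line endings, a double space and a tab.
my $html = "<p>\r\nThis  is\r\n\ta\r\ntest\r\n</p>";

my $v1 = $join_lines->($html);      # "<p> This  is \ta test </p>"
my $v2 = $squeeze_spaces->($html);  # "<p> This is \ta test </p>"
my $v3 = $squeeze_all->($html);     # "<p> This is a test </p>"

print "$_\n" for $v1, $v2, $v3;
```

Note that all three leave a space after "<p>" and before "</p>",
which is harmless in HTML but may matter if you compare strings.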

-- 
   _/   _/  _/_/_/_/  _/    _/  _/_/_/  _/    _/
  _/_/_/   _/_/      _/    _/    _/    _/_/  _/   K e v i n   W a l s h
 _/ _/    _/          _/ _/     _/    _/  _/_/    kevin at cursor.biz
_/   _/  _/_/_/_/      _/    _/_/_/  _/    _/


