[ic] removing new line characters

Grant listbox at email.com
Sat Sep 27 11:51:12 EDT 2003


> Paul Jordan [paul at gishnetwork.com] wrote:
> > > I need to strip out the new line characters from a chunk of HTML.  Can
> > > anyone hook me up with a little Perl for that?
> > >
> > > - Grant
> > You can make a filter:
> >
> > CodeDef nonl Filter
> > CodeDef nonl Routine <<EOR
> > sub {
> >    my $val = shift;
> >    $val =~ s/\n//g;
> >    $val;
> > }
> > EOR
> >
> > [filter op="nonl"] junk with newlines [/filter]
> >
> That may produce unexpected results.  Take the following block of
> text as an example:
>
> <p>
> This is
> a
> test
> </p>
>
> The above filter will make "<p>This isatest</p>", which may or may
> not be what's wanted, depending upon your requirements.
>
> I suggest the following, which will replace spans of CR and LF
> characters with a single space:
>
>     sub {
>         my $val = shift;
>         $val =~ s/[\r\n]+/ /g;
>         $val;
>     }
>
> You may want to further enhance that by replacing multiple spaces
> with a single space, so you'd end up with this:
>
>     sub {
>         my $val = shift;
>         $val =~ s/[\r\n]+/ /g;
>         $val =~ s/ {2,}/ /g;
>         $val;
>     }
>
> Of course, that can be simplified to the following, if you don't mind
> converting tabs (and any other whitespace) into single spaces:
>
>     sub {
>         my $val = shift;
>         $val =~ s/\s+/ /g;
>         $val;
>     }

Great info, and that will definitely come in handy.  Thanks Kevin!
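
For anyone who wants to compare the variants outside of Interchange, here is a rough standalone sketch that runs Paul's substitution and Kevin's second and third versions against the sample block from the thread. The sub names (strip_newlines, newlines_to_space, collapse_whitespace) and the second test string are just labels and data made up for this illustration; only the regexes come from the messages above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Paul's filter body: delete every newline outright.
sub strip_newlines {
    my $val = shift;
    $val =~ s/\n//g;
    return $val;
}

# Kevin's second version: each run of CR/LF becomes one space,
# then runs of two or more spaces collapse to one.
sub newlines_to_space {
    my $val = shift;
    $val =~ s/[\r\n]+/ /g;
    $val =~ s/ {2,}/ /g;
    return $val;
}

# Kevin's last version: any run of whitespace (tabs included)
# becomes a single space.
sub collapse_whitespace {
    my $val = shift;
    $val =~ s/\s+/ /g;
    return $val;
}

# The sample block from the thread.
my $html = "<p>\nThis is\na\ntest\n</p>";

print strip_newlines($html), "\n";       # <p>This isatest</p>
print newlines_to_space($html), "\n";    # <p> This is a test </p>
print collapse_whitespace($html), "\n";  # <p> This is a test </p>

# A made-up string with tabs and CRLF, to show where the last two differ.
my $mixed = "a\t\tb\r\n  c";
print newlines_to_space($mixed), "\n";   # tabs survive: a<TAB><TAB>b c
print collapse_whitespace($mixed), "\n"; # a b c
```

Note that both space-based versions leave a space just inside the tags ("<p> This" and "test </p>"). That is usually harmless in rendered HTML, but worth knowing if the output is compared as a string.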

- Grant



More information about the interchange-users mailing list