[thelist] Regex

Andrew Moore amoore at mooresystems.com
Thu Feb 28 16:32:00 CST 2002


For starters, it won't match this by default:

<img
src="arrow.gif">

(Note the intentional line break inside the tag.)
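
A minimal sketch of that failure, in Python rather than Perl. The regex under discussion is not quoted in this message, so IMG_TAG below is a hypothetical stand-in, and the variable names are made up for illustration. The point is only that "." does not match a newline unless you ask for it (re.DOTALL in Python, the /s modifier in Perl):

```python
import re

# Hypothetical pattern; NOT the regex from the thread, which is not shown here.
IMG_TAG = r"<img.*?>"

# The tag from the example above, wrapped across two lines.
wrapped = '<img\nsrc="arrow.gif">'

# Default behaviour: "." stops at the newline, so the tag is never matched.
default_match = re.search(IMG_TAG, wrapped)

# With re.DOTALL, "." also matches newlines and the whole tag is found.
dotall_match = re.search(IMG_TAG, wrapped, re.DOTALL)

print(default_match is None)
print(dotall_match.group(0) == wrapped)
```

Both checks print True: the default search finds nothing, and the DOTALL search returns the full wrapped tag.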

-Andy


On Fri, Mar 01, 2002 at 09:21:07AM +1100, Lindsay Evans wrote:
>
> > It's worth noting that the Perl Cookbook (Recipe 20.6) cites the regexp
> > below as invalid for all but the most simple HTML.  If you're using Perl,
> > try using a package like HTML::Parser.  Otherwise, you're going to have a
> > very hard time constructing a regexp that does this.
>
> Interesting.
>
> Does it happen to mention any specific cases where it doesn't work?
> I've used this quite a bit on some rather complex html (including xhtml
> tags, etc.), and it worked fine.
>
> Though I'd imagine if you had invalid html to start with (i.e. <img
> src="arrow.gif" ... alt=" -> ">) that it would break.
>
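
To illustrate the case raised above, here is a rough sketch of a ">" inside an attribute value tripping up a simple pattern, next to a parser-based approach. It assumes Python, with the standard library's html.parser standing in for Perl's HTML::Parser. The pattern, the sample tag (a simplified version of the one quoted above), and the ImgCollector class are all hypothetical, not taken from the thread:

```python
import re
from html.parser import HTMLParser

# Simplified sample tag with a ">" inside the alt attribute value.
sample = '<img src="arrow.gif" alt=" -> ">'

# Hypothetical naive pattern: it stops at the first ">" it sees,
# which here is the one inside the alt text.
naive = re.search(r"<img[^>]*>", sample).group(0)


class ImgCollector(HTMLParser):
    """Collects the attributes of every <img> start tag it is fed."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images.append(dict(attrs))


collector = ImgCollector()
collector.feed(sample)
collector.close()

print(repr(naive))
print(collector.images)
```

The naive pattern returns a tag cut off in the middle of the alt text, while the parser reports both attributes with the alt value intact.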