[thelist] Detecting HTML tags with Regular Expressions.

Sam-I-Am sam at sam-i-am.com
Tue May 8 09:28:12 CDT 2001


> seems to me this'll match: '<(\w+)>' and '<\/(\w+)>'
> but it won't ensure they are valid html tags which you seem to require.  The
> only way for that is to actually compare against every possible html tag.

This will miss tags that carry attributes, like <img src="some_image.gif">.
Although it's not as efficient, I usually use:
	/<(\/*[^>]+)>/

(< followed by anything except >, followed by >.)
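
To make the difference concrete, here is a small sketch comparing the quoted pattern with the one above, written with its capture group closed. The variable names are made up for illustration.

```javascript
// Pattern from the quoted reply: only matches bare tags such as <b>.
const bareTag = /<(\w+)>/;
// Pattern from this post: < then anything except >, then >.
const anyTag = /<(\/*[^>]+)>/;

const sample = '<img src="some_image.gif">';

console.log(bareTag.test(sample));    // false: the space after "img" stops \w+
console.log(anyTag.test(sample));     // true
console.log(anyTag.exec(sample)[1]);  // img src="some_image.gif"
```

Note that the second pattern accepts anything between the angle brackets, so it finds tag-shaped text but says nothing about whether the tag is valid HTML.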

though if you take a look at
http://msdn.microsoft.com/scripting/jscript/doc/jsgrpregexpsyntax.htm

they suggest
	/<(.*)>.*<\/\1>/ to match an opening and closing HTML tag

I tend to steer clear of .* as the dot doesn't (normally) match newlines,
and occasionally you see
<img
	src="this.gif"
	name="that"
	onmouseover="something()"
	alt="alt">

which is perfectly valid. In Perl you can alter this behaviour (with the /s
modifier) so .* would work; I don't know a way in JavaScript.
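
A quick sketch of the newline problem, using a shortened version of the multi-line tag above. The [\s\S] form is not from the original discussion; it is a commonly used stand-in for "any character, newlines included" and is shown here only as one possible workaround.

```javascript
// A tag spread over several lines, as in the example above.
const multiLineTag = '<img\n\tsrc="this.gif"\n\tname="that"\n\talt="alt">';

const dotPattern = /<.*>/;          // . does not match a newline by default
const classPattern = /<[^>]+>/;     // [^>] matches any character except >
const anyCharPattern = /<[\s\S]*?>/; // assumption: [\s\S] used as "any character"

console.log(dotPattern.test(multiLineTag));     // false
console.log(classPattern.test(multiLineTag));   // true
console.log(anyCharPattern.test(multiLineTag)); // true
```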

> > I'm trying to do a
> > form validation
> > in which it will match any user's input, which can include
> > html tags, and
> > compare it with another list of valid html tags they can use.

Needless to say, you should also validate on the server side.
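
For the client-side half, something along these lines might do as a starting point. It is only a sketch: the list of allowed tags and the function name findDisallowedTags are invented for illustration, and it checks tag names only, not attributes.

```javascript
// Hypothetical list; the original question does not say which tags are allowed.
const allowedTags = ['b', 'i', 'em', 'strong', 'a', 'p', 'br'];

// Returns the (lowercased) names of tags found in `input`
// that do not appear in `allowed`. Each name is reported once.
function findDisallowedTags(input, allowed) {
  // < then optional slashes, then the tag name, then anything except >, then >
  const tagPattern = /<\/*([^\s>\/]+)[^>]*>/g;
  const rejected = [];
  let match;
  while ((match = tagPattern.exec(input)) !== null) {
    const name = match[1].toLowerCase();
    if (!allowed.includes(name) && !rejected.includes(name)) {
      rejected.push(name);
    }
  }
  return rejected;
}

// Only "script" is reported here; <b> and </b> are on the list.
console.log(findDisallowedTags('<b>hi</b> <script>alert(1)</script>', allowedTags));
// Nothing is reported here; the tag spans a newline and </A> is matched case-insensitively.
console.log(findDisallowedTags('<a\n href="x">link</A>', allowedTags));
```

Because the check runs in the browser, a user can bypass it, so the same comparison has to be repeated on the server.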

hth

Sam



