[thelist] Regex and converting HTML tags to lower case
Simon Willison
cs1spw at bath.ac.uk
Sat Nov 8 17:51:22 CST 2003
Bruce MacKay wrote:
> New to regex and predictably I am encountering a problem. I want to
> convert all HTML tags to lower case.
>
> I can identify the tags in regex, but I cannot find any online examples
> of using that info to convert the tags to lower case.
>
>
> function fnLowerCaseHTML(strHTML)
> Dim objRegExp, strOutput
> Set objRegExp = New Regexp
> objRegExp.IgnoreCase = True
> objRegExp.Global = True
> objRegExp.Pattern = "<(.|\n)+?>"
> strOutput = ?????
> fnLowerCaseHTML = strOutput
> Set objRegExp = Nothing
> End Function
>
> Can anyone help me? Is it possible to do what I'm trying to accomplish
> with regex?
It should be, although ASP makes my eyes bleed, so I can't help you
out with specific code. The regular expression feature to look for is a
replace operation that accepts a callback function: the function
receives each match, does something to it, and returns the text to be
substituted back into the string.
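In PHP, you would use preg_replace_callback for this:

    $html = 'This is some <EM>HTML</EM>';

    function tags_lower($matches) {
        // $matches[0] is the full text of the matched tag
        return strtolower($matches[0]);
    }

    // The pattern needs explicit delimiters (here "/"); a bare '<.*?>'
    // would have its angle brackets treated as the delimiters.
    $html = preg_replace_callback('/<.*?>/', 'tags_lower', $html);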
In Python, re.sub (and the sub method of a compiled pattern) accepts a
function in place of the replacement string:

>>> import re
>>> tagRE = re.compile(r'<.*?>')
>>> html = 'This is some <EM>HTML</EM>'
>>> def taglower(m):
...     return m.group(0).lower()
...
>>> tagRE.sub(taglower, html)
'This is some <em>HTML</em>'
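
One caveat: lowercasing the whole match also lowercases anything inside
the tag, including attribute values such as URLs. Here is a minimal
sketch of a narrower callback that touches only the tag name. The pattern
and the names TAG_NAME_RE, lower_tag_name and lowercase_tag_names are my
own assumptions for illustration. The sketch leaves attribute names as
they are, and it is not a substitute for a real HTML parser.

```python
import re

# Assumed pattern: "<", an optional "/", then a tag name that starts
# with a letter and continues with letters or digits.
TAG_NAME_RE = re.compile(r'<(/?)([A-Za-z][A-Za-z0-9]*)')

def lower_tag_name(match):
    # group(1) is "/" for a closing tag and "" for an opening tag;
    # group(2) is the tag name, the only part that gets lowercased.
    return '<' + match.group(1) + match.group(2).lower()

def lowercase_tag_names(html):
    # The callback runs once per match; the rest of the tag
    # (attributes and their values) is not part of the match.
    return TAG_NAME_RE.sub(lower_tag_name, html)

html = '<A HREF="Page.HTML">This is some <EM>HTML</EM></A>'
print(lowercase_tag_names(html))
```

With the sample string above, the tag names come out in lower case while
HREF="Page.HTML" is left exactly as written.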
ASP should provide the same feature somewhere.
Hope this helps,
Simon Willison