[thelist] Regex and converting HTML tags to lower case

Simon Willison cs1spw at bath.ac.uk
Sat Nov 8 17:51:22 CST 2003


Bruce MacKay wrote:
> New to regex and predictably I am encountering a problem.  I want to 
> convert all HTML tags to lower case.
> 
> I can identify the tags in regex, but I cannot find any online examples 
> of using that info to convert the tags to lower case.
> 
> 
> function fnLowerCaseHTML(strHTML)
>   Dim objRegExp, strOutput
>   Set objRegExp = New Regexp
>   objRegExp.IgnoreCase = True
>   objRegExp.Global = True
>   objRegExp.Pattern = "<(.|\n)+?>"
>   strOutput = ?????
>   fnLowerCaseHTML = strOutput
>   Set objRegExp = Nothing
> End Function
> 
> Can anyone help me?  Is it possible to do what I'm trying to accomplish 
> with regex?

It should be, although ASP makes my eyes bleed so I couldn't help you 
out with specific code. The regular expression feature you need to 
investigate is replacement with a callback: you supply a function that 
takes each match from the regular expression, does something to it, and 
returns the result to be substituted back into the string.

In PHP, you would use preg_replace_callback for this:

$html = 'This is some <EM>HTML</EM>';
function tags_lower($matches) {
     return strtolower($matches[0]);
}
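// PCRE patterns need delimiters; a bare '<.*?>' would treat < and > as
// the delimiters and match the empty pattern .*? instead of a tag.
// The s modifier lets . match newlines inside a tag.
$html = preg_replace_callback('/<.*?>/s', 'tags_lower', $html);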

In Python, re.sub (or the sub method of a compiled pattern) accepts a 
function as the replacement, which gives you the same technique:

 >>> import re
 >>> tagRE = re.compile('<.*?>')
 >>> html = 'This is some <EM>HTML</EM>'
 >>> def taglower(m):
 ...     return m.group(0).lower()
 ...
 >>> tagRE.sub(taglower, html)
'This is some <em>HTML</em>'
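
One caveat with both snippets: lowercasing the whole match also 
lowercases everything else inside the tag, including attribute values 
such as a mixed-case URL in an href. A minimal sketch of one way around 
that, assuming you only want the tag name itself lowercased (attribute 
names are left untouched here, and tag_name_re and lower_tag_name are 
just illustrative names):

```python
import re

# Capture the opening "<" or "</" and the tag name as separate groups;
# the rest of the tag (attributes and their values) is never matched.
tag_name_re = re.compile(r'(</?)([A-Za-z][A-Za-z0-9]*)')

def lower_tag_name(m):
    # Rebuild the start of the tag with only the name lowercased.
    return m.group(1) + m.group(2).lower()

html = 'See <A HREF="Page.HTML">this <EM>page</EM></A>'
result = tag_name_re.sub(lower_tag_name, html)
print(result)  # See <a HREF="Page.HTML">this <em>page</em></a>
```

This assumes any literal "<" in the text is already escaped as &lt;, 
otherwise something like "a <b" in running text would be touched too.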

ASP should provide the same feature somewhere.
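
If the ASP regex object turns out not to take a callback, the same 
result can be had by walking the matches and rebuilding the string by 
hand. A rough sketch of that fallback in Python, assuming only that the 
engine can report each match's text and position (lower_tags_manual is 
a made-up name for illustration):

```python
import re

def lower_tags_manual(html):
    pieces = []
    last = 0  # end of the previous match
    # DOTALL lets "." match newlines, like (.|\n) in the original pattern.
    for m in re.finditer(r'<.*?>', html, re.DOTALL):
        pieces.append(html[last:m.start()])  # text before the tag, unchanged
        pieces.append(m.group(0).lower())    # the tag itself, lowercased
        last = m.end()
    pieces.append(html[last:])               # trailing text after the last tag
    return ''.join(pieces)

print(lower_tags_manual('This is some <EM>HTML</EM>'))
# This is some <em>HTML</em>
```

The loop translates fairly directly to any regex API that exposes a 
collection of matches with an index and a length for each one.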

Hope this helps,

Simon Willison


