[thelist] Function to strip HTML in ASP

Norman Beresford Norman at pigfe.freeserve.co.uk
Mon Jul 10 07:28:28 CDT 2000


Hi all

I've just been sitting here puzzling over this question, and to me the crux
of the matter is whether you have literal < or > characters in your content,
or whether they only ever appear around tags.  If they appear within the
content as &lt; and &gt;, then I think you'll be able to use the code below
to strip the tags.  If they don't, then I don't see an easy way to tell them
apart from real tags.

Anyway, this bit of code appears to work.  It assumes that every "<"
character opens a tag and every ">" character closes one.  What we want to
do is get rid of everything between an opening bracket and its closing
bracket, but keep everything after the closing bracket (until the next
opening bracket).  So we replace each "<" with a marker ending in "-" and
each ">" with a marker ending in "+", then split the string on the marker.
Any element beginning with "-" contains the contents of a tag and is
removed; any element beginning with "+" contains content and is kept (minus
the "+").  Once we've looped through the whole array we can join it back
together to give us strWebPage sans all HTML.

' Mark each bracket so tag text and content can be told apart after the split.
strWebPage = Replace(strWebPage, "<", "x!x-")
strWebPage = Replace(strWebPage, ">", "x!x+")

arrayWebPage = Split(strWebPage, "x!x")

For i = 1 To UBound(arrayWebPage)
    If Left(arrayWebPage(i), 1) = "-" Then
        ' Inside a tag: throw it away.
        arrayWebPage(i) = ""
    Else
        ' Content: drop the leading "+" marker and keep the rest.
        arrayWebPage(i) = Mid(arrayWebPage(i), 2)
    End If
Next

' Join uses a space as its default delimiter, so pass "" explicitly;
' otherwise every removed tag leaves stray spaces behind.
strWebPage = Join(arrayWebPage, "")
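
For reuse across pages, the steps above could be wrapped in a function.  This
is only a minimal sketch, not a definitive implementation.  The name StripTags
is made up for illustration.  It passes "" to Join (whose default delimiter is
a space) so no extra spaces appear where tags were removed.  It still assumes
that every "<" and ">" in the input belongs to a tag, and that the marker text
x!x never occurs in the page itself.

```vbscript
' Hypothetical helper: StripTags is an illustrative name, not part of ASP.
' ByVal keeps the caller's string untouched (VBScript defaults to ByRef).
Function StripTags(ByVal strHtml)
    Dim arrayParts, i

    ' Mark tag openings with "-" and tag closings with "+", then split.
    strHtml = Replace(strHtml, "<", "x!x-")
    strHtml = Replace(strHtml, ">", "x!x+")
    arrayParts = Split(strHtml, "x!x")

    ' Start at 1: element 0 is the text before the first "<" and has no marker.
    For i = 1 To UBound(arrayParts)
        If Left(arrayParts(i), 1) = "-" Then
            arrayParts(i) = ""                      ' tag text: discard
        Else
            arrayParts(i) = Mid(arrayParts(i), 2)   ' content: drop the "+"
        End If
    Next

    StripTags = Join(arrayParts, "")
End Function

' Usage in an ASP page:
'   Response.Write StripTags("<p>Hello <b>world</b></p>")
' writes: Hello world
```

Note that content such as "if a < b then" would lose everything from the "<"
up to the next ">", which is why the input needs its literal brackets encoded
as &lt; and &gt; first.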


btw - to be honest I'm not quite sure why this code works.  I assumed that
"i" should start at 0, but this throws up errors, and starting it at "1"
seems to get rid of those.
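
A likely explanation, offered as a sketch rather than a diagnosis: Split puts
whatever text precedes the first "<" into element 0, and that element never
gets a "-" or "+" marker.  If the page starts with a tag, element 0 is an
empty string, so the Else branch computes a length of -1 and Right fails with
a runtime error.  The snippet below reproduces this with a made-up sample
string.  It assumes it is run under Windows Script Host (cscript); in an ASP
page WScript.Echo would become Response.Write.

```vbscript
Option Explicit
Dim strSample, arrayParts, newLength

' Made-up sample input that starts with a tag, as most pages do.
strSample = "<p>Hi</p>"
strSample = Replace(strSample, "<", "x!x-")
strSample = Replace(strSample, ">", "x!x+")
arrayParts = Split(strSample, "x!x")

' Element 0 holds the (empty) text before the first "<".
WScript.Echo "Length of element 0: " & Len(arrayParts(0))

' This is what the Else branch would compute if the loop started at 0.
newLength = Len(arrayParts(0)) - 1

' Trap the error instead of halting, so the failure can be reported.
On Error Resume Next
arrayParts(0) = Right(arrayParts(0), newLength)
If Err.Number <> 0 Then
    WScript.Echo "Right failed: " & Err.Description
End If
On Error GoTo 0
```

When the page begins with plain text instead, element 0 is not empty, and
starting at 0 would raise no error but would silently chop the first character
off that text.  Starting at 1 is the right call either way.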

Norman





