[thelist] auto closing opened tags

Rick den Haan rick.denhaan at gmail.com
Tue Feb 3 04:29:21 CST 2009


Jeremy Weiss wrote:
> I'm trying to modify a PHP string trimming script so that when it
> trims the string, it doesn't do it in the middle of an HTML tag and so
> that it'll close any HTML tags that are open.
> 
> Has anyone run across an existing PHP script/function/class that does
> this, by chance?

Hi Jeremy,

Once upon a blue Monday I wrote something that did exactly this.
Unfortunately, I no longer have that script, but here's a basic outline of
how I did this (I was blissfully unaware of regular expressions in those
days):

After trimming the string, I ran through it from the beginning, looking for
< characters. I had full control over the source, so I knew that any literal
< in the text would be encoded as &lt; and that no custom tags were used.
That meant every < I found was the start of an HTML tag.
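Jeremy's first requirement (never cutting in the middle of a tag) can be
handled during the trim itself. Below is a minimal sketch, not the script
described here, which is gone. It assumes the same conditions: every < starts
a tag, and tags are well-formed. It counts bytes, not characters, and counts
an entity such as &lt; as four, so UTF-8 text would need the mb_* functions.
The name trim_html is hypothetical.

```php
<?php
// Hypothetical helper: keep at most $limit bytes of visible text,
// never cutting inside a tag. Assumes every "<" starts a tag
// (literal "<" is encoded as &lt;) and that tags are well-formed.
function trim_html(string $html, int $limit): string
{
    $visible = 0;      // bytes of text seen so far (tags excluded)
    $inTag = false;    // currently between "<" and ">"?
    $len = strlen($html);

    for ($i = 0; $i < $len; $i++) {
        $ch = $html[$i];
        if ($ch === '<') {
            $inTag = true;
        } elseif ($ch === '>' && $inTag) {
            $inTag = false;
        } elseif (!$inTag) {
            if ($visible === $limit) {
                // Limit reached on a text byte: cut just before it,
                // which is always outside a tag.
                return substr($html, 0, $i);
            }
            $visible++;
        }
    }
    return $html; // already short enough
}
```

For example, trim_html("<p>Hello <b>world</b></p>", 8) gives
"<p>Hello <b>wo", which still needs its tags closed.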

I created an array to hold the closing tags still needed, in the order their
opening tags appeared.

So, whenever I encountered an opening tag, say "<p " or "<p>" (note the space
in the first check, which keeps it from matching "<pre"), I pushed a </p> onto
the array. If I later encountered a </p>, I removed it from the array again.
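The push/remove bookkeeping is essentially a stack. Here is a rough
reconstruction using a regular expression instead of the original
character-by-character scan. The \b after the tag name does the job of the
"<p " space check, so "<pre" is not read as "<p". The name pending_closers is
hypothetical. It ignores attributes and comments, and it still records
closers for tags like <br>, which are dealt with in the next step.

```php
<?php
// Hypothetical sketch: return the closing tags still needed, in the order
// their opening tags appeared. Opening tags push a closer; closing tags
// remove the most recent matching closer.
function pending_closers(string $html): array
{
    $pending = [];
    // Captures an optional "/" and the tag name of <p>, <p class="x">, </p>.
    preg_match_all('#<(/?)([a-z][a-z0-9]*)\b[^>]*>#i', $html, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        $name = strtolower($m[2]);
        if ($m[1] === '') {
            $pending[] = "</$name>";   // opening tag: remember its closer
        } else {
            // Closing tag: find the last matching closer (keys preserved
            // so $pos is its index in $pending) and remove it.
            $pos = array_search("</$name>", array_reverse($pending, true), true);
            if ($pos !== false) {
                array_splice($pending, $pos, 1); // also reindexes the array
            }
        }
    }
    return $pending;
}
```

For "<div><p>Hello <b>wo" this returns ["</div>", "</p>", "</b>"].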

After running through the entire (trimmed) string, I appended any closing tags
still left in the array to the end of the string. The exceptions were </br>,
</hr> and </img>, which I skipped, since those elements never take a closing
tag. Filtering out those closers at this point was easier than trying to
recognize self-closing tags during the scan, because of whatever else might be
inside the tag, e.g. <br style="clear:both" />. Thinking back, that would not
have been a big problem either, but this worked.
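A sketch of that last step follows, taking an array like the one described
above as input. One assumption worth stating: the leftover closers are
appended in reverse, so the most recently opened tag closes first and the
nesting stays valid. The skip list has only the three exceptions from the post.
HTML has other void elements, such as input and meta, that you may want to add.
The name append_closers is hypothetical.

```php
<?php
// Hypothetical final step: append the pending closers, innermost first,
// skipping elements that never take a closing tag.
function append_closers(string $html, array $pending): string
{
    $void = ['</br>', '</hr>', '</img>'];   // the exceptions from the post
    foreach (array_reverse($pending) as $closer) {
        if (!in_array($closer, $void, true)) {
            $html .= $closer;
        }
    }
    return $html;
}
```

For example, append_closers("<div><p>Hi<br>there", ["</div>", "</p>", "</br>"])
returns "<div><p>Hi<br>there</p></div>".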

One tip, though: always remove <script> and <object> elements completely :-)
There's a whole plethora of things that can go wrong if you leave them in,
even partially, or if you trim your string halfway into a JavaScript function.
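One way to follow that tip is to strip those elements before trimming. This
is a regex-based sketch. The name strip_scripts_objects is hypothetical. It
does not handle nested <object> elements, because the lazy match stops at the
first closing tag. The second pattern catches an opening tag with no closer,
e.g. when the string was already cut short.

```php
<?php
// Hypothetical helper: remove <script> and <object> elements, including
// their contents, so a trim can never land inside them.
function strip_scripts_objects(string $html): string
{
    // Complete elements: <script ...>...</script>, <object ...>...</object>
    $html = preg_replace('#<(script|object)\b[^>]*>.*?</\1\s*>#is', '', $html);
    // An opening tag with no closer: drop everything from it onwards.
    return preg_replace('#<(script|object)\b.*$#is', '', $html);
}
```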

Good luck!

Rick.