[thelist] Excluding tags from a regular expression search
Ken Snyder
kendsnyder at gmail.com
Wed Sep 12 13:59:16 CDT 2007
will garrison wrote:
> ...
> This is working perfectly. The problem however is that when a user
> enters an "a", for example, every "a" is highlighted including any "a"
> that is within a tag.
> ...
I don't think a regular expression alone can do this reliably. I've
implemented the same thing in the PHP 5 code below. Basically, any time
the subject string contains a '<', it splits the string at HTML tags,
runs the replacement on the parts that don't start with '<', and rejoins
the array.
To use it, simply pass in an array of search terms (e.g. array('darth',
'vader')), the CSS class name string, and the subject string.
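To make the split-and-rejoin step concrete, here is a minimal sketch. The sample string, the <b> wrapper, and the simplified tag pattern are assumptions for illustration only; the full function uses a more permissive tag pattern.

```php
<?php
// Hypothetical sample input, for illustration only.
$subject = 'See <a href="/darth">Darth</a> now';

// Split at tags. PREG_SPLIT_DELIM_CAPTURE keeps the tags in the result;
// PREG_SPLIT_NO_EMPTY drops empty pieces between adjacent tags.
// Simplified pattern (assumption): does not allow ">" inside attributes.
$parts = preg_split('/(<[^<>]+>)/', $subject, -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

// $parts is now:
//   'See ', '<a href="/darth">', 'Darth', '</a>', ' now'

// Highlight "a" only in the pieces that are not tags.
foreach ($parts as $i => $part)
{
    if (substr($part, 0, 1) != '<')
    {
        $parts[$i] = preg_replace('/(a)/i', '<b>\\1</b>', $part);
    }
}

// Rejoin the pieces into one string.
$result = join('', $parts);
echo $result, "\n";
```

The "a" characters inside the href attribute are left alone because that piece starts with '<' and is skipped.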
- Ken
function str_highlight($findArray, $className, $subject)
{
    // order the terms from longest to shortest to minimize the amount
    // of highlighting needed
    usort($findArray, 'compareStrLen');
    foreach ($findArray as $find)
    {
        // skip empty terms; an empty pattern would match at every position
        if ($find === '')
        {
            continue;
        }
        // escape preg special characters - \ + * ? [ ^ ] $ ( ) { } = !
        // < > | and delimiter /
        $qfind = preg_quote($find, '/');
        // set up a simple case insensitive find
        $caseIFind = "/($qfind)/i";
        // replace any keywords with the keyword wrapped in a span with
        // class className
        $replace = "<span class=\"$className\">\\1</span>";
        // find out if we need to use preg to avoid highlighting text
        // within html tags
        if (strpos($subject, '<') !== false)
        {
            // separate elements into an array with html and non-html
            // separated; a tag is "<", then unquoted characters or
            // double-quoted strings (which may contain ">"), then ">".
            // Matching one unquoted character at a time avoids the
            // runaway backtracking that nested "+" quantifiers can
            // cause on a stray "<" with no closing ">"
            $separateHtmlTags = '/(<(?:[^<>"]|"[^"]*")+>)/';
            $separated = preg_split($separateHtmlTags, $subject, -1,
                PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
            // replace each array element that does not start with "<"
            foreach ($separated as &$sep)
            {
                if (substr($sep, 0, 1) != '<')
                    $sep = preg_replace($caseIFind, $replace, $sep);
            }
            // break the reference left over from the loop
            unset($sep);
            // store the array back into the subject string
            $subject = join('', $separated);
        }
        else
        {
            $subject = preg_replace($caseIFind, $replace, $subject);
        }
    }
    return $subject;
}

// usort() callback: orders strings from longest to shortest
function compareStrLen($a, $b)
{
    if (strlen($a) == strlen($b))
    {
        return 0;
    }
    return (strlen($a) > strlen($b) ? -1 : 1);
}