[thelist] Excluding tags from a regular expression search

Ken Snyder kendsnyder at gmail.com
Wed Sep 12 13:59:16 CDT 2007


will garrison wrote:
> ...
> This is working perfectly. The problem however is that when a user
> enters an "a", for example, every "a" is highlighted including any "a"
> that is within a tag.
> ...
I think it's impossible to do this with a regular expression alone. I've
implemented the same thing in the PHP 5 code below. Basically, any time
the subject string contains a '<', it splits the string at HTML tags,
runs the replacement on the pieces that don't start with '<', and joins
the pieces back together.
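
The splitting step can be seen on its own in this minimal sketch. The sample string is made up, and the simplified tag pattern /(<[^<>]*>)/ is an assumption for illustration: it does not handle a '>' inside a quoted attribute value. PREG_SPLIT_DELIM_CAPTURE keeps the captured tags in the result, and PREG_SPLIT_NO_EMPTY drops empty strings, such as the one between two adjacent tags.

```php
<?php
// Split a string into alternating text and tag pieces, keeping the tags.
// Simplified tag pattern (an assumption): '>' inside quoted attribute
// values is not handled.
$subject = 'a <a href="#">cat</a> and a dog';
$pieces = preg_split('/(<[^<>]*>)/', $subject, -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($pieces);
```

This prints the pieces 'a ', '<a href="#">', 'cat', '</a>' and ' and a dog', so only the first, third and fifth pieces need to be searched.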

To use it, pass in an array of search terms (e.g. array('darth',
'vader')), the CSS class name, and the subject string. It returns the
subject with each match wrapped in a span of that class.

- Ken


function str_highlight($findArray, $className, $subject)
{
    // order the terms from longest to shortest so that longer terms are
    // highlighted before any shorter terms they contain
    usort($findArray, 'compareStrLen');
   
    foreach($findArray as $find)
    {       
        // escape regex special characters (such as . * ? [ ] ( ) and |)
        // and the '/' delimiter
        $qfind = preg_quote($find, '/');
        // set up a simple case insensitive find
        $caseIFind = "/($qfind)/i";
        // replace each match with the match wrapped in a span of class $className
        $replace = "<span class=\"$className\">\\1</span>";
           
        // only split on html tags when the subject might contain markup,
        // so that text inside tags is never highlighted
        if (strpos($subject, '<') !== false)
        {   
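            // split the subject into html tags and the text between them;
            // quoted attribute values may contain '>', and the pattern avoids
            // nested quantifiers, which can backtrack exponentially when a
            // stray '<' appears before a later tag
            $separateHtmlTags = '/(<(?:[^<>"]|"[^"]*")*>)/';
            $separated = preg_split($separateHtmlTags, $subject, -1,
                PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);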
           
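            // replace only in the elements that are not tags (do not start with "<")
            foreach ($separated as &$sep)
            {
                if (substr($sep, 0, 1) !== '<')
                    $sep = preg_replace($caseIFind, $replace, $sep);
            }
            // break the reference so a later use of $sep cannot overwrite the last element
            unset($sep);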
           
            // store the array back into the subject string
            $subject = join('', $separated);
        }
        else
        {
            $subject = preg_replace($caseIFind, $replace, $subject);   
        }
       
    }
    return $subject;
}

function compareStrLen($a, $b)
{
   if (strlen($a) == strlen($b))
   {
       return 0;
   }
   return (strlen($a) > strlen($b) ? -1 : 1);
}
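
As a self-contained check of the behaviour asked about in the original question, here is a single-term sketch of the same split, replace and rejoin idea. The function name highlight_outside_tags and the class name 'hl' are hypothetical, and it uses a simplified tag pattern, /(<[^<>]*>)/, that does not handle a '>' inside a quoted attribute value.

```php
<?php
// Hypothetical single-term sketch of the split / replace / rejoin approach.
function highlight_outside_tags($term, $className, $subject)
{
    // case-insensitive search for the literal term ('/' is the delimiter)
    $pattern = '/(' . preg_quote($term, '/') . ')/i';
    $replace = '<span class="' . $className . '">$1</span>';

    // simplified tag pattern (an assumption): '>' inside quoted
    // attribute values is not handled
    $pieces = preg_split('/(<[^<>]*>)/', $subject, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    // replace only inside pieces that are not tags
    foreach ($pieces as $i => $piece) {
        if ($piece[0] !== '<') {
            $pieces[$i] = preg_replace($pattern, $replace, $piece);
        }
    }
    return implode('', $pieces);
}

echo highlight_outside_tags('a', 'hl', 'a <a href="#">cat</a>'), "\n";
// prints: <span class="hl">a</span> <a href="#">c<span class="hl">a</span>t</a>
```

The "a" in the tag name is left alone, while both occurrences in the visible text are wrapped.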



