[thelist] Invite critique of XSS prevention function

Brooking, John John.Brooking at sappi.com
Fri Mar 24 23:20:57 CST 2006


Hello, list'ers,

   Sending some code your way for review, if you are interested. This is my latest attempt at a generic and elegant function that cleans up text which may contain simple HTML, so that XSS is prevented. Criticisms, questions, and comments welcome. And if you like it, you are welcome to use it. (Although I realize that sounds a little suspicious given the context!)
   
- John
   
<?php

   function cleanHTML( $text ) {
   /*------

      PURPOSE: Clean up text with potential HTML for display

      RETURNS: The input with any malicious HTML neutralized

      This function removes potentially harmful HTML from a string to
      prevent cross-site scripting (XSS) attacks. It works by allowing a
      certain subset of HTML given by an array inside the function, basically
      display-type tags plus links. It does this by first converting all tag
      characters, plus certain questionable punctuation, into HTML entities.
      Then it converts back just the allowable tags.

   */

      $disallow_punct = array( '&' => '&#38;'
                             , '<' => '&lt;'
                             , '>' => '&gt;'
                             , '(' => '&#40;'   // these + & recommended by
                             , ')' => '&#41;'   // http://www.cgisecurity.com
                             , '#' => '&#35;'   //    /articles/xss-faq.shtml
                             );                 // Not sure why, honestly.
      $allow_tags = array( 'b', 'br', 'i', 'u', 'a', 'font', 'p', 'span' );

      // strtr() replaces every key in a single pass and never rescans its own
      // output. Looping with str_replace() would let the '#' rule re-encode
      // the '#' inside entities written by the earlier rules.
      $text = strtr( $text, $disallow_punct );

      foreach( $allow_tags as $tag ) {

         // First, bring back the closing tag (or combined), being easy
         $text = str_replace( "&lt;/$tag&gt;", "</$tag>", $text ); // See?
         $text = str_replace( "&lt;$tag/&gt;", "<$tag/>", $text );

         // Now take care of attributeless opening tags, just as easy.
         $text = str_replace( "&lt;$tag&gt;", "<$tag>", $text );

         // Now get back the opening tags with attributes, where the end of the
         // tag may be an arbitrary distance from the start.
         $start = 0;
         while(( $start = strpos( $text, "&lt;$tag ", $start )) !== FALSE ) {
            if(( $end = strpos( $text, "&gt;", $start )) === FALSE ) {
               break;   // no closing "&gt;" left; without this the loop never ends
            }
            // Swap the 4-character "&lt;" and "&gt;" for "<" and ">"
            $replace = '<' . substr( $text, $start + 4, $end - ( $start + 4 )) . '>';
            $text = substr_replace( $text, $replace, $start, $end - $start + 4 );
         }

      } // for each allowable tag

      return $text;
   } // function cleanHTML

   // - - - - - - -
   // TESTS
   // - - - - - - -
   
   $examples = array( 'Hello, world!'
                    , '<b>Hello</b>, you darn <b>world!</b>'
                    , 'Hello, <a href="http://www.mainebrook.com/scratch/showVars.php?foo=bar">world</a>!'
                    , 'Hello, <script>bad malicious code inside here!</script>'
                    , "Hello, <font color='blue'><i><b>world!</b></i></font>"
                    , "<p class='error'>You have a big problem!<br/>It's <span class='here'>here</span>!</p>"
                    , '<object name="foo">&nbsp;</object>'
                    , '<ul><li>Lists</li><li>not</li><li>supported!</li></ul>'
                    );
   foreach( $examples as $example ) {
      print '<p>' . cleanHTML( $example ) . "</p>\n";
   }

   /*
   
      Expected output in source (w/o spacing lines, added here for legibility):
      
      <p>Hello, world!</p>

      <p><b>Hello</b>, you darn <b>world!</b></p>

      <p>Hello, <a href="http://www.mainebrook.com/scratch/showVars.php?foo=bar">world</a>!</p>

      <p>Hello, &lt;script&gt;bad malicious code inside here!&lt;/script&gt;</p>

      <p>Hello, <font color='blue'><i><b>world!</b></i></font></p>

      <p><p class='error'>You have a big problem!<br/>It's <span class='here'>here</span>!</p></p>

      <p>&lt;object name="foo"&gt;&#38;nbsp;&lt;/object&gt;</p>

      <p>&lt;ul&gt;&lt;li&gt;Lists&lt;/li&gt;&lt;li&gt;not&lt;/li&gt;&lt;li&gt;supported!&lt;/li&gt;&lt;/ul&gt;</p>
      
   */
   
?>
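
One gap worth noting: the attribute step in cleanHTML() restores everything between "&lt;tag " and the next "&gt;" verbatim, so an allowed tag can still carry an event handler such as onmouseover or a javascript: URL in href. Below is a minimal sketch of one possible check, which is not part of the function above. The helper name attributesLookSafe(), its three-name allowlist, and the http/https rule for href are all assumptions for illustration; a production filter would likely need stricter rules.

```php
<?php

// Hypothetical helper (not in the original function): accept the attribute
// text of an opening tag only if every attribute is a quoted name="value"
// pair, every name is on a short allowlist, and every href is http(s).
function attributesLookSafe( $attrs ) {
   $allowed  = array( 'href', 'class', 'color' );   // assumed allowlist
   $one_attr = '\s+([a-z]+)\s*=\s*("[^"<>]*"|\'[^\'<>]*\')';

   // The whole string must consist of well-formed, quoted attributes.
   if( !preg_match( '/^(?:' . $one_attr . ')*\s*$/i', $attrs )) {
      return false;
   }

   // Inspect each attribute name and value in turn.
   preg_match_all( '/' . $one_attr . '/i', $attrs, $matches, PREG_SET_ORDER );
   foreach( $matches as $m ) {
      $name  = strtolower( $m[1] );
      $value = substr( $m[2], 1, -1 );              // strip the quotes
      if( !in_array( $name, $allowed, true )) {
         return false;
      }
      if( $name === 'href' && !preg_match( '#^https?://#i', $value )) {
         return false;
      }
   }
   return true;
}

// The argument is the text after the tag name, leading space included.
var_dump( attributesLookSafe( " class='error'" ));                         // true
var_dump( attributesLookSafe( ' href="http://www.mainebrook.com/"' ));     // true
var_dump( attributesLookSafe( ' href="javascript:alert&#40;1&#41;"' ));    // false
var_dump( attributesLookSafe( ' onmouseover="alert&#40;1&#41;"' ));        // false
```

In cleanHTML() a check like this could gate the substr_replace() call in the attribute loop, leaving the tag escaped when the check fails and advancing $start past it so the loop still terminates.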
