[thelist] Invite critique of XSS prevention function
Brooking, John
John.Brooking at sappi.com
Fri Mar 24 23:20:57 CST 2006
Hello, list'ers,
Sending some code at you for review, if you are interested. This is my latest attempt at a generic and elegant function to clean up text with possible simple HTML such that XSS is prevented. Criticisms, questions, and comments welcome. And if you like it, you are welcome to use it. (Although I realize that sounds a little suspicious given the context!)
- John
<?php
function cleanHTML( $text ) {
/*------
PURPOSE: Clean up text with potential HTML for display
RETURNS: The input with any malicious HTML neutralized
This static function removes potentially harmful HTML from a string to
prevent cross-site scripting (XSS) attacks. It works by allowing a certain
subset of HTML given by an array inside the function, basically
display-type tags plus links. It does this by first converting all tag
characters, plus certain questionable punctuation, into HTML entities.
Then it converts back just the allowable tags.
*/
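$disallow_punct = array( '&' => '&amp;'  // must come first, or later entities get re-encoded
                       , '#' => '&#35;'  // must precede the numeric entities below
                       , '<' => '&lt;'
                       , '>' => '&gt;'
                       , '(' => '&#40;'  // (, ), # and & recommended by
                       , ')' => '&#41;'  // http://www.cgisecurity.com/articles/xss-faq.shtml
                       );                // Not sure why, honestly.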
$allow_tags = array( 'b', 'br', 'i', 'u', 'a', 'font', 'p', 'span' );
foreach( $disallow_punct as $bad => $okay ) {
$text = str_replace( $bad, $okay, $text );
}
foreach( $allow_tags as $tag ) {
    // First, bring back the closing tag (or combined), being easy
    $text = str_replace( "&lt;/$tag&gt;", "</$tag>", $text ); // See?
    $text = str_replace( "&lt;$tag/&gt;", "<$tag/>", $text );
    // Now take care of attributeless opening tags, just as easy.
    $text = str_replace( "&lt;$tag&gt;", "<$tag>", $text );
    // Now get back the opening tags with attributes, where the end of the
    // tag may be an arbitrary distance from the start.
    $start = 0;
    while(( $start = strpos( $text, "&lt;$tag ", $start )) !== FALSE ) {
        if(( $end = strpos( $text, "&gt;", $start )) === FALSE ) {
            break; // unterminated tag: leave it encoded instead of looping forever
        }
        // "&lt;" and "&gt;" are 4 characters each, hence the + 4 offsets
        $replace = '<' . substr( $text, $start + 4, $end - ( $start + 4 )) . '>';
        $text = substr_replace( $text, $replace, $start, $end - $start + 4 );
    }
} // for each allowable tag
return $text;
} // function cleanHTML
// - - - - - - -
// TESTS
// - - - - - - -
$examples = array( 'Hello, world!'
, '<b>Hello</b>, you darn <b>world!</b>'
, 'Hello, <a href="http://www.mainebrook.com/scratch/showVars.php?foo=bar">world</a>!'
, 'Hello, <script>bad malicious code inside here!</script>'
, "Hello, <font color='blue'><i><b>world!</b></i></font>"
, "<p class='error'>You have a big problem!<br/>It's <span class='here'>here</span>!</p>"
, '<object name="foo">&nbsp;</object>'
, '<ul><li>Lists</li><li>not</li><li>supported!</li></ul>'
);
foreach( $examples as $example ) {
print '<p>' . cleanHTML( $example ) . "</p>\n";
}
/*
Expected output in source (w/o spacing lines, added here for legibility):
<p>Hello, world!</p>
<p><b>Hello</b>, you darn <b>world!</b></p>
<p>Hello, <a href="http://www.mainebrook.com/scratch/showVars.php?foo=bar">world</a>!</p>
<p>Hello, &lt;script&gt;bad malicious code inside here!&lt;/script&gt;</p>
<p>Hello, <font color='blue'><i><b>world!</b></i></font></p>
<p><p class='error'>You have a big problem!<br/>It's <span class='here'>here</span>!</p></p>
<p>&lt;object name="foo"&gt;&amp;nbsp;&lt;/object&gt;</p>
<p>&lt;ul&gt;&lt;li&gt;Lists&lt;/li&gt;&lt;li&gt;not&lt;/li&gt;&lt;li&gt;supported!&lt;/li&gt;&lt;/ul&gt;</p>
*/
?>
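A side note on the first pass, since the replacements run one at a time: each str_replace() sees the output of the previous one, so the order of the map matters. The sketch below is only an illustration under assumptions. The helper names encodeSequential and encodeSinglePass are made up, and the numeric entities are assumed to be the ones the FAQ suggests. It compares the loop approach against strtr(), which replaces in a single pass and never rescans what it has already substituted.

```php
<?php
// Hypothetical helpers for illustration only; they are not part of cleanHTML().

// Sequential replacement, in the style of a foreach/str_replace loop: every
// pass operates on the result of the previous pass.
function encodeSequential( $text, $map ) {
    foreach( $map as $bad => $okay ) {
        $text = str_replace( $bad, $okay, $text );
    }
    return $text;
}

// Single-pass replacement: strtr() with an array does not rescan text it has
// already replaced, so the order of the map is irrelevant.
function encodeSinglePass( $text, $map ) {
    return strtr( $text, $map );
}

// Deliberately awkward order: '#' and '&' come after the entities that contain them.
$map = array( '(' => '&#40;', ')' => '&#41;', '#' => '&#35;', '&' => '&amp;' );

print encodeSequential( 'f(x)', $map ) . "\n"; // the entities themselves get re-encoded
print encodeSinglePass( 'f(x)', $map ) . "\n"; // each character is encoded exactly once
```

With the loop approach, putting '&' and '#' at the front of the map avoids the double encoding; with strtr() the order does not matter.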