[thelist] php - htmlentities & charencoding

Lee kowalkowski lee.kowalkowski at googlemail.com
Fri Jul 7 04:21:28 CDT 2006


On 07/07/06, Taras D <taras.di at gmail.com> wrote:
> 1) When should htmlspecial characters be used?

When not using them causes problems!

> As a general rule should
> it be used for text that may contain special characters that is going
> to be rendered in the browser (ie: text that isn't in tags)?

It's recommended to escape ampersands everywhere in HTML documents to
avoid confusion with the beginning of character references.  This
includes text in attribute values too.
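
A minimal sketch of that, in Python rather than PHP purely for
illustration (html.escape plays roughly the role of htmlspecialchars
with ENT_QUOTES).  The link() helper and the example URL are made up:

```python
from html import escape

def link(url: str, label: str) -> str:
    # Hypothetical helper: escape the attribute value and the link text.
    # quote=True also escapes " and ', which matters inside attributes.
    return '<a href="{}">{}</a>'.format(escape(url, quote=True), escape(label))

markup = link("search.php?q=fish&chips&lang=en", "Fish & chips")
print(markup)
# <a href="search.php?q=fish&amp;chips&amp;lang=en">Fish &amp; chips</a>
```

The browser turns &amp; back into & when it reads the attribute, so
the link still points at the original URL.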

> I don't know if I should escape the
> ampersand, or even if its possible (seeing that the text is inside a
> HTML attribute).

Just try it.

> Why would you ever use htmlentities as opposed to htmlspecialchars?

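In PHP, htmlspecialchars() escapes only the characters that are
special in HTML (& < > " and, with ENT_QUOTES, '), while htmlentities()
converts every character that has a named entity.  As for named
character entity references versus numeric character references: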

Character entity references are easier to author.  I find it much
easier to use &pound; than to look up the numeric equivalent.

Also, character entity references name the character, whereas numeric
character references give its code point.  In HTML that code point is
always a Unicode (ISO 10646) position, whatever encoding the page is
served in, so &#163; and &pound; refer to the same character.

Whether the pound sign actually renders depends on the client having
a font with that glyph, not on which form of reference I use.
Characters in the standard 7-bit ASCII range are pretty much
guaranteed to render whether you use the entity or the numeric
reference.
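
For what it's worth, a parser that follows the HTML rules decodes the
named, decimal and hexadecimal forms to the same character.  A quick
check, using Python's standard html module as a stand-in for a
browser's parser (my assumption for illustration, nothing
PHP-specific):

```python
from html import unescape
from html.entities import name2codepoint

named = unescape("&pound;")        # named character entity reference
decimal = unescape("&#163;")       # decimal numeric character reference
hexadecimal = unescape("&#xA3;")   # hexadecimal numeric character reference

# All three decode to U+00A3, the pound sign.
print(named == decimal == hexadecimal)   # True
print(name2codepoint["pound"])           # 163
```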

> 2) A comment in the PHP manual entry for htmlentities states that their
> function can be used to 'replace any characters in a string that could
> be 'dangerous' to put in an HTML/XML file with their numeric entities
> (e.g. &#233 for [e acute])'. Why would it be dangerous!?

I think the punctuation is perhaps misleading; it is perhaps better
with a comma after "file".  Therefore the function replaces characters
with their numeric entities, but not all characters - only those that
are considered "dangerous".

They'd be dangerous if left unescaped and subsequently used in an XSS attack.
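
A rough sketch of what escaping buys you here, again in Python for
illustration only.  render_comment() is a hypothetical template
helper, and the hostile string is just an example:

```python
from html import escape

def render_comment(user_text: str) -> str:
    # Hypothetical helper: escape user input before it goes into markup,
    # so any tags it contains are shown as text instead of being parsed.
    return "<p>" + escape(user_text) + "</p>"

hostile = '<script>alert("xss")</script>'
safe = render_comment(hostile)
print(safe)
# <p>&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</p>
```

In PHP, passing the same text through htmlspecialchars() before
output has the same effect.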

> 3) What are some typical uses of specifying HTTP input/output character
> encoding? If it is used to convert output, why wouldn't you just change
> the output page's char encoding? If its used to convert input from say
> UTF-8 to Latin1, couldn't you just use a function to do this?

I think it's a matter of escaping, rather than converting.  E.g. you'd
use &lt; &gt; instead of < > in your HTML content when you didn't want
them to be parsed as tag delimiters.  That's why character references
exist.
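
To make the distinction concrete, here is a small sketch (Python, for
illustration; the sample text is made up).  Converting changes the
bytes used to send the same characters; escaping changes the
characters in the markup so the parser doesn't read them as syntax:

```python
from html import escape

text = "5 < 6 & \u00a31 > 99p"   # \u00a3 is the pound sign

# Converting: same characters, different bytes on the wire.
as_utf8 = text.encode("utf-8")      # pound sign becomes two bytes, C2 A3
as_latin1 = text.encode("latin-1")  # pound sign becomes one byte, A3

# Escaping: < > & are replaced with character references.
# The pound sign is not special in HTML, so it is left alone.
escaped = escape(text, quote=False)

print(len(as_utf8), len(as_latin1))   # 17 16
print(escaped == "5 &lt; 6 &amp; \u00a31 &gt; 99p")   # True
```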

To avoid HTML escaping in your JavaScript, put all your JavaScript in
external files.

-- 
LK


