[thelist] charset, multipart/form-data and application/x-www-form-urlencoded

Bill Moseley moseley at hank.org
Sat Jan 23 11:40:11 CST 2010


I have a working application that uses UTF-8 throughout.  On my web forms I
either use this:

<form method="post" action="..." accept-charset="utf-8">

Or if I have any upload fields on the form I use:

<form method="post" action="..." enctype="multipart/form-data"
accept-charset="utf-8">

Note that the accept-charset attribute asks the client to encode the
submission as UTF-8.

With the two browsers I tested (Firefox 3 and Chrome) the respective
Content-Type headers in the POST are:

Content-Type: application/x-www-form-urlencoded
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7


Content-Type: multipart/form-data;
    boundary=---------------------------124668924421214781111253174633
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7

(I realize the Accept-Charset header is not relevant to my question below.)

I'm curious why the browser is not telling me what character encoding it
used.  Do I just have to assume that the character encoding is what I
specified in the accept-charset in the <form> element?  Obviously, clients
don't have to read my form before posting.  I do decode all content as UTF-8
(and thus an error will be generated if invalid UTF-8 is detected).
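
For illustration, the decode-as-UTF-8-and-fail-on-error approach described
above might look roughly like this in Python.  This is only a sketch under
assumptions, not the application's actual code: the function name
decode_form_body is hypothetical, and it handles only
application/x-www-form-urlencoded bodies, not multipart ones.

```python
from urllib.parse import parse_qs


def decode_form_body(body: bytes) -> dict:
    """Parse a urlencoded form body, assuming the client used UTF-8.

    Hypothetical helper for illustration only.
    """
    # A well-formed urlencoded body is pure ASCII; any non-ASCII octets
    # are expected to arrive percent-escaped (e.g. %C3%A9).
    text = body.decode("ascii")
    # errors="strict" makes invalid UTF-8 raise UnicodeDecodeError;
    # parse_qs defaults to errors="replace", which would hide the problem.
    return parse_qs(
        text,
        keep_blank_values=True,
        encoding="utf-8",
        errors="strict",
    )
```

A body such as b"name=caf%C3%A9" decodes cleanly, while b"name=caf%E9"
(the same word encoded as ISO-8859-1) raises UnicodeDecodeError instead of
being silently accepted.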

It just seems odd.  When sending a series of octets that represent text to
some remote server, it sure seems like the client would need to specify the
character encoding used to encode those octets.
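
One way to check whether a client did label its encoding is to look for a
charset parameter on the request's Content-Type header, and fall back to the
value given in accept-charset when none is present.  A rough Python sketch
follows; request_charset is a hypothetical helper name, and the utf-8
default is an assumption taken from the forms above.

```python
from email.message import Message


def request_charset(content_type: str, default: str = "utf-8") -> str:
    """Return the charset parameter of a Content-Type value, if any.

    Hypothetical helper for illustration only.
    """
    # Reuse the standard library's MIME header parsing rather than
    # splitting the header value by hand.
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset parameter,
    # or None when the header carries no such parameter.
    return msg.get_content_charset() or default
```

With the two headers shown earlier, neither of which carries a charset
parameter, this returns the default both times.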

Am I missing some fundamental part of http?


-- 
Bill Moseley
moseley at hank.org

