[thelist] How to Code Multiple Languages?
Dejan Kozina
dejan at kozina.com
Fri Dec 8 10:46:30 CST 2006
Hi Craig.
You were probably searching for a way to 'encode multiple languages',
while what you seem to be looking for is how to encode multiple
character sets. UTF-8 is the definite answer to this.
To start with, you need an editor capable of saving your text as utf-8.
My own choice is Notepad++ (http://notepad-plus.sourceforge.net), but
there are many other open source apps out there, like SciTE or jEdit. I
suppose there should be WYSIWYG editors too, but I can't advise on those
as I never use them. Just input or copy/paste your content and choose
the appropriate charset when saving. You can safely save plain ASCII as
utf-8, too.
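
For anyone who wants to see this outside an editor, here is a small
sketch in Python. The sample strings and the helper name save_as_utf8
are made up for illustration. It shows that one utf-8 file can hold
several scripts at once and that plain ASCII comes out byte-for-byte
the same.

```python
# Sample text mixing several scripts (chosen only for illustration).
mixed = "English, Français, Русский, 日本語, 繁體中文"

# One encoding covers all of them, and the round trip is lossless.
utf8_bytes = mixed.encode("utf-8")
assert utf8_bytes.decode("utf-8") == mixed

# Plain ASCII saved as utf-8 produces exactly the same bytes.
plain = "Plain ASCII text"
assert plain.encode("utf-8") == plain.encode("ascii")


def save_as_utf8(path: str, text: str) -> None:
    """Hypothetical helper: write text as utf-8 (this codec adds no BOM)."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)
```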
One thing you should avoid is having the editor save the file with a
Byte Order Mark (BOM), as this sequence of control characters is useless
on the web and some browsers display it at the start of the page. If
you're unsure whether your editor is doing this, just search your
work-in-progress folder for files containing "ï»¿" (the bytes EF BB BF
as shown by a non-unicode editor). If any are found, open them in a
plain (i.e. not unicode capable) text editor, manually remove those
characters from the beginning of the file and save the result. Plain
text editors can be used with utf-8 documents as long as you change the
ASCII characters only and leave everything else alone.
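
If there are many files, the search-and-remove step can be scripted.
The following is only a sketch in Python: the function names and the
list of file patterns are my assumptions, so adjust them to your own
folder layout and try it on a copy first.

```python
import codecs
from pathlib import Path


def has_utf8_bom(data: bytes) -> bool:
    """True if the bytes start with the utf-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading utf-8 BOM, if there is one."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data


def strip_bom_in_folder(folder, patterns=("*.html", "*.css", "*.js")):
    """Rewrite matching files in place; return the paths that were changed."""
    changed = []
    for pattern in patterns:  # patterns are an assumption, not a standard
        for path in sorted(Path(folder).rglob(pattern)):
            data = path.read_bytes()
            if has_utf8_bom(data):
                path.write_bytes(strip_utf8_bom(data))
                changed.append(str(path))
    return changed
```

Calling strip_bom_in_folder("work-in-progress") would then return the
list of files it had to fix, or an empty list if none had a BOM.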
Next thing you should do is tell the browser about the encoding. The
proper way to do this is to have the server send the correct Content-Type
HTTP response headers. If your website runs on Apache and you can set up
per-directory configuration with a .htaccess file, add
"AddDefaultCharset utf-8" to it. If you can't, the second best is to add
a meta element to the head section of your documents: "<meta
http-equiv="content-type" content="text/html; charset=utf-8">". You
should do this even if you use HTTP headers, so your pages will be
displayed correctly even when saved on disk.
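
To check that both declarations agree, something like the following
Python sketch may help. It only uses the standard library; the function
and class names are hypothetical, and it looks only at the http-equiv
form of the meta element shown above.

```python
from email.message import Message
from html.parser import HTMLParser


def charset_from_content_type(value: str):
    """Extract the charset parameter from a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()  # lowercased, None if absent


class _MetaCharsetFinder(HTMLParser):
    """Remembers the charset of the first content-type meta element."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attr = dict(attrs)
        if (attr.get("http-equiv") or "").lower() == "content-type":
            self.charset = charset_from_content_type(attr.get("content") or "")


def charset_from_meta(html_text: str):
    """Return the charset declared in the meta element, or None."""
    finder = _MetaCharsetFinder()
    finder.feed(html_text)
    return finder.charset
```

Feed the first function the Content-Type header your server actually
sends and the second one the page source; both should come back as
"utf-8".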
The last hitch is hoping the browser has a font capable of displaying
the unicode characters you're sending to it. You should avoid fancy
font-family declarations and stay on the generic side, letting the
browser choose, as this isn't something you can control. Since W2K and
OS X, browsers are capable of displaying practically everything you can
throw at them, unless you force them to a specific font which may or may
not contain the glyphs (shapes) required for non-ASCII content.
CSS files also accept a charset declaration: just put '@charset
"utf-8";' as the first line of your css file. This only makes sense if
you use css-generated content, but won't break anything if you use it
anyway.
The script element also accepts a charset attribute, as in <script
charset="utf-8" ... ></script>. Your mileage may vary.
That done, you can set up the proper mark-up related to the document
language (mind the difference between a language and a character set: a
language can use more than one character set and many languages can
share the same character set). If your document has a primary (main)
language define the lang attribute of html (e.g. <html lang="en">) and
declare separately the lang attribute for every block of content in a
different language (set lang="whatever" on divs, spans and other
containers; it will be inherited down the DOM tree). If you cannot or
do not want to set a primary language, leave lang out of the html
element and mark up all the content with it.
There is a 'Content-Language' http header too. Use it to list the
languages used in the document. There is no .htaccess shortcut for this;
you should use mod_headers syntax (if your server has it installed) and
add it to the head too ('<meta http-equiv="content-language"
content="en,ja">'). Multiple comma separated languages are OK.
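
If your pages come out of server-side code rather than straight from
Apache, the same headers can be sent from there. A minimal sketch as a
plain Python WSGI callable; the name app and the sample body are
assumptions for illustration only, not anything your server requires.

```python
def app(environ, start_response):
    """Hypothetical WSGI app declaring charset and languages in headers."""
    body = '<p lang="en">Hello</p><p lang="ja">こんにちは</p>'.encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Language", "en, ja"),
        ("Content-Length", str(len(body))),  # length in bytes, not characters
    ]
    start_response("200 OK", headers)
    return [body]
```

For a quick local test it can be served with the standard library's
wsgiref.simple_server.make_server("", 8000, app).serve_forever().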
As per the accessibility specs, you should further use a hreflang
attribute with links to a page in a different language ( '<a href="..."
hreflang="fr">'). I'm not at all sure whether screen readers do anything
meaningful with it, anyway...
If said links are used with a client-side image map take note that the
area element does not accept a hreflang attribute. Use plain anchor
links ('<a>') instead, as they can have shape and coords attributes.
Well, that's more or less it, unless you have to use server-side
scripting, where you have to set up your database tables correctly and
such...
Hope this helps.
djn
Craig Givens wrote:
> I've been researching encoding and charsets like Unicode, UTF-8 and
> big5 and there just doesn't seem to be a way to do this.
--
Dejan Kozina
Dolina 346 (TS) - I-34018 Italy
tel./fax: +39 040 228 436 - cell.: +39 348 7355 225
http://www.kozina.com/ - e-mail: dejan at kozina.com