[thelist] [the list] Mysql/Xml schema for storing words

Dan CRACIUN dcsquare at myrealbox.com
Fri Mar 4 04:37:46 CST 2005


Hey list,

Me and my big mouth have done it again! While trying to convince a 
publisher that his site could use a refresh (3 years since the 
last redesign) I suggested that a free online translator could attract 
way more traffic. The problem is he actually liked the idea :-)
The publishing house has printed several bilingual dictionaries 
(English, French, Greek, Latin, Spanish etc.) and I have access to the 
Excel files that were used to generate them.
The schema is as follows:
term_lang1   field   pronunciation    definition_lang1   term_lang2

where:
term_lang1 is the word/term in the first language
field is the field of activity where that term is used (e.g. 
IT/Medicine/Astronomy etc.) (optional: some terms have it, some don't)
pronunciation is the pronunciation, written with those funny chars from 
the IPA Extensions block (optional)
definition_lang1 is the definition of the term in the same language 
(optional)
term_lang2 is the translation in the target language

Needless to say, a term can have different meanings in different fields 
of activity (e.g. "gap" in electricity and in genetics), and it can have 
different meanings when the same form serves as a verb, noun, adjective, 
etc., so term_lang1 is by no means unique.
Neither is term_lang2, because the same term can be translated by several 
synonyms.
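
To make the non-uniqueness concrete, here is a rough sketch of a single table holding the five columns above. It uses Python's built-in sqlite3 module purely as a stand-in for MySQL; the table name, index name, lookup() helper and placeholder translations are all made up for illustration, and the real column types would need checking against MySQL.

```python
import sqlite3

# In-memory SQLite is only a stand-in for MySQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE entry (
        id               INTEGER PRIMARY KEY,  -- surrogate key: no term column is unique
        term_lang1       TEXT NOT NULL,
        field            TEXT,                 -- optional
        pronunciation    TEXT,                 -- optional, IPA characters
        definition_lang1 TEXT,                 -- optional
        term_lang2       TEXT NOT NULL
    )
    """
)
# Plain (non-unique) index so lookups by term do not scan the whole table.
conn.execute("CREATE INDEX idx_entry_term1 ON entry (term_lang1)")

# Placeholder rows: the same term appears once per field/sense.
rows = [
    ("gap", "Electricity", None, None, "placeholder-translation-A"),
    ("gap", "Genetics", None, None, "placeholder-translation-B"),
    ("gap", None, None, None, "placeholder-translation-C"),
]
conn.executemany(
    "INSERT INTO entry (term_lang1, field, pronunciation, definition_lang1, term_lang2) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)


def lookup(term, field=None):
    """Return (field, term_lang2) pairs for a term, optionally narrowed to one field."""
    sql = "SELECT field, term_lang2 FROM entry WHERE term_lang1 = ?"
    params = [term]
    if field is not None:
        sql += " AND field = ?"
        params.append(field)
    sql += " ORDER BY id"
    return conn.execute(sql, params).fetchall()
```

Calling lookup("gap") returns all three rows, while lookup("gap", "Genetics") narrows it to one, which is why the table needs its own id column rather than a key on the term.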

I figured I can use MySQL to store the data, although it will have 
close to 2 million records, so performance will be an issue. And to 
populate the MySQL tables I'll first need to convert the Excel files to 
XML (although I still have to dig into that). And I will probably be 
forced to use triangulation (I mean, e.g., I have the translation from 
English to French and from French to German, and I will use these to get 
the term's translation from English to German).
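
The triangulation idea can be sketched like this. It is only an illustration in Python with toy word pairs; the triangulate() function and the variable names are hypothetical, and real data would also have to deal with a pivot term that has several senses, for example by requiring the field column to match on both sides.

```python
from collections import defaultdict


def triangulate(first_to_pivot, pivot_to_second):
    """Compose two lists of (source, target) pairs through the shared pivot language."""
    # Index the second dictionary by its source (pivot) term.
    pivot_index = defaultdict(set)
    for pivot_term, target in pivot_to_second:
        pivot_index[pivot_term].add(target)

    # For every source term, collect all targets reachable through any pivot term.
    result = defaultdict(set)
    for source, pivot_term in first_to_pivot:
        result[source].update(pivot_index.get(pivot_term, ()))

    # Drop source terms for which no pivot term was found in the second dictionary.
    return {source: sorted(targets) for source, targets in result.items() if targets}


# Toy data only: English -> French and French -> German.
en_fr = [("cat", "chat"), ("dog", "chien"), ("house", "maison")]
fr_de = [("chat", "Katze"), ("chien", "Hund")]

en_de = triangulate(en_fr, fr_de)
```

Here "house" drops out because "maison" has no entry in the French to German list, so coverage is limited by whichever dictionary is smaller.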

And now the first few from a (probably) long list of questions:
How good is MySQL's support for Unicode chars?
How would you define the XML schema so that Excel actually uses it?
Has anyone here worked on a similar project and can share their experience?
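
For the XML question, the shape I have in mind is roughly the following. This is a sketch only, written in Python with the standard xml.etree.ElementTree module; the dictionary and entry element names and the rows_to_xml() helper are invented for illustration, it assumes the spreadsheet rows have already been read into tuples, and it says nothing about whether Excel itself can be made to export against such a schema.

```python
import xml.etree.ElementTree as ET

# Column order as in the Excel files.
COLUMNS = ["term_lang1", "field", "pronunciation", "definition_lang1", "term_lang2"]


def rows_to_xml(rows):
    """Turn rows of five strings into one XML document, one <entry> per row."""
    root = ET.Element("dictionary")
    for row in rows:
        entry = ET.SubElement(root, "entry")
        for name, value in zip(COLUMNS, row):
            if value:  # optional columns that are empty are simply left out
                ET.SubElement(entry, name).text = value
    # encoding="unicode" returns a str, so IPA characters are kept as-is.
    return ET.tostring(root, encoding="unicode")


# Illustrative row: empty definition, placeholder translation.
sample = [("gap", "Electricity", "ɡæp", "", "placeholder-translation")]
xml_text = rows_to_xml(sample)
```

Leaving out empty optional elements keeps the file smaller, and the importer can treat a missing element as NULL.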

I know that I'm probably biting off more than I can chew here (I'm pretty 
good with PHP and decent with MySQL, but I've never done anything as 
complex as this) but I do consider this a challenge, so any help would 
be appreciated.

TIA,
Dan CRACIUN


