[thelist] MySQL/XML schema for storing words
Dan CRACIUN
dcsquare at myrealbox.com
Fri Mar 4 04:37:46 CST 2005
Hey list,
Me and my big mouth have done it again! While trying to convince a
publisher that his site could use a refresh (it has been 3 years since
the last redesign), I suggested that a free online translator could
attract way more traffic. The problem is he actually liked the idea :-)
The publishing house has printed several bilingual dictionaries
(English, French, Greek, Latin, Spanish, etc.) and I have access to the
Excel files that were used to generate them.
The schema is as follows:
term_lang1 field pronunciation definition_lang1 term_lang2
where:
term_lang1 is the word/term in the first language
field is the field of activity where that term is used (e.g.
IT/Medicine/Astronomy) (optional: some terms have it, some don't)
pronunciation is the pronunciation, written with those funny chars from
the IPA Extensions block (optional)
definition_lang1 is the definition of the term in the same language
(optional)
term_lang2 is the translation in the target language
Needless to say, a term can have different meanings in different fields
of activity (e.g. "gap" in electricity and in genetics), and it can have
different meanings depending on whether it is used as a verb, noun,
adjective, etc., so term_lang1 is by no means unique.
Neither is term_lang2, because the same term can be translated by
several synonyms.
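A minimal sketch of one way the columns above could be laid out as a
table, given that neither term column is unique. Everything here is an
assumption for illustration: the table name `entry`, the extra `lang1`
and `lang2` columns (added so that several dictionaries can share one
table), the index name, and the sample rows, whose translations are
placeholders rather than real dictionary data. SQLite is used only as a
stand-in so that the sketch runs anywhere; it is not MySQL, and the
column types would need adapting.

```python
import sqlite3

# SQLite stands in for MySQL so the sketch is runnable as-is.
# All table, column and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE entry (
        id               INTEGER PRIMARY KEY,  -- surrogate key: terms are not unique
        lang1            TEXT NOT NULL,        -- source language code (assumed column)
        lang2            TEXT NOT NULL,        -- target language code (assumed column)
        term_lang1       TEXT NOT NULL,
        field            TEXT,                 -- optional
        pronunciation    TEXT,                 -- optional, IPA characters
        definition_lang1 TEXT,                 -- optional
        term_lang2       TEXT NOT NULL
    )
""")
# Non-unique index, since the same term can appear in many rows.
conn.execute("CREATE INDEX idx_entry_term ON entry (lang1, lang2, term_lang1)")

# Placeholder rows: the translations are dummy strings, not real data.
rows = [
    ("en", "fr", "gap", "Electricity", "ɡæp", None, "fr-placeholder-1"),
    ("en", "fr", "gap", "Genetics", "ɡæp", None, "fr-placeholder-2"),
    ("en", "fr", "gap", None, "ɡæp", None, "fr-placeholder-3"),
]
conn.executemany(
    "INSERT INTO entry (lang1, lang2, term_lang1, field, pronunciation,"
    " definition_lang1, term_lang2) VALUES (?, ?, ?, ?, ?, ?, ?)",
    rows,
)


def lookup(term, lang1="en", lang2="fr"):
    """Return every (field, translation) pair stored for a source term."""
    cur = conn.execute(
        "SELECT field, term_lang2 FROM entry"
        " WHERE lang1 = ? AND lang2 = ? AND term_lang1 = ? ORDER BY id",
        (lang1, lang2, term),
    )
    return cur.fetchall()


for field, translation in lookup("gap"):
    print(field, translation)
```

The point of the surrogate `id` is that a lookup returns all matching
rows, and the optional `field` column is what lets the caller tell the
different meanings apart.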
I figured I can use MySQL to store the data, although it will hold
close to 2 million records, so performance will be an issue. To
populate the MySQL tables I'll first need to convert the Excel files to
XML (although I still have to dig into that). I will also probably be
forced to use triangulation (e.g. I have the translations from English
to French and from French to German, and I will use them to get the
term's translation from English to German).
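The triangulation idea can be sketched roughly as below. This is only
an illustration under stated assumptions: the function names
`build_index` and `triangulate` are made up, the word lists are
placeholder strings rather than real dictionary entries, and the pairs
are held in memory instead of in a database.

```python
from collections import defaultdict


def build_index(pairs):
    """Map each source term to the set of its translations."""
    index = defaultdict(set)
    for source, target in pairs:
        index[source].add(target)
    return index


def triangulate(first, second):
    """Compose A->B and B->C pairs into candidate A->C pairs via pivot B."""
    second_index = build_index(second)
    result = defaultdict(set)
    for a, b in first:
        # Every C reachable from the pivot term b becomes a candidate for a.
        for c in second_index.get(b, ()):
            result[a].add(c)
    return dict(result)


# Placeholder data: dummy strings, not real translations.
en_fr = [("en_word", "fr_word_1"), ("en_word", "fr_word_2")]
fr_de = [
    ("fr_word_1", "de_word_1"),
    ("fr_word_2", "de_word_2"),
    ("fr_word_2", "de_word_3"),
]

en_de = triangulate(en_fr, fr_de)
print(sorted(en_de["en_word"]))
```

The results are candidates only: when a pivot term has several
meanings, the composition can pair up words that are not translations
of each other. Matching on the field column as well, where it is
present, is one possible way to narrow the candidates.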
And now the first few from a (probably) long list of questions:
How good is MySQL's support for Unicode chars?
How would you define the XML schema so that Excel can actually use it?
Has anyone here worked on a similar project and can share their experience?
I know that I'm probably biting off more than I can chew here (I'm
pretty good with PHP and decent with MySQL, but I've never done anything
as complex as this), but I do consider this a challenge, so any help
would be appreciated.
TIA,
Dan CRACIUN