[thelist] redesigning a huge database

Lauri Väin lauri_lists at tharapita.com
Thu May 11 02:18:39 CDT 2006


Hi,

As far as internationalization goes, there are two basic approaches:
 - full templates
 - language strings with replacement markers in them

Translation companies tend to mess up templates that contain markup,
sometimes to the point of translating the markup itself. You'll be surprised
what you will experience in the coming months/years. You can extract the
strings for translation from the markup, keep them separate from day one,
or manage the full templates. To manage language strings, use something
like RBmanager or one of the tools somebody else already mentioned in this
thread.
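
To illustrate the second approach (language strings with replacement
markers), here is a minimal sketch in Python. The catalog layout, the
"greeting" key and the translate() helper are made up for illustration; a
real system would load the strings from files or a string manager.

```python
# Hypothetical string catalog: one dict of message templates per language.
# Named markers ({name}, {count}) let translators reorder words freely
# without ever touching markup.
CATALOG = {
    "en": {"greeting": "Hello {name}, you have {count} new messages."},
    "nl": {"greeting": "Hallo {name}, je hebt {count} nieuwe berichten."},
    "de": {"greeting": "Hallo {name}, Sie haben {count} neue Nachrichten."},
}


def translate(lang, key, **values):
    """Look up a string and fill in its markers, falling back to English."""
    strings = CATALOG.get(lang, CATALOG["en"])
    template = strings.get(key, CATALOG["en"][key])
    return template.format(**values)


print(translate("nl", "greeting", name="Rick", count=3))
```

Only the plain strings go to the translators; the markup stays in one
shared template.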

Optimizing databases is a science in itself. It all depends on how you use
the database: whether you mostly insert into it, update it, select a few
rows by index, run frequent sequential scans in batch jobs, or something
else.

When it's mostly selecting and adding the odd row every once in a while,
your requirements are not that bad. 700k rows and expanding to other
European countries is not a concern for most applications if you handle your
data properly. Just make sure your schema is clean, you select by indexes,
you keep your data sane and you know what you are doing. Redundancy, uptime
and other requirements may make your system more complex, of course. If you
do lots of sequential scans you may need separate replica servers for them,
and if you do lots of updates, locking can get nasty for some applications.
The whole thing really comes with research... and with experience most of
all.
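
As a small illustration of "select by indexes", the sketch below uses
Python's built-in sqlite3 module to compare the query plan before and after
adding an index. The customers table and its columns are invented for the
example, and the exact plan wording differs between SQLite versions and
other database engines.

```python
import sqlite3

# In-memory database with a hypothetical customers table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT, email TEXT)"
)
conn.executemany(
    "INSERT INTO customers (country, email) VALUES (?, ?)",
    [("NL" if i % 2 else "DE", f"user{i}@example.com") for i in range(1000)],
)


def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


query = "SELECT id FROM customers WHERE email = 'user42@example.com'"

before = plan(query)  # no index on email yet: a scan of the whole table
conn.execute("CREATE INDEX idx_customers_email ON customers (email)")
after = plan(query)   # the planner can now search via the new index

print("before:", before)
print("after: ", after)
```

Checking the plan of your most frequent queries this way is a cheap habit
to pick up before the table grows.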

If the European expansion is likely to have a VERY significant take-up
rate, or you have a very special kind of application, then you may need to
look at further options, such as horizontal clustering (splitting the data
across servers, if you have a good column to split it by) or vertical
clustering (if your workload is read-heavy, replicating your data from the
master to slave servers and reading it from there).
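
A rough sketch of both ideas, in Python: picking a server by a split column,
and sending reads to replicas while writes go to the master. The server
names, the country column and the Router class are all assumptions made up
for illustration, and the sketch ignores replication lag.

```python
import itertools

# Hypothetical mapping from the split column (country) to a database server.
SHARDS = {"NL": "db-nl", "DE": "db-de"}
DEFAULT_SHARD = "db-eu"


def shard_for(country):
    """Horizontal split: choose the server that holds this country's rows."""
    return SHARDS.get(country.upper(), DEFAULT_SHARD)


class Router:
    """Read/write split: writes go to the master, reads rotate over replicas."""

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas) if replicas else None

    def for_write(self):
        return self.master

    def for_read(self):
        # Fall back to the master when no replica is configured.
        return next(self._replicas) if self._replicas else self.master


router = Router("master", ["replica1", "replica2"])
print(shard_for("nl"), router.for_write(), router.for_read())
```

Keeping this decision in one small place in the application makes it easier
to add servers later without touching every query.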

Make sure you can split your databases functionally, if needed, and do not
lock yourself in with your application. Functional splitting, of course,
brings other problems, like multi-phase commits. Depending on your
application, what you find in most large systems is a lack of foreign keys
(foreign keys are something for development, not for production), and also
a lack of triggers and other tricks (these are your DBA's domain only, if
needed at all). Other techniques include preaggregation, denormalization,
archiving and caching (if you're on PHP, memcache might be a good solution
for you).
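
For the caching part, the usual pattern is "look in the cache first, hit the
database only on a miss". Below is a minimal sketch in Python where an
in-process dictionary with expiry times stands in for memcache; TTLCache,
cached_fetch() and the loader function are hypothetical names, not any real
client library's API.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for a cache server such as memcache."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)


def cached_fetch(cache, key, load):
    """Return the cached value, calling load(key) only on a miss.

    Note: a loaded value of None is never cached in this sketch.
    """
    value = cache.get(key)
    if value is None:
        value = load(key)
        cache.set(key, value)
    return value


def load_from_db(key):
    # Placeholder for an expensive database query.
    return key.upper()


cache = TTLCache(ttl_seconds=60)
print(cached_fetch(cache, "customer:42", load_from_db))
```

The expiry time is the knob that trades freshness against database load.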

PHP file-based sessions perform surprisingly well. Once you have multiple
frontends, depending on the load balancing method, you will need to start
storing sessions in memcache, look at what Zend or others have to offer, or
use some other approach, like writing your own session server.

One thing to be aware of: rebuilding from scratch and other major
refactorings can be a disaster time-wise for some systems. The risk goes up
with the number of man-years of code and thereby with the amount of
maturation that has gone into it. When you have many man-years of code to
work with, the key usually is to refactor very small pieces one by one.

Know what you are building and for which requirements. Lots of the things
I've mentioned above may be overkill for you and there is no point in
jumping over your shadow to design 10 years in advance - you cannot predict
your future. 10 years from now the database may not be your problem, but it
may be the complexity of the underlying application, which gets even more
complex with an overdesigned database. It may well be that you will have to
add maintenance points into the system if it is not an in-house system that
you are maintaining anyway. The key is striking a balance, keeping things
simple and recognizing the critical points at which something may need to
be changed.

Cheers,
Lauri

----- Original Message ----- 
From: "Rick den Haan" <rick.denhaan at gmail.com>
To: <thelist at lists.evolt.org>
Sent: Wednesday, May 10, 2006 7:29 PM
Subject: Re: [thelist] redesigning a huge database

> John,
>
> Spot on on the requirements.
>
> I know that 700,000 rows can be considered moderate in some environments,
> but we don't usually build applications that require such large databases.
> So for us, this is huge.
>
> The decision has already been made that the app will be completely rebuilt
> from scratch. There's bound to be some things in the old app we can re-use,
> but we're not counting too heavily on it.
>
> Rick.



