[thelist] redesigning a huge database

John Hicks johnlist at gulfbridge.net
Wed May 10 10:58:43 CDT 2006


Rick den Haan wrote:
> Hi All,
> 
> Our company has been approached to rebuild a rather large and often-used
> webapplication, and I've been charged with putting a plan on paper before
> any actual work is done.
> 
> This application is currently designed to work for dutch customers only, and
> all text is hard-coded into the php files. However, our client wants to
> expand further out into Europe, which means the application has to be
> modified so one or more (preferable one) files with language-definitions can
> be used. That's not the reason we're rebuilding it from scratch, though.
> 
> Currently, *everything* is stored in the database. From the moment you open
> the app, PHP sessions are stored, and every movement and action up until you
> close your browser is logged in the database. Don't ask me why, they
> explained once and it made sense, but I forgot. The trouble is, it's been up
> for roughly two years now, and there are tables in there with over 400,000
> records. There's another one with more than 700,000 records. And this is the
> Netherlands, only.
> 
> Part of the application depends on a SELECT DISTINCT mysql-query, and that
> takes far too much time right now (up to the point where the app hangs). So
> the database desperately needs to be redesigned for expansion into Europe.
> 
> Has anybody had any experience redesigning databases of this size, and can
> you give me pointers on what to look out for? One of the guys here said that
> perhaps it's a good idea to have multiple databases, which is indeed a
> possiblity, as it's hosted on a dedicated server we have full control over,
> but is that a good idea?
> 
> Rick.

I'm not an expert in redesigning systems with large databases, but here 
are a few ideas:

It looks like your requirements are:
--to add internationalization
--to optimize or tune the database and its queries
--to establish a system for rotating out old data

Internationalization is a whole subject in itself but one basic approach 
is to establish a system of templates and a simple content management 
system. The rest of the system can remain as is, but when you  are ready 
to start writing a page, you select a template and fill in variables 
with values from the CMS. (The cleanest approach is to have a single 
template for all languages and place all language elements in the CMS.)

Optimization. Identify problem queries (like the 'select distinct' you 
mentioned) and run 'explain' on them. Make sure you've established all 
the indexes you need. Check out 'analyze table'. This is a task that can 
be delegated since it requires only DB expertise. (In many environments 
a table of 700,000 rows would not be considered large!)

Archiving. Identify the tables that are large and growing. Simple 
history logs can probably be truncated at a certain point. Transactional 
history (like orders) can be moved periodically to a history table.

Others on the list can probably add much that I don't know about each of 
these points.

One thing that you didn't make clear was whether a decision has already 
been made on the scope of the project (redesigning and replacing v. 
tweaking, revamping, optimizing etc.).

Hope this is of some help.

--J



More information about the thelist mailing list