[thelist] Need help with difficult MySQL query

rudy r937 at interlog.com
Fri Jul 12 20:32:01 CDT 2002


hi simon

congrats for rolling your own cms

i understand your predicament, and have a couple of ideas

in your query,

  select entries.*, categories.*
    from entries, categories, ent_cat_lookup
   where entries.id = ent_cat_lookup.entryId
     and categories.shortName = ent_cat_lookup.categoryShortName
order by entries.dateAdded desc

presumable you have limit 0,19 on this, yes?  note that
this will be 20 result rows, including "duplicates" for
entries in more than one category, so actually there could
be fewer than 20 distinct entries

here are the alternatives as i see them

1. accept that occasional entries will be in more than one
category, and continue to return results using the above query

process the result set in a loop which is written to detect
the duplicate entries, collecting the multiple categories
for a given entry until it's gone past the last row for that
entry (this is prev/next logic which should be straightforward)

the downside is that large entries take up a lot of space in
the result set being passed back from the database, so this might
slow down the result set retrieval

but if you're talking about 20 rows, you would probably never
notice the performance hit

also, if your intention is to print 20 at a time, you should
probably go ahead and bring back 30 or so, and stop printing
after you've done 20 distinct entries in that loop

2. break the query up into two steps

  select *
    from entries
order by dateAdded desc
   limit 0,19

pull out the entry ids into a comma-delimited list, $idlist
(i don't do php -- is that how you name a string variable?)
and do another query (note to john: you almost never
should do queries in a loop!)

  select entries.id, categories.*
    from entries
  inner join ent_cat_lookup
      on entries.id = ent_cat_lookup.entryId
  inner join categories
      on ent_cat_lookup.categoryShortName = categories.shortName
   where entries.id in ( $idlist )
order by entries.id

note that mysql supports the standard sql join syntax, which
is preferable because it's easier to write and to comprehend

the 2-step strategy eliminates duplication in the entries, and
ensures you get exactly 20 distinct entries

it's probably marginally slower because it requires two calls to
the database (not always a bad thing, and often unavoidable in mysql
when a subquery would have been involved) but that might be offset by
faster processing of the entries query

however, it places an extra burden on your programming, because now
you have to match entries to their (multiple) categories yourself

this might not seem insurmountable at first, but will have greater
and greater impact if the queries change

for instance, suppose you wanted to bring back (multiple) comments
with each entry?

luckily, you can still do this in  two steps -- one to get entries plus
comments, and the second to get categories per entry, but the matching
logic would get more complex

the one-step process would at this point be undesirable -- the duplication
involved in entries plus multiple comments plus multiple categories is not
something i'd want to ship over from the database in one result set


there are other solutions, involving bastardizing the lovely (and
quite proper) many-to-many table design you have there, so let's
not look at those

hoping this helps

continued success with your blog


rudy






More information about the thelist mailing list