[thesite] friday is a good day..

rudy r937 at interlog.com
Fri May 18 11:31:07 CDT 2001


> We need to rewrite the query so that it doesn't require the nested one.

i think you will find the alternative equally slow, because the alternative
is a LEFT JOIN with a GROUP BY on all the columns other than the Count(*)

i think the problem is the sheer volume of calls to the nested (correlated)
query, due to the number of articles returned

  SELECT
    content.contentid,
    content.keyphrase,
    content.contentname,
    content.synopsis,
    content.datemod,
    content.replies,
  WHERE
      content.signoff = 1
      AND content.categoryid <> 23
      AND content.categoryid <> 26
      AND content.categoryid <> 9741

i.e. all the active articles not in those three categories

so far so good, except for the fact that it will return about a thousand
articles

disclaimer:  i don't know what the page is doing, whether it is caching the
results and/or how it is stepping through the "next 10 pages" so it might
be legitimate to go get all the articles -- but if the "next" type links do
*not* use cache and instead go and query the database again, then the above
is truly wasteful, and we should consider putting a date check on it to get
only the most recent articles or something

okay, now join to the user and category records

these joins go "from the many to the one" in the 1-to-many user-article and
category-article relationships, therefore these additional joins do not
increase the number of records in the result set --

  SELECT
    content.[columns],
    users.[columns],
    categorys.[columns]
  FROM
    content,
    users,
    categorys
  WHERE
      [content filters]
      AND content.userid = users.userid
      AND content.categoryid = categorys.categoryid

in anticipation of needing outer joins on this and other queries throughout
the site, we should try rewriting them using the sql92 syntax (or whatever
it's called) --

  SELECT
    content.[columns],
    users.[columns],
    categorys.[columns]
  FROM
    content
    INNER JOIN users
      ON content.userid = users.userid
    INNER JOIN categorys
      ON content.categoryid = categorys.categoryid
  WHERE
      [content filters]

now the way to add the nested query to the comments table would be as
follows --

  SELECT
    content.[columns],
    users.[columns],
    categorys.[columns],
    Count(*) AS newcomments
  FROM
    content
    INNER JOIN users
      ON content.userid = users.userid
    INNER JOIN categorys
      ON content.categoryid = categorys.categoryid
    LEFT JOIN replies
      ON content.contentid = replies.contentid
  WHERE
      [content filters]
    AND replies.datemod > ''
 GROUP BY
    content.[columns],
    users.[columns],
    categorys.[columns]

this is now going "from the one to the many" (if you know what i mean),
consequently you need the GROUP BY in order to keep the result set to one
row per article

i would be interested to see if oracle actually does better on this query
than the one using the nested correlated subquery, but my guess is it will
be just as slow

another alternative, and one that i personally think has a chance of maybe
performing really well, is to get the number of article comments in a
UNION -- but this requires that the page that is handling the result set be
changed, in order to handle the "extra rows" in the result set --

  SELECT
    content.contentid,
    content.keyphrase,
    content.contentname,
    content.synopsis,
    content.datemod,
    content.replies,
    users.[columns],
    categorys.[columns]
  FROM
    content
    INNER JOIN users
      ON content.userid = users.userid
    INNER JOIN categorys
      ON content.categoryid = categorys.categoryid
  WHERE
      [content filters]
 UNION ALL
  SELECT
    content.contentid,
    '          ',
    'ZZZZZZZZZZZ',
    '          ',
    content.datemod,
    Count(replies.contentid)
    , [more blanks or zeroes]
  FROM
    content
    LEFT JOIN replies
      ON content.contentid = replies.contentid
  WHERE
      [content filters]
  GROUP BY
    content.contentid,
    content.datemod
  ORDER BY
    content.datemod DESC,
    content.contentid DESC,
    content.contentname DESC

with this approach, the page would use CFOUTPUT GROUP=  and trap the
ZZZZZZZ row, which would come ahead of the "real" content row

the only advantage to doing this is that the GROUP BY is not performed in
the midst of a 4-way table join, rather only a 2-way join, which the
database engine might find a bit easier to do

however, my first attempt at query optimization would be to restrict the
query to returning no more rows than the page will display, if indeed the
"next 10 articles" feature does go back to the database...

does oracle support anything like microsoft's SELECT TOP or mysql's LIMIT?



rudy





More information about the thesite mailing list