[thelist] SQL query help

Joel Lieberman joel_lieberman at yahoo.com
Wed Jan 2 21:45:57 CST 2002


Please forgive my butting in...

I agree that SQL complexity should not necessarily
drive data model design, but by the same token,
denormalization is also a recognized strategy to
improve performance in some situations.

Whilst the factors of database platform type (SQL
Server, Oracle, Etc,), physical platform choice (Unix,
Win2000, Linux, Etc.), table size, query type (index
enabled, full table scan, etc.) all interact to affect
performance, a good rule of thumb should be that for
table size not exceeding 5 million records, properly
indexed, and avoiding full table scans, the average
query that returns 1000 - 10000 records should take no
longer than 3 - 5 seconds to complete on a decent
(moderate) modern platform.

If page size and query structure are causing questions
to be raised relating to performance, then unless the
database tables contain several million records, the
arguments should essentially be moot.

If performance is really a problem, then try using
surrogate (integer) keys rather than concatenated
column keys, and consider loading the tables into
memory at first query (pin the queries) to improve
performance.

All of this notwithstanding, theory is not always the
best predictor of performance.  Sometimes testing in
the actual environment is the best way to establish
which queries are offending and where bottlenecks are
occuring based on DB Engine optimization.

Bottom line:  if you only have a few tens of thousands
of records, queries should be essentially
instantaneous on any modern database platform
(assuming proper indexing and avoidance of repetitive
full table scans)

Joel Lieberman
--- Erik Mattheis <gozz at gozz.com> wrote:
> ><tip type="data modelling">
> >never let the avoidance of sql complexity drive
> your table design
> ></tip>
> >
> >i was biting my tongue trying not to comment on
> your poll design, as i,
> >too, could see several ways to improve it
> 
> I'm all ears ... I know there's ways I could be
> doing things better 
> in the SQL/DB arena.
> 
> >the sql i gave you was horribly complex (did it
> work, by the way?  it would
> >be nice to know, in case i was wrong, because i
> like to put out little
> >fires like that before they turn into big ones)
> 
> It gives the same error I was getting when trying to
> come up with 
> something myself - it's the most dreaded error I get
> because I never 
> understand why it happens:
> 
> [Microsoft][ODBC SQL Server Driver][SQL Server]
> Column 
> 'poll_options.option_id' is invalid in the select
> list because it is 
> not contained in either an aggregate function or the
> GROUP BY clause.
> 
> Here is your query again:
> 
>     select top 1 poll_options.option_id, count(*),
> poll_options.option_text
>         from poll_answers, poll_options
>         where poll_answers.poll_id =
> poll_options.poll_id
>             and poll_answers.poll_id =
>             (   select poll_id
>                    from polls
>                  where poll_active=0
>                      and poll_date =
>                       (  select max(poll_date)
>                                   from polls
>                                where poll_active=0 
> )      )
>        group by poll_answers.option_id
>       order by 2 desc
> 
> I can easily follow that except for the first and
> last lines:
> 
> What is the count(*) counting? I'm looking for the
> top 
> Count(poll_answers.option_id)
> The way it is here, what value will determine what
> the TOP result is?
> 
> The "order by 2 desc" - what is this doing?
> 
> 
> >but like my tip says, sql complexity avoidance is
> not the main design
> >criterion
> 
> This is my thinking ...I'm much more interested in
> keeping page 
> processing time down than keeping database size down
> - I've got tons 
> of disk space. And don't nested SELECTS like this
> significantly slow 
> down processing time? It seems a better option for
> the purpose at 
> hand to duplicate stuff if it will speed up page
> processing.
> 
> >if you are looking to restrict duplicates (e.g. the
> same IP number)
> >consider using a unique constraint, and let the
> database do it for you
> >
> >it would have to be a multi-column constraint (e.g.
> including poll number),
> >so make sure your database can handle that
> (sql/server can)
> 
> I can see that - but it would have to look at three
> things: poll_id, 
> the user's IP and the date. And again, page
> processing speed is the 
> most important thing other than making it slightly
> difficult to vote 
> in the same poll more than once an hour.
> 
> 
> 
> 
> 
> 
> 
> -- 
> 
> __________________________________________
> - Erik Mattheis
> 
> (612) 377 2272
> http://goZz.com/
> 
> __________________________________________
> 
> -- 
> For unsubscribe and other options, including
> the Tip Harvester and archive of TheList go to:
> http://lists.evolt.org Workers of the Web, evolt ! 


__________________________________________________
Do You Yahoo!?
Send your FREE holiday greetings online!
http://greetings.yahoo.com




More information about the thelist mailing list