[thelist] Impact of LIKE in sql
rudy
Rudy_Limeback at maritimelife.ca
Thu Dec 14 13:52:15 CST 2000
> What is the impact of the number of characters
> in a LIKE clause in most databases.
hi joshua
you probably won't notice the difference
the key to performance is minimizing the number of rows the database has
to look at
generally speaking, the longer the string, the fewer rows should match, so
go ahead and use a longer string if possible
like i said, you probably won't notice, since the string comparison is
done in cpu anyway
> If you have the opportunity to break the data down
> a little bit and perform a shorter LIKE comparison
> and a strict comparison, does the shortened LIKE string
> compensate for the now 2 conditionals?
no
but before i explain why, let's do an example
suppose you are looking up Stroumboulopoulos (not a made-up name, the
guy's a local VJ) but you can't remember how to spell it
so you go LIKE 'Str%'
if you know more than that, you might go LIKE 'Stroumb%'
both of these are going to be approximately the same in performance --
either excellent or terrible
if you were looking up this name in the phone book, you would find it in
approximately the same amount of time, whether you went LIKE 'Str%' or
LIKE 'Stroumb%' -- relatively quickly, because the phone book is sequenced
in surname order, and you would start your search by flipping over to the
S pages...
maybe with the longer string, you could jump to the Stroumba names and
quit searching when you got to the Stroumbz names, so you might be
marginally faster with a longer string
but compare that situation to this --
i can't remember what it's called, but there's this directory in our local
public library, thousands and thousands of pages, must weigh at least ten
kilograms, where you can look up a street address and find out who lives
there -- the book is printed in street address sequence
now if you use this address directory to look for all the
Stroumboulopouloses, never mind if you know how to spell the name exactly,
you aren't going to finish any time soon because you'll have to go through
the entire book
so, indexes are good, because the database can access an index very
quickly by evaluating the condition against the *index* name field, and
not against the actual *table row* name field
as the database evaluates which index entries qualify, it is building up a
list of "row addresses" in memory, and once it's finished searching the
index, it uses those row addresses to go out to the disk and get the
actual rows off the table (which it can then just pipe over to the result
table, without having to confirm that the name field in the row meets the
original condition)
with me so far?
okay, what if there is no index on the name field?
then the database has to read all the rows!!
in that case, it will take exactly the same amount of time, whether you go
LIKE 'Str%' or LIKE 'Stroumb%' -- i.e. practically forever
so the rule of thumb for table design is, if you are going to search on a
column, create an index on that column
"search on a column" is actually one of several criteria for creating an
index
<tip>
consider creating an index if the column is used in
- the WHERE clause (i.e. to search on)
- the WHERE clause or ON conditions (to join tables)
- the GROUP BY or ORDER BY clauses
</tip>
one caveat for creating indexes is whether in fact they would actually be
used
suppose there *is* an index on surname, but you wanted to go LIKE
'%dóttir'
now the database has to decide whether it's even worth bothering to read
the surname index, because it would have to read *all of the index
entries*
sure, not all index entries will qualify, but it then has to read a
potentially large number of rows off the disk using row addresses -- which
is not as fast as reading blocks of rows off the disk from front to back
like it can when it's scanning an entire table
the database has no way of knowing whether LIKE '%dóttir' is going to
qualify a large number of rows or not
furthermore, it's a big demand on memory, keeping track of all the "row
addresses" that later (as soon as the index search is finished) have to be
read off the actual table
so depending on a few factors like how many rows are in the table, the
database sometimes just ignores the index
<tip>
be aware that LIKE strings that *start* with a wildcard usually induce the
database to do an entire table scan, instead of using an index defined on
that column
</tip>
which doesn't mean that the index is no useful, it will be used on any
query that does not start with a wildcard
okay, now consider your second question, does it matter if you try to help
things out by adding another condition?
the answer is yes, sorta -- but only do it if you *have* to do it
take the case of our pal the VJ, who we happen to know is a guy
let's suppose that there *is* an index on surname, and then run a query
like this
SELECT foo
FROM yourTable
WHERE surname LIKE 'Stroumb%'
AND gender = 'M'
to return rows for this request efficiently, the database now has to make
a more complex decision than just whether to use the available index on
surname
can it still use that index? yes, and most likely it will, because it can
defer the evaluation of gender='M' until it has actually started
retrieving rows from the table that already qualified via the surname
index
so this is actually marginally slower, i.e. you haven't helped it at all
did you *have* to add the second condition? not really -- i mean, without
it, you would get back a list of all the Stroumboulopouloses, both guys
and gals, and you could let your eye sort of wander down the list to
visually check for George
still, in general, the more conditions you supply, the fewer rows have to
be returned, so even though the *search* is marginally slower, the
*overall* performance would probably be faster -- it's faster to return
only a few rows of gender='M' Stroumboulopouloses than to return the rows
of all the Stroumboulopouloses
okay, and what if you accidentally code the query as
SELECT foo
FROM yourTable
WHERE surname LIKE 'Stroumb%'
OR gender = 'M'
in a case like this, the database would probably not use the index,
because it knows, from the second condition, that it's going to have to
look at all the rows in the table anyway (this example returns all the
guys plus any female Stroumb%'s, which is why i said "accidentally" -- it
isn't a realistic query, it just illustrates what happens even when an
index is available)
what if you had an index on gender too? good question, but i'm not sure i
should answer it, because i think i've already gone on a bit long on this
post already...
holler if you want more info
i love this stuff
rudy.ca
More information about the thelist
mailing list