[thelist] Impact of LIKE in sql

rudy Rudy_Limeback at maritimelife.ca
Thu Dec 14 13:52:15 CST 2000


> What is the impact of the number of characters 
> in a LIKE clause in most databases. 

hi joshua

you probably won't notice the difference

the key to performance is minimizing the number of rows the database has 
to look at

generally speaking, the longer the string, the fewer rows should match, so 
go ahead and use a longer string if possible 

like i said, you probably won't notice, since the string comparison is 
done in cpu anyway


> If you have the opportunity to break the data down 
> a little bit and perform a shorter LIKE comparison 
> and a strict comparison, does the shortened LIKE string 
> compensate for the now 2 conditionals?

no

but before i explain why, let's do an example

suppose you are looking up Stroumboulopoulos (not a made-up name, the 
guy's a local VJ) but you can't remember how to spell it

so you go LIKE 'Str%'

if you know more than that, you might go LIKE 'Stroumb%'

both of these are going to be approximately the same in performance -- 
either excellent or terrible

if you were looking up this name in the phone book, you would find it in 
approximately the same amount of time, whether you went LIKE 'Str%' or 
LIKE 'Stroumb%' -- relatively quickly, because the phone book is sequenced 
in surname order, and you would start your search by flipping over to the 
S pages...

maybe with the longer string, you could jump to the Stroumba names and 
quit searching when you got to the Stroumbz names, so you might be 
marginally faster with a longer string

but compare that situation to this --

i can't remember what it's called, but there's this directory in our local 
public library, thousands and thousands of pages, must weigh at least ten 
kilograms, where you can look up a street address and find out who lives 
there -- the book is printed in street address sequence

now if you use this address directory to look for all the 
Stroumboulopouloses, never mind if you know how to spell the name exactly, 
you aren't going to finish any time soon because you'll have to go through 
the entire book

so, indexes are good, because the database can access an index very 
quickly by evaluating the condition against the *index* name field, and 
not against the actual *table row* name field

as the database evaluates which index entries qualify, it is building up a 
list of "row addresses" in memory, and once it's finished searching the 
index, it uses those row addresses to go out to the disk and get the 
actual rows off the table (which it can then just pipe over to the result 
table, without having to confirm that the name field in the row meets the 
original condition)

with me so far?

okay, what if there is no index on the name field? 

then the database has to read all the rows!!

in that case, it will take exactly the same amount of time, whether you go 
LIKE 'Str%' or LIKE 'Stroumb%' -- i.e. practically forever

so the rule of thumb for table design is, if you are going to search on a 
column, create an index on that column

"search on a column" is actually one of several criteria for creating an 
index 

<tip>
consider creating an index if the column is used in
   - the WHERE clause (i.e. to search on)
   - the WHERE clause or ON conditions (to join tables)
   - the GROUP BY or ORDER BY clauses
</tip>


one caveat for creating indexes is whether in fact they would actually be 
used

suppose there *is* an index on surname, but you wanted to go LIKE 
'%dóttir'

now the database has to decide whether it's even worth bothering to read 
the surname index, because it would have to read *all of the index 
entries* 

sure, not all index entries will qualify, but it then has to read a 
potentially large number of rows off the disk using row addresses -- which 
is not as fast as reading blocks of rows off the disk from front to back 
like it can when it's scanning an entire table

the database has no way of knowing whether LIKE '%dóttir' is going to 
qualify a large number of rows or not

furthermore, it's a big demand on memory, keeping track of all the "row 
addresses" that later (as soon as the index search is finished) have to be 
read off the actual table

so depending on a few factors like how many rows are in the table, the 
database sometimes just ignores the index

<tip>
be aware that LIKE strings that *start* with a wildcard usually induce the 
database to do an entire table scan, instead of using an index defined on 
that column
</tip>

which doesn't mean that the index is no useful, it will be used on any 
query that does not start with a wildcard

okay, now consider your second question, does it matter if you try to help 
things out by adding another condition?

the answer is yes, sorta -- but only do it if you *have* to do it

take the case of our pal the VJ, who we happen to know is a guy

let's suppose that there *is* an index on surname, and then run a query 
like this

   SELECT foo
     FROM yourTable
    WHERE surname LIKE 'Stroumb%'
      AND gender = 'M'

to return rows for this request efficiently, the database now has to make 
a more complex decision than just whether to use the available index on 
surname

can it still use that index?  yes, and most likely it will, because it can 
defer the evaluation of gender='M' until it has actually started 
retrieving rows from the table that already qualified via the surname 
index

so this is actually marginally slower, i.e. you haven't helped it at all

did you *have* to add the second condition?  not really -- i mean, without 
it, you would get back a list of all the Stroumboulopouloses, both guys 
and gals, and you could let your eye sort of wander down the list to 
visually check for George

still, in general, the more conditions you supply, the fewer rows have to 
be returned, so even though the *search* is marginally slower, the 
*overall* performance would probably be faster -- it's faster to return 
only a few rows of gender='M' Stroumboulopouloses than to return the rows 
of all the Stroumboulopouloses

okay, and what if you accidentally code the query as

   SELECT foo
     FROM yourTable
    WHERE surname LIKE 'Stroumb%'
       OR gender = 'M'

in a case like this, the database would probably not use the index, 
because it knows, from the second condition, that it's going to have to 
look at all the rows in the table anyway (this example returns all the 
guys plus any female Stroumb%'s, which is why i said "accidentally" -- it 
isn't a realistic query, it just illustrates what happens even when an 
index is available)

what if you had an index on gender too?  good question, but i'm not sure i 
should answer it, because i think i've already gone on a bit long on this 
post already...


holler if you want more info

i love this stuff


rudy.ca





More information about the thelist mailing list