[thelist] RE: writing a productive search query

Brooking, John John.Brooking at sappi.com
Tue Mar 1 08:10:22 CST 2005


Brian Cummiskey said:
> I've been working on a customer search most of the morning, and i 
> can't seem to come up with anything better.

(asp/sql code omitted for brevity)

> My target here is for our phone rep to fill out a form on the 
> previous page with the caller fn/ln/state/phone and then, select 
> from the databse the most likely customer based on the criteria.
>
> How can i adjust the query so that it is inclusive of all, but not 
> overkill once the database starts to grow (and it will by the hundreds

> per day)?

Brian,

  I'm not sure exactly what you mean by "better" and "not overkill". Are
you concerned with too many false matches being returned, or slow
response time? In either case, here are some thoughts which might have
bearing on both.
  
  If you are concerned that all these LIKE clauses might be inefficient,
I share your concern. My non-DBA impression of how any database would
handle this query is that it would go through the entire table row by
row (that's called a "full table scan"), doing wildcard text comparisons
on each row. I would think that that could be pretty slow if the DB gets
huge. You might improve it with caching and other tricks which your DBA
could help you with.
  
  But even then, you might not get all the results you are looking for,
because your query is pretty much looking for exact text, albeit
anywhere in the field. It will, for example, not find "William Smythe"
if the phone rep enters "Bill Smith". To get around this, and in the
process improve performance, you could apply some kind of hashing for
names. I wrote a mailing list database in the mid-90's, and used this
kind of technique for both first and last name. For last names, there
are a variety of algorithms which derive a common hash value for similar
sounding last names. The most well known of those (and public domain) is
Soundex, which you can google for. Some databases even have this as a
built-in function. I have also seen reference to others which claim to
be better, but some are also proprietary. For first names, I created my
own "first name equivalency" table, assigning a common numeric value to
variations of the same name, such as "William" and "Bill". I suppose
such a table may be obtained commercially, although I don't know where
or for how much. I can send you mine if you want.
  
  If you do something like this for the name, then you can make fields
called "LastNameHash" and "FirstNameID" on your table, populate them
when the record is created, and then search on them instead of the raw
name fields. You could even index them. Then when the phone rep
searched, also create the corresponding values for the values s/he
entered, and you can do exact compares.
  
  If these techniques increase your name matching accuracy sufficiently,
maybe you can then simplify your query to checking each field
individually, with AND between each, rather than the complex set of
combinations you have so far:
  
      sLastNameHash = Soundex(request("caller_last_name"))
      iFirstNameID = FirstNameID(request("caller_first_name"))
      
      sSQL = sSQL & " LastNameHash = '" & sLastNameHash & "' "
      sSQL = sSQL & " AND FirstNameID = '" & iFirstNameID & "' "
      sSQL = sSQL & " AND caller_phone_number LIKE '%" _
                  & request("caller_phone_number") & "%' "
      sSQL = sSQL & " AND caller_state = '" & request("caller_state")
  
  Note that I'm assuming that all fields will be filled in. If they
might not be, just surround each "AND" line with a conditional asking if
the parameter is not blank.
  
  Using AND instead of OR should ensure (I think) that if indexes are
available to be used on the name hash fields, then it will match those
results first, and then the LIKE clause for the phone can just scan the
results that are left. Using OR's would force it to do the full table
scan regardless of indexes.
  
  Of course you should store the phone number without punctuation, and
strip out any that that the phone rep uses in the parameter. Maybe you
could even get away with only matching on the last 4 or 7 digits, since
you are also restricting the match by the other fields now. And if you
are doing that, just store the last 4 or 7 digits as a separate field,
so you can again do an exact match and avoid the wildcarding altogether.
  
  Lastly, depending on your database, make sure upper/lowercase will not
be a problem. If you are using the hash values for names, then the only
field you need to worry about is the state/province. You could also add
city back into your search field list, being careful about case there as
well.
  
  Hope you find some of this useful or at least thought-provoking!
  
- John
-- 


This message may contain information which is private, privileged or confidential and is intended solely for the use of the individual or entity named in the message. If you are not the intended recipient of this message, please notify the sender thereof and destroy / delete the message. Neither the sender nor Sappi Limited (including its subsidiaries and associated companies) shall incur any liability resulting directly or indirectly from accessing any of the attached files which may contain a virus or the like.



More information about the thelist mailing list