[thelist] RE: writing a productive search query

Brian Cummiskey Brian at hondaswap.com
Tue Mar 1 09:52:03 CST 2005


Hi John.  Thanks for the greatly detailed response.

Brooking, John wrote:

> 
>   I'm not sure exactly what you mean by "better" and "not overkill". Are
> you concerned with too many false matches being returned, or slow
> response time? In either case, here are some thoughts which might have
> bearing on both.

I'm concerned with a lot of things here.

First, I want to make sure that I DO grab the caller if he does in fact 
exist in the DB.  There are several concerns I have with this:

- Women may get married/change their last names.  I don't want to 
duplicate records.  Therefore, going by last name alone is not concise. 
  I'm not keeping track of sex of the individual, so this method is 
fubar'ed.
- People move.  The name may be the same, but the address, city, state, 
and/or phone number may be changed.  If that's the case, i still want to 
be able to pull up the old record with their old address and be able to 
update it.
- Phone numbers change.  Some people may even give us a cell phone as a 
primary contact number, and those can change even easier than a land line.
- Spellings.  Our agents will be trained to ask/verify/repeat back the 
sleepings of all names, but there's still a margin of error in any data 
entry screen.
- Titles.  The firs time a customer calls, he may ID himself as "John 
Smith Junior".  One agent may type in JR, one, J.R., and another the 
full word.  I can see this being a problem as well.

So, I need to be able to pull both the address matches, the name 
matches, and possibly the phone matches.  Our agents will be trained to 
look for the most likley record and ask the caller "is this your old 
address" to verify that the record they intend to use is in fact the 
same person, or someone else.  That way, as long as the name is close, i 
can still find the record based on the city/state/phone fields.

>   If you are concerned that all these LIKE clauses might be inefficient,
> I share your concern. 

Exactly.  Will it work- sure.  But it may also return 5,000 results as 
the database gets larger by the day.  Phone/Talk time is money, so I 
need to get the top 10-ish matching results and let the agent decide 
which seems most likely based on those results.


> My non-DBA impression of how any database would
> handle this query is that it would go through the entire table row by
> row (that's called a "full table scan") <snip>

Yup.  I'm familiar with the method of searching.  I agree its highly 
resource intensive.  I'm not too concerned with that, as we have a 
pretty good sql server (2k3, 2gb ram, dual xenon's, etc).

> It will, for example, not find "William Smythe"
> if the phone rep enters "Bill Smith". To get around this, and in the
> process improve performance, you could apply some kind of hashing for
> names. 

I was hoping that this would be taken care of with the address lookup as 
well.  If the caller ID's himself as Bill, chances are, he ID'ed him 
self as Bill the first time he called.  The database is starting as 
blank, and will be added only by the phone reps when they speak to a 
customer.  The first couple months will probably be 99% new entries. 
But i do see your point here.  Some people may use their more 
professional name on the phone when they are in a mood for 
professionalism, vs a nickname when they just have a simple question, or 
something like that.

I'm hoping that the hash will not be needed.  Quite frankly, i don't 
have the time to spend on that anyway.

> If these techniques increase your name matching accuracy sufficiently,
> maybe you can then simplify your query to checking each field
> individually, with AND between each, rather than the complex set of
> combinations you have so far:
>   
	<snip>
>   
>   Note that I'm assuming that all fields will be filled in. If they
> might not be, just surround each "AND" line with a conditional asking if
> the parameter is not blank.

All fields will be required on the entry form.  The problem with using 
AND's is that if the caller moved, for example, their phone number and 
name won't match, and thus, the search query won't find their record. 
The agent would then create a new record, and the database gets doubled up.

> Of course you should store the phone number without punctuation, and
> strip out any that that the phone rep uses in the parameter. 

yes, phone numebrs will be a 10-digit field only, checked with a regex 
before the form is even submitted to the search script.

> Lastly, depending on your database, make sure upper/lowercase will not
> be a problem. 

I'm using M$ SQL 2000, so upper/lower doesn't matter so much.... I hope.

geesh..  it's even more complex than i thought.  :)

I appreciate the time John.  Without going to the Hash setup, and just 
using a query, do you think its possible with a certain amount of 
accuracy to get at LEAST the correct result, and a couple other 
possibilities?


Thanks,
Brian



More information about the thelist mailing list