[thelist] Phone Number Word Permutation Algorithm

Matt Warden mwarden at gmail.com
Tue Dec 2 11:41:26 CST 2008


On Tue, Dec 2, 2008 at 7:54 AM, Fred Jones <fredthejonester at gmail.com> wrote:
> etc. For a 7 digit number, that can be between 5 and 16 K combinations

Your biggest challenge is reducing the search space. You can eliminate
combinations that have no vowel, for example. Also, while I could not
find out immediately with a google search, I do know there are
algorithms out there that identify strings of characters that could
possibly be English words. If you whittle your search space down to
these only, I think you will have a manageable number.

Secondly, your data structure isn't ideal. You have to perform a full
table-scan on every row in order to determine whether it matches
anything. In most of your example, you will match nothing, which means
you will check every single row in the table X times, where X is the
number of combinations you have that doe not match an English word. Is
"INSTR" really what you want? So "and" would match "candy" in your
database? That does not seem to be what you want. I would think you'd
want an exact match or search on "word%", where % is the wildcard. In
both cases, your search would use an index and be much faster. Also,
of course, get rid of that condition that restricts the search to
words greater than two characters.

I would also suggest looking into chunking these database calls, as
you are possibly incurring network and other overhead for each SQL
call. Can you pass an array to a stored procedure?

Good luck,

-- 
Matt Warden
Cincinnati, OH, USA
http://mattwarden.com


This email proudly and graciously contributes to entropy.



More information about the thelist mailing list