[thelist] Phone Number Word Permutation Algorithm

Fred Jones fredthejonester at gmail.com
Tue Dec 2 06:54:56 CST 2008


This site http://numspell.com/ accepts phone numbers as input and
returns mnemonic combinations for that number. I am trying to create
the same functionality for someone, in order to extend it. I have a
database table with 173K records, each of which is a valid English
word (I downloaded this list).

What I am doing is first recursively generating all permutations of
the number, so for 23, for example we have:

23
a3
b3
c3
2d
2e
2f
a3
ad

etc. For a 7 digit number, that can be between 5 and 16 K combinations
(some digits have 3 letters and some have 4). Then I run a query for
each permutation to see if there is a valid word in there. This raw
SQL method, however, is way too much computation. I think this is the
wrong approach. This is the SQL I am using:

SELECT INSTR('%s',word) as location,word FROM words WHERE
                      INSTR('%s',word) > 0 and LENGTH(word) > 2;"

where %s is replaced by the permutation.

I realize now I can remove perhaps 40 or 60K words from the dictionary
because they are greater than 10 or 11 characters and therefore
somewhat irrelevant. But even a 9 character word would be relevant
because you can always just tack on 2 extra characters--your phone
call will still get through. :)

One way I could improve this would be to only search for the letters
in the permutation, and not the digits, meaning if my permutation is
"11cab11" then instead of using INSTR in MySQL I could use PHP to
extract the 'cab' part and just search on that to see if it has a
precise string match with a word. Then also I could note that I
searched once for that word and so if I find it again in another
permutation, there's no need to search again. I believe that would
reduce the overall computations by a tremendous factor.

Anyway, does anyone have any ideas of a better way to go about this?

Thanks!



More information about the thelist mailing list