[thelist] Very Large MySQL Table

Jason Handby jason.handby at corestar.co.uk
Tue Feb 10 03:48:13 CST 2009


Hi Fred,


> We don't have a dedicated server for this but his web account does
> have disk space for 6Gs, but I am wondering if we will be able to run
> searches. His idea was to break it up into 8 pieces, and then to run
> each search 8 times, against just that one eighth of the full list.
> 
> Anyone have any better idea?


Disclaimer: I'm making this up :-)


In order to find all occurrences of the substring 'thelist' in all
registered domain names, MySQL needs to do a table scan of all 185
million records; and checking each record involves looking for a
substring in a string. That's going to take a long time!

Breaking it into 8 tables doesn't change that. You will still need to
search all the records in all 8 of the tables to get an answer.

In order to speed it up, you need to find some way to cheaply rule out
some portion of the 185 million records before you even get to the
string-scanning part. Probably this means exploiting indexes somehow.

Here's one idea. You could add 26 SMALLINT columns to your table, each
with its own index. The first one for the number of 'a's, the second one
for the number of 'b's... and so on. Then if you're searching for the
phrase 'thelist' in a domain name, you know you only want records

...WHERE t_count >= 2 AND h_count >= 1
	AND e_count >= 1 AND l_count >= 1
	AND i_count >= 1 AND s_count >= 1
	AND domainname LIKE '%thelist%'

and so on. I haven't of course tried this, but I bet it's quicker than
searching all 185 million strings.


You never know, it might work!



Jason





More information about the thelist mailing list