[thelist] MySQL select random row [WAS: MySQL queries and AUTO_INCREMENT]

Max Schwanekamp lists at neptunewebworks.com
Fri Apr 7 14:22:30 CDT 2006


Mattias Thorslund wrote:
> But it looks like some rows would be more likely to be chosen than
> others. Consider if the IDs are [1, 2, 3, 100]. Then, picking a random
> number between 1 and 100 will result displaying the row with ID 100 for
> 97% of the time.

Yup, totally true.  I just mentioned that weakness to Joshua on the 
phone earlier, before I saw your post.  If you want a closer 
approximation of true randomness, you'll want to use a separate sequence 
field.  Still, if the point is simply to vary the output on a 
more-or-less random basis, and it is reasonable to assume that we'll 
have a table with may more insertions than deletions, this method is 
simpler than having 4 tables to maintain solely for the purpose of 
generating a random image.

> What's the use of point 3? The (second) point 2 should return a row if
> there are rows in the table.

Also true!  I was in a rush and added the minid bit just before sending; 
should have removed #3.

> If there are no rows in the table, point 1 will return NULL in both
> firstrow and lastrow, which is where you should raise the error.

Good point.

> I agree with John Brooking: Use a separate sequence field, and update
> it. 

That makes sense in most cases, but this may add overhead to the 
maintenance, since now all inserts to that table requires a routine that 
under MyISAM locks the table, inserts, updates all rows, unlocks (or 
wraps it in a transaction under InnoDB).  It also couples the code to 
the table structure, which was partly what I was trying to avoid.  If we 
want to reuse a selectRandomRow method, the table must always have that 
sequence field, or fall back on some other technique... like this one or 
ORDER BY RAND() LIMIT 1.

> Or, if the number of records will be not too large, use ORDER BY
> RAND() LIMIT 1.

I agree.  But there should be some automated way to get alerted when a 
query takes an inordinately long time, as ORDER BY RAND() is expensive 
for big tables.

OK, so I googled around a bit just now, and these guys have a good 
discussion:
http://akinas.com/pages/en/blog/mysql_random_row/

The last one is an interesting solution.

-- 
Max Schwanekamp
http://www.neptunewebworks.com/



More information about the thelist mailing list