[thelist] MySQL select random row [WAS: MySQL queries and AUTO_INCREMENT]
Max Schwanekamp
lists at neptunewebworks.com
Fri Apr 7 14:22:30 CDT 2006
Mattias Thorslund wrote:
> But it looks like some rows would be more likely to be chosen than
> others. Consider if the IDs are [1, 2, 3, 100]. Then, picking a random
> number between 1 and 100 will result displaying the row with ID 100 for
> 97% of the time.
Yup, totally true. I just mentioned that weakness to Joshua on the
phone earlier, before I saw your post. If you want a closer
approximation of true randomness, you'll want to use a separate sequence
field. Still, if the point is simply to vary the output on a
more-or-less random basis, and it is reasonable to assume that we'll
have a table with may more insertions than deletions, this method is
simpler than having 4 tables to maintain solely for the purpose of
generating a random image.
> What's the use of point 3? The (second) point 2 should return a row if
> there are rows in the table.
Also true! I was in a rush and added the minid bit just before sending;
should have removed #3.
> If there are no rows in the table, point 1 will return NULL in both
> firstrow and lastrow, which is where you should raise the error.
Good point.
> I agree with John Brooking: Use a separate sequence field, and update
> it.
That makes sense in most cases, but this may add overhead to the
maintenance, since now all inserts to that table requires a routine that
under MyISAM locks the table, inserts, updates all rows, unlocks (or
wraps it in a transaction under InnoDB). It also couples the code to
the table structure, which was partly what I was trying to avoid. If we
want to reuse a selectRandomRow method, the table must always have that
sequence field, or fall back on some other technique... like this one or
ORDER BY RAND() LIMIT 1.
> Or, if the number of records will be not too large, use ORDER BY
> RAND() LIMIT 1.
I agree. But there should be some automated way to get alerted when a
query takes an inordinately long time, as ORDER BY RAND() is expensive
for big tables.
OK, so I googled around a bit just now, and these guys have a good
discussion:
http://akinas.com/pages/en/blog/mysql_random_row/
The last one is an interesting solution.
--
Max Schwanekamp
http://www.neptunewebworks.com/
More information about the thelist
mailing list