[thelist] Queue processing with a database table.

Bill Moseley moseley at hank.org
Sun Aug 29 08:04:24 CDT 2010


I have a table called "queue" that has three columns "filename", "state",
and "start_time".  Each file in the table must be processed and the state
can be "ready", "processing", or "complete".  The processing of the files is
done on a number of machines/processes in parallel, and files are processed
in batches of, say, 10.  A file must only be processed one time.
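For reference, a table matching that description might look like this (a sketch only; the column types and the TEXT state column are my assumptions, not from the original setup):

```sql
-- Hypothetical DDL for the "queue" table described above;
-- column types are assumptions.
CREATE TABLE queue (
    filename   TEXT PRIMARY KEY,
    state      TEXT NOT NULL DEFAULT 'ready',  -- 'ready' | 'processing' | 'complete'
    start_time TIMESTAMP                       -- set when a worker claims the row
);
```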

What I want is a way for each separate machine (well, each process) to grab
10 rows, mark them as "processing", and know that no other process can grab
the same rows.

It's possible that a process might die, so the "start_time" is used to clear
up stale "processing" states.
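The stale-row cleanup could be sketched as a periodic statement like the following (the one-hour timeout is an arbitrary assumption, not a value from the original post):

```sql
-- Requeue rows stuck in 'processing' longer than some timeout,
-- e.g. because the worker that claimed them died.
UPDATE queue
   SET state = 'ready', start_time = NULL
 WHERE state = 'processing'
   AND start_time < now() - interval '1 hour';
```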

My first pass was SELECT ... FOR UPDATE, but (at least in PostgreSQL) if two
processes both run "SELECT * FROM queue WHERE state = 'ready' LIMIT 10 FOR
UPDATE", then when the first COMMIT happens the second transaction can end
up returning zero rows.  Or to be clear: say there are thousands of rows.
If the first transaction selects 10 rows but only updates 5 of them to
"processing", then after it commits the second transaction returns only the
5 rows that are still "ready".  It seems the second SELECT ... FOR UPDATE
grabs the same rows and blocks, then re-tests the WHERE clause after the
first commit.
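This is not from the original post, but for readers hitting the same problem: later PostgreSQL releases (9.5 and up) add SKIP LOCKED, which lets each worker claim a disjoint batch in a single statement without blocking on rows another transaction has already locked. A sketch, assuming that feature is available:

```sql
-- Atomically claim up to 10 'ready' rows, skipping rows already
-- locked by a concurrent worker (requires PostgreSQL 9.5+).
UPDATE queue
   SET state = 'processing', start_time = now()
 WHERE filename IN (
         SELECT filename
           FROM queue
          WHERE state = 'ready'
          LIMIT 10
            FOR UPDATE SKIP LOCKED
       )
RETURNING filename;
```

Each worker runs this in its own transaction and processes whatever filenames come back; two concurrent workers get non-overlapping batches rather than one of them blocking or returning an empty set.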

Anyway, is there a way to do this other than a full table lock?

-- 
Bill Moseley
moseley at hank.org

