[thelist] Queue processing with a database table.

Matt Warden mwarden at gmail.com
Mon Aug 30 12:09:43 CDT 2010


On Sun, Aug 29, 2010 at 9:04 AM, Bill Moseley <moseley at hank.org> wrote:
> It's possible that a process might die, so the "start_time" is used to clear
> up stale "processing" states.
>
> My first pass was SELECT ... FOR UPDATE, but (at least in PostgreSQL) what
> happens is that if two processes both do "SELECT * from queue WHERE state =
> 'ready' limit 10 FOR UPDATE", then when the first COMMIT happens the second
> transaction ends up returning zero rows.  Or, to be clear, say there are
> thousands of rows: if the first transaction selects 10 rows but updates only
> 5 to "processing", then after it commits the second transaction returns only
> the 5 rows that are still "ready".  It seems that the second SELECT ... FOR
> UPDATE grabs the rows and then blocks, and after the first commit it tests
> the WHERE clause again.
>
> Anyway, is there a way to do this other than a full table lock?
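
To make the interleaving concrete, here is that scenario restated as two
sessions (a sketch, in PostgreSQL flavor; I'm assuming the queue table's
primary key is queue_id):

    -- Session A
    BEGIN;
    SELECT * FROM queue WHERE state = 'ready' LIMIT 10 FOR UPDATE;
    -- ...A decides to process only 5 of its 10 locked rows...
    UPDATE queue SET state = 'processing', start_time = now()
     WHERE queue_id IN (1, 2, 3, 4, 5);  -- 5 of the 10 locked ids
    COMMIT;

    -- Session B, started before A committed
    BEGIN;
    SELECT * FROM queue WHERE state = 'ready' LIMIT 10 FOR UPDATE;
    -- blocks on A's row locks; when A commits, the WHERE clause is
    -- re-checked and only the 5 rows still 'ready' are returned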

If you really only want to use the database for this (as opposed to
creating a singleton, as Hassan suggested), I would create a second
table called QUEUE_CLAIM where your processing machines insert rows to
claim individual rows in the queue table. Create a constraint that
enforces the uniqueness you want on QUEUE_ID (or whatever your queue
table's primary key is). The database will take care of the rest.
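
Something like this, for instance (a sketch in PostgreSQL flavor; the
table and column names are illustrative, and I'm again assuming a
queue_id primary key):

    -- One row per claimed queue row; the UNIQUE constraint is what
    -- arbitrates between competing workers.
    CREATE TABLE queue_claim (
        queue_id    integer     NOT NULL REFERENCES queue (queue_id),
        claimed_by  text        NOT NULL,  -- worker identifier
        claimed_at  timestamptz NOT NULL DEFAULT now(),
        UNIQUE (queue_id)
    );

    -- A worker INSERTs a claim for each candidate row it selected. If
    -- another worker got there first, the INSERT fails with a unique
    -- violation, and the worker simply skips to the next candidate.
    INSERT INTO queue_claim (queue_id, claimed_by)
    VALUES (123, 'worker-1');

Stale claims can then be swept the same way you planned to sweep stale
"processing" states, just keyed off claimed_at instead of start_time.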

This suggestion only works because you are dealing with relatively
small chunk sizes (10 rows); claiming rows with one INSERT apiece would
get expensive with much larger batches.

I suspect your database would actually manage the concurrency scenario
you put forth, but that gets deeper into implementation details than I
feel comfortable going. The above suggestion aims to avoid those
details, and as a nice bonus it will work across database platforms.

-- 
Matt Warden
Cincinnati, OH, USA
http://mattwarden.com


This email proudly and graciously contributes to entropy.

