Problem Description

Perhaps it's my naiveté, perhaps my paranoia, but I think I'm looking for a solution to a race-condition issue that seems like it should be so common there'd be floods of solutions and I'd've found one by now... but I haven't.

The simplistic scenario is I have a process that's supposed to grab any records where there are more than one of a certain type. I'd like to make the system/process(es) thread-/multiprocessing-/reentrant-/buzzword-of-the-day-safe; if the same process gets started and introduces a race condition trying to grab rows of interest, I'd like there to be clear-cut winners/losers: success for one, error for the other; actually, I'd prefer a seamless, silent, graceful "failure" for the second in that it'd merely NOT SEE those that would have been grabbed by the first instance.
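For illustration, the winner/loser semantics described above can be sketched in miniature outside the database. This is a pure-Python stand-in (all names hypothetical, not the actual schema): a `threading.Lock` plays the role of the database's row locking, so the second worker's grab silently finds nothing left rather than erroring.

```python
import threading
from collections import Counter

# Hypothetical in-memory stand-in for my_table: row_id -> [trans_nbr, processing_by]
rows = {1: ["T1", None], 2: ["T1", None], 3: ["T2", None],
        4: ["T3", None], 5: ["T3", None]}
table_lock = threading.Lock()  # plays the role of the database's locking

def grab_duplicates(worker_id):
    """Atomically claim all rows whose trans_nbr occurs more than once
    and that no other worker has claimed yet. Returns claimed row_ids."""
    with table_lock:  # the whole select-then-update is one critical section
        counts = Counter(t for t, _ in rows.values())
        grabbed = []
        for row_id, row in rows.items():
            trans_nbr, processing_by = row
            if counts[trans_nbr] > 1 and processing_by is None:
                row[1] = worker_id  # SET processing_by = our_id_info
                grabbed.append(row_id)
        return grabbed

first = grab_duplicates("worker-A")   # claims the T1 and T3 duplicates
second = grab_duplicates("worker-B")  # the graceful "failure": sees nothing
print(first, second)
```

The second call succeeds but returns an empty list, which is exactly the "seamless, silent, graceful failure" wanted for the losing process.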

Here's my dilemma.

My query looks like this:

   UPDATE my_table
      SET processing_by = our_id_info -- unique to this worker
    WHERE trans_nbr IN (
                          SELECT trans_nbr
                            FROM my_table
                        GROUP BY trans_nbr
                          HAVING COUNT(trans_nbr) > 1
                           LIMIT our_limit_to_have_single_process_grab
                       )
RETURNING row_id

My thought is: I would think that there's no lock, so no guarantee of "state" between the subquery and the outer update. So, how to ensure that any candidates THIS process gets, we grab, and they haven't been grabbed by another process in the mean time?

I've thought about adding a "FOR UPDATE ON my_table" at the end of the subquery, but that won't work; can't have this AND a "GROUP BY" (which is necessary for figuring out the COUNT of trans_nbr's). (As this would force any also-rans to be blocked pending our update, this would have been a preferred solution, as then it would both avoid a race-condition-caused error [two processes grabbing the same row{s}] and allow those other processes to be blissfully unawares and merely get rows that no longer include those the first process grabbed. Alas.)

I've thought about locking the table, but (in Postgres, at least) table locks are only released after a COMMIT; for testing purposes, I don't want to COMMIT, therefore during testing (yes, the pre-live testing on the live database AFTER testing on a test db) it wouldn't do to go this route. (Plus, even live, this would give an unacceptable performance hit given enough users/processes.)

I've thought about making the update dependent upon what the processing_by value was for our subquery, but, again, that won't work: if in the subquery, would break the GROUP BY/HAVING condition (as now there would be sub-groups of trans_nbr/processing_by being counted, which isn't what I'm after).

I'm expecting a shrill pointer in the Right Direction, mocking me for asking such an obvious question, but it hasn't been obvious to me (obviously ;o), and I assure you, I have been researching this for literally hours.

Thanks, much, for any hints, let alone solutions!

UPDATE: Many thanks to Chris Travers!

That ol' line about "Forrest for the Trees" comes to mind! :>

Here's a modified version of the query, taking this suggestion into account, and adding another "double check". This should be THE ONE.

   UPDATE my_table
      SET processing_by = our_id_info -- unique to this worker
    WHERE trans_nbr IN (
                SELECT trans_nbr
                  FROM my_table
                 WHERE trans_nbr IN (
                           SELECT trans_nbr
                             FROM my_table
                         GROUP BY trans_nbr
                           HAVING COUNT(*) > 1 -- Thanks for the suggestion, Flimzy
                            LIMIT our_limit_to_have_single_process_grab
                                    )
                   AND processing_by IS NULL
                       /* Or some other logic that says "not currently being
                          processed".  This way, we ALSO verify we're not
                          grabbing one that might have been UPDATEd/grabbed
                          during our sub-SELECT, while it was being
                          blocked/waiting.

                          This COULD go in our UPDATE/top-level, but unnecessary
                          rows could be locked by this lower-level in that case.
                       */
            FOR UPDATE /* Will block/wait for rows this finds to be unlocked by
                          any prior transaction that had a lock on them.

                          NOTE: Which _could_ allow the prior trans to change
                                our desired rows in the mean time, thus the
                                secondary WHERE clause.
                       */
                       )
RETURNING row_id

I would love for Postgres to have a SKIP LOCKED-like feature. Especially for queues of essentially atomic rows that need to be processed without blocking other processing. But alas. Maybe someday...? Or "soon"? :-)
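As it happens, `FOR UPDATE SKIP LOCKED` did eventually land, in PostgreSQL 9.5. On that version or later, a queue-grab can be written along these lines (a sketch against the `my_table` schema above, leaving aside the GROUP BY duplicate-detection, which still cannot sit directly under a `FOR UPDATE`):

```sql
-- Sketch, assuming PostgreSQL 9.5+: claim pending rows, silently skipping
-- any that another worker currently has row-locked, with no blocking.
UPDATE my_table
   SET processing_by = our_id_info -- unique to this worker
 WHERE row_id IN (
           SELECT row_id
             FROM my_table
            WHERE processing_by IS NULL
            LIMIT our_limit_to_have_single_process_grab
              FOR UPDATE SKIP LOCKED
                 )
RETURNING row_id;
```

Competing workers simply see disjoint sets of rows, which is the "blissfully unaware" behavior wished for above.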

For now, one may add NOWAIT to NOT get blocked by any other transaction(s), however keep in mind, it merely dumps back with an error - you'll have to keep trying your query until it succeeds (or give up). Without NOWAIT, the query blocks until other transactions release their locks, or the query times out.
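That keep-trying loop can be sketched like so. This is a pure-Python simulation (all names hypothetical): `LockNotAvailable` stands in for the error Postgres raises under NOWAIT, and a non-blocking `acquire` mimics the "error out instead of waiting" behavior.

```python
import threading
import time

row_lock = threading.Lock()         # stands in for another transaction's row lock

class LockNotAvailable(Exception):  # stands in for the NOWAIT lock error
    pass

def grab_with_nowait():
    """Try to take the lock; like NOWAIT, error out instead of blocking."""
    if not row_lock.acquire(blocking=False):
        raise LockNotAvailable()
    try:
        return "grabbed rows"       # the UPDATE ... FOR UPDATE NOWAIT work
    finally:
        row_lock.release()          # COMMIT releases the locks

def grab_with_retry(max_attempts=5, backoff_s=0.01):
    """Keep trying the query until it succeeds, or give up."""
    for attempt in range(max_attempts):
        try:
            return grab_with_nowait()
        except LockNotAvailable:
            time.sleep(backoff_s * (attempt + 1))  # brief backoff, then retry
    return None                     # gave up

result = grab_with_retry()
print(result)
```

Without the retry wrapper, a single `LockNotAvailable` would bubble up to the caller, which is the "merely dumps back with an error" behavior described above.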

UPDATE 2: SO, after re-re-re-reading this and thinking about it, again "Forrest for the Trees" moment. I can simply do like this:

   UPDATE my_table
      SET processing_by = our_id_info -- unique to this worker
    WHERE trans_nbr IN (
                        -- This query MAY pull ones we don't want to mess with (already "grabbed")
                          SELECT trans_nbr
                            FROM my_table
                           WHERE processing_by IS NULL -- only "ungrabbed" ones (at this point)
                        GROUP BY trans_nbr
                          HAVING COUNT(*) > 1
                           LIMIT our_limit_to_have_single_process_grab
                       )
      AND processing_by IS NULL -- But THIS will drop out any "bogus" ones that changed between subquery and here
RETURNING row_id

COMMIT the transaction to release OUR locks, and Bob's yer uncle.

SKIP LOCKED would still be super-cool, though.

A CAVEAT: If one were to have workers pulling a limited number of rows (e.g., LIMIT 1), and/or items had to be grabbed in a certain order (e.g., FIFO, via ORDER BY and/or a function like Min(id)), there can be cases of starved workers: a worker waits and waits, and when the row(s) it was waiting for unblock, it turns out none of them meet its final criteria. There are a number of ways to try to get around this, like having workers jump around via OFFSET, but most are either complex or slow. (Usually both. BONUS!)

MY functionality expects multiple rows returned, or none (which is A-OK: nothing to do for now; sleep for a bit and recheck), so this isn't a problem for me. It may be for you. If so, you'll want to consider a...

NON-BLOCKING VERSION: I found a great article that tackles this very problem, as it turns out, and it introduced me to Pg's Advisory Locks. (This one was quite informative, too.)

So, a non-blocking solution to my own problem should look like this:

   UPDATE my_table
      SET processing_by = our_id_info -- unique to this worker
    WHERE trans_nbr IN (
            -- This query MAY pull ones we don't want to mess with (already "grabbed")
              SELECT trans_nbr
                FROM my_table AS inner_my_table_1
               WHERE processing_by IS NULL -- only "ungrabbed" ones (at this point)
            GROUP BY trans_nbr
              HAVING Count(*) > 1
                 AND Count(*) IN ( -- For MY query, since I'm grouping-by, I want "all or none" of trans_nbr rows
                       SELECT Count(*)
                         FROM my_table AS inner_my_table_2
                        WHERE inner_my_table_2.trans_nbr = inner_my_table_1.trans_nbr
                          AND pg_try_advisory_xact_lock(id) -- INT that will uniquely ID this row
                                 )
/* Note also that this will still lock all non-locked rows with this
   trans_nbr, even though we won't use them unless we can grab ALL of the
   rows with the same trans_nbr... the rest of our query should be made
   quick enough to accept this reality and not tie up the server unduly.

   See linked info for simpler queries not doing group-bys.
*/
               LIMIT our_limit_to_have_single_process_grab
                       )
      AND processing_by IS NULL -- But THIS will drop out any "bogus" ones that changed between subquery and here
RETURNING row_id

Caveats:

  • It's up to the applications to take and respect Advisory Locks, so this is no panacea, but nor is it a placebo. Again, SKIP LOCKED would be very handy because of this.
  • pg_try_advisory_lock, since v 8.2, does not auto-unlock, (thus) may (MUST) be explicitly unlocked
  • pg_try_advisory_xact_lock, since v 9.1, auto-unlocks at end of transaction, may NOT be explicitly unlocked
  • I HAVE NOT TESTED THIS YET! I'll edit/update when I have...
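The behavior of `pg_try_advisory_xact_lock` (grab the lock if it's free, return false immediately if not, never block) can be modeled with per-key non-blocking locks. This is an illustrative stand-in only, not how Pg implements it; the explicit unlock here plays the part of the automatic release at transaction end.

```python
import threading
from collections import defaultdict

# One lock per advisory key (Pg keys advisory locks by an integer, e.g. a row id).
_advisory = defaultdict(threading.Lock)

def try_advisory_lock(key):
    """Like pg_try_advisory_xact_lock: True if we got it, False immediately if not."""
    return _advisory[key].acquire(blocking=False)

def advisory_unlock(key):
    """The xact_ variant auto-releases at transaction end; here we release
    explicitly to simulate the 'transaction' finishing."""
    _advisory[key].release()

got_first = try_advisory_lock(42)    # worker A grabs row 42's lock
got_second = try_advisory_lock(42)   # worker B fails instantly, no waiting
advisory_unlock(42)                  # "COMMIT": the lock is released
got_third = try_advisory_lock(42)    # now free to grab again
print(got_first, got_second, got_third)
```

The instant `False` for the second caller is what makes the query above non-blocking: losing rows are simply filtered out rather than waited on.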

Recommended Answer

How about an extra subquery layer for the locking?

   UPDATE my_table
      SET processing_by = our_id_info -- unique to this instance
    WHERE trans_nbr IN (
                    SELECT trans_nbr
                      FROM my_table
                     WHERE trans_nbr IN (
                                 SELECT trans_nbr
                                   FROM my_table
                               GROUP BY trans_nbr
                                 HAVING COUNT(trans_nbr) > 1
                                  LIMIT our_limit_to_have_single_process_grab
                                 )
                        FOR UPDATE
                       )
RETURNING row_id
