How to mark certain rows in a table upon concurrent access


Problem description

Our application has a table called cargo_items. It can be seen as a kind of queue for later processing these items. Initially there was a SINGLE job which took 3000 entries and processed them one after another. Later on, somebody decided to start 3 other instances of the same job. What happened is quite obvious: many of the items were processed twice.

My job is to make these processes work correctly when many instances are running at the same time. The solution I am going for now is to mark 3000 entries in the database with a job_id and later on fetch all of these entities and process them in isolation from the other processes.
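
The fetch step of that plan could then be as simple as the following sketch, where 'SOME_UUID' again stands for whatever UUID this particular instance generated for itself:

SELECT *
FROM   cargo_item
WHERE  job_id = 'SOME_UUID'   -- only the rows this instance flagged earlier
ORDER  BY id;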

My current approach for flagging these rows is the following:

UPDATE cargo_item item
SET job_id = 'SOME_UUID', job_ts = now()
FROM  (
   SELECT id
   FROM   cargo_item
   WHERE  state='NEW' AND job_id is null
   LIMIT  3000
   FOR UPDATE
   ) sub
WHERE  item.id = sub.id;

Basically, this approach locks the 3000 rows for update. I am not sure if it is a good approach though.

On another thread I read about using advisory locks for this scenario.

What do you guys think about the current approach and about using advisory locks instead?

As suggested, I'd adapt the update statement like this:

UPDATE cargo_item item
SET job_id = 'SOME_UUID', job_ts = now()
FROM  (
   SELECT id
   FROM   cargo_item
   WHERE  state='NEW' AND job_id is null
   ORDER  BY id
   LIMIT  3000
   FOR UPDATE
   ) sub
WHERE  item.id = sub.id;

Thanks Erwin and Tometzky for the hint. Nevertheless, I wonder whether the way I am trying to solve the problem is a good one. Are there different approaches you'd think of?

Recommended answer

In the related answer you are referring to:

The objective is to lock one row at a time. This works fine with or without advisory locks, because there is no chance for a deadlock - as long as you don't try to lock more rows in the same transaction.

Your example is different in that you want to lock 3000 rows at a time. There is potential for deadlock, unless all concurrent write operations lock rows in the same consistent order. Per the documentation:

The best defense against deadlocks is generally to avoid them by being certain that all applications using a database acquire locks on multiple objects in a consistent order.

Implement that with an ORDER BY in your subquery.

UPDATE cargo_item item
SET job_id = 'SOME_UUID', job_ts = now()
FROM  (
   SELECT id
   FROM   cargo_item
   WHERE  state='NEW' AND job_id is null
   ORDER  BY id
   LIMIT  3000
   FOR UPDATE
   ) sub
WHERE  item.id = sub.id;

This is safe and reliable, as long as all transactions acquire locks in the same order and concurrent updates of the ordering columns are not to be expected. (Read the yellow "CAUTION" box at the end of this chapter in the manual.) So this should be safe in your case, since you are not going to update the id column.

Effectively only one client at a time can manipulate rows this way. Concurrent transactions would try to lock the same (locked) rows and wait for the first transaction to finish.

Advisory locks are useful if you have many or very long-running concurrent transactions (it doesn't seem you do). With only a few, it will be cheaper overall to just use the above query and have concurrent transactions wait for their turn.
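
For comparison, a minimal sketch of the advisory-lock variant, assuming the id column is an integer/bigint that can double as the advisory-lock key (pg_try_advisory_xact_lock() takes a bigint). Rows whose lock is already held by another session are skipped rather than waited for; note that PostgreSQL gives no guarantee the function is evaluated only for the rows that end up within the LIMIT:

UPDATE cargo_item item
SET    job_id = 'SOME_UUID', job_ts = now()
FROM  (
   SELECT id
   FROM   cargo_item
   WHERE  state = 'NEW' AND job_id IS NULL
   AND    pg_try_advisory_xact_lock(id)   -- skip rows another session has already claimed
   ORDER  BY id
   LIMIT  3000
   FOR    UPDATE
   ) sub
WHERE  item.id = sub.id;

The transaction-scoped advisory lock is released automatically at commit, so there is nothing to unlock explicitly.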

It seems concurrent access isn't a problem per se in your setup. Concurrency is an issue created by your current solution.

Instead, do it all in a single UPDATE. Assign batches of n numbers (3000 in the example) to each UUID and update all at once. Should be fastest.

UPDATE cargo_item c
SET    job_id = u.uuid_col
     , job_ts = now()
FROM  (
   SELECT row_number() OVER () AS rn, uuid_col
   FROM   uuid_tbl WHERE  <some_criteria>  -- or see below
   ) u
JOIN (
   SELECT (row_number() OVER () / 3000) + 1 AS rn, id
   FROM   cargo_item
   WHERE  state = 'NEW' AND job_id IS NULL
   FOR    UPDATE   -- just to be sure
   ) c2 USING (rn)
WHERE  c2.id = c.id;

Major points

  • Integer division truncates. You get 1 for the first 3000 rows, 2 for the next 3000 rows, etc.

  • I pick rows arbitrarily; you could apply ORDER BY in the window for row_number() to assign certain rows.

  • If you don't have a table of UUIDs to dispatch (uuid_tbl), use a VALUES expression to supply them instead; see the sketch after this list.

  • You get batches of 3000 rows. The last batch will be short of 3000 if you don't find a multiple of 3000 to assign.
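
A hedged sketch of that VALUES variant, using two made-up UUID strings in place of uuid_tbl and assuming job_id is a plain text column, as 'SOME_UUID' in the question suggests:

UPDATE cargo_item c
SET    job_id = u.uuid_col
     , job_ts = now()
FROM  (
   SELECT row_number() OVER () AS rn, uuid_col
   FROM  (VALUES ('uuid-for-job-1'), ('uuid-for-job-2')) v(uuid_col)  -- hypothetical UUIDs
   ) u
JOIN (
   SELECT (row_number() OVER () / 3000) + 1 AS rn, id
   FROM   cargo_item
   WHERE  state = 'NEW' AND job_id IS NULL
   FOR    UPDATE
   ) c2 USING (rn)
WHERE  c2.id = c.id;

With two UUIDs this tags roughly the first two batches of 3000 rows; rows beyond that keep job_id IS NULL because there is no matching rn to join on.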

