问题描述
我们在数据库(PostgreSQL)上有以下两个重复且独立运行的任务:
We have the following two repeatedly and independently running tasks on our database (PostgreSQL):
会话1在事务中进行了一些更新并设置了更新后的时间戳数据集:
Session 1 does some updates in a transaction and set the timestamp of the updated datasets:
BEGIN;
...
UPDATE table SET ..., timestamp = current_timestamp WHERE ...;
... // (A)
COMMIT;
会话2选择自上次运行以来已更新的所有数据集:
Session 2 selects all datasets that were updated since its last run:
SELECT * FROM table WHERE timestamp BETWEEN last_run AND current_timestamp;
last_run = current_timestamp;
...
如果会话2在会话1处于(A)时开始,之所以看不到更改,是因为直到提交后才会设置时间戳,而是设置为更早的值。
此外,以后的会话2将不会选择更改,因为last_run已经大于时间戳。
因此,问题是会话1在错误的时间将时间戳分别设置为错误的值,因此更改可能被忘记。
If session 2 starts while session 1 is at (A) it will not see the changes because the timestamp will not be set until the commit but to an earlier value.Furthermore no subsequent session 2 will select the changes because last_run will already be greater than the timestamp.So the problem is that session 1 sets the timestamp to a wrong value respectively at a wrong time and thus changes may be "forgotten".
可能的解决方法将在会话1的另一个表中存储更新的数据集ID,然后在会话2的该表中选择并删除它们。
但是也许有人有更好的主意...
A possible workaround would be to store the updated datasets ids in another table in session 1 and select and delete them from this table in session 2.But perhaps someone has a better idea...
推荐答案
有时会出现-据我所知,唯一完全可靠的方法是执行您所描述的操作,将更新的ID存储在某个表中在第一个过程中将其标记为在第二个过程中已处理。基本上,这是在数据库中重塑消息队列。您已经很好地描述了天真的解决方案将如何错过更新。
This sort of question comes up from time to time-- as far as I can tell, the only completely reliable way is to do what you've described, store updated IDs in some table in the first process and mark them as processed in the second. Basically this is reinventing a message-queue in the database. You've described quite well how a naive solution will miss updates.
具有导入过程标记的更新行可以非常轻松地完成,甚至可以使用数据触发器来实现表。如果您只有一个使用者流程,那么它要做的就是 delete返回item_id 以获取更新列表。听起来好像要复杂得多,但恕我直言,事实并非如此。诸如能够免费监控未完成订单量的功能。
Having the import process mark updated rows can be done quite easily, or even implemented using triggers on your data table. If you only have one consumer process, then all it has to do is delete from updated_item returning item_id
to get a list of updates. It sounds like it's a lot more complicated but IMHO it isn't, really. Features like being able to monitor how big the backlog is emerge for free, too.
这篇关于在事务内部设置时间戳的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!