问题描述
更新:下面的潜在解决方案
Update: Potential solution below
我有大量的配置文件,其中包含键/值我正尝试推送到数据库中的数据对。在配置文件中重复了很多键和值,因此我使用3个表存储数据。一个用于所有唯一键值,一个用于所有唯一对值,一个列出每个文件的所有键/值对。
I have a large corpus of configuration files consisting of key/value pairs that I'm trying to push into a database. A lot of the keys and values are repeated across configuration files so I'm storing the data using 3 tables. One for all unique key values, one for all unique pair values, and one listing all the key/value pairs for each file.
问题:
我正在使用多个并发进程(并因此使用连接)将原始数据添加到数据库中。不幸的是,在尝试将值添加到键表和值表时,我发现了很多死锁。我尝试了几种不同的插入数据的方法(如下所示),但始终会遇到检测到死锁错误
Problem:I'm using multiple concurrent processes (and therefore connections) to add the raw data into the database. Unfortunately I get a lot of detected deadlocks when trying to add values to the key and value tables. I have a tried a few different methods of inserting the data (shown below), but always end up with a "deadlock detected" error
我想知道是否有人可以确切地说明造成这些僵局的原因,可能会指出一些解决问题的方法。查看我正在使用的SQL语句(在下面列出),我真的看不出为什么存在任何相互依赖关系。
感谢阅读!
I was wondering if someone could shed some light on exactly what could be causing these deadlocks, and possibly point me towards some way of fixing the issue. Looking at the SQL statements I'm using (listed below), I don't really see why there is any co-dependency at all.
Thanks for reading!
示例配置文件:
example_key this_is_the_value
other_example other_value
third example yet_another_value
表定义:
CREATE TABLE keys (
id SERIAL PRIMARY KEY,
hash UUID UNIQUE NOT NULL,
key TEXT);
CREATE TABLE values (
id SERIAL PRIMARY KEY,
hash UUID UNIQUE NOT NULL,
key TEXT);
CREATE TABLE keyvalue_pairs (
id SERIAL PRIMARY KEY,
file_id INTEGER REFERENCES filenames,
key_id INTEGER REFERENCES keys,
value_id INTEGER REFERENCES values);
SQL语句:
最初,我试图使用此语句来避免任何异常:
Initially I was trying to use this statement to avoid any exceptions:
WITH s AS (
SELECT id, hash, key FROM keys
WHERE hash = 'hash_value';
), i AS (
INSERT INTO keys (hash, key)
SELECT 'hash_value', 'key_value'
WHERE NOT EXISTS (SELECT 1 FROM s)
returning id, hash, key
)
SELECT id, hash, key FROM i
UNION ALL
SELECT id, hash, key FROM s;
但是即使如此简单的操作也会导致死锁:
But even something as simple as this causes the deadlocks:
INSERT INTO keys (hash, key)
VALUES ('hash_value', 'key_value')
RETURNING id;
- 在两种情况下,如果由于插入的哈希而引发异常,
的值不是唯一的,我使用保存点来回滚更改,而
则使用另一条语句来选择我要使用的ID。 - 我正在使用哈希唯一字段,因为某些键和值
太长而无法索引
完整示例带有保存点的python代码(使用psycopg2):
key_value = 'this_key'
hash_val = generate_uuid(value)
try:
cursor.execute(
'''
SAVEPOINT duplicate_hash_savepoint;
INSERT INTO keys (hash, key)
VALUES (%s, %s)
RETURNING id;
'''
(hash_val, key_value)
)
result = cursor.fetchone()[0]
cursor.execute('''RELEASE SAVEPOINT duplicate_hash_savepoint''')
return result
except psycopg2.IntegrityError as e:
cursor.execute(
'''
ROLLBACK TO SAVEPOINT duplicate_hash_savepoint;
'''
)
#TODO: Should ensure that values match and this isn't just
#a hash collision
cursor.execute(
'''
SELECT id FROM keys WHERE hash=%s LIMIT 1;
'''
(hash_val,)
)
return cursor.fetchone()[0]
更新:
因此,我相信我对
具体来说:
虽然我仍然不确定确切的相互依赖关系,但似乎在不提交的情况下处理大量键/值对可能会导致这样的事情。果然,如果我在添加每个单独的配置文件后提交,则不会发生死锁。
While I'm still not exactly sure where the co-dependency is, it seems that processing a large number of key/value pairs without commiting would likely result in something like this. Sure enough, if I commit after each individual configuration file is added, the deadlocks don't occur.
推荐答案
看起来像您在这种情况下:
It looks like you're in this situation:
- 要插入的表具有主键(或任何类型的唯一索引)。 / li>
- 对该表的几个INSERT操作是在一个事务中执行的(而不是在每个事务之后立即提交)
- 要插入的行按随机顺序排列(关于主键)
- 将行插入并发事务中。
此这种情况会为僵局带来以下机会:
This situation creates the following opportunity for deadlock:
假定有两个会话,每个会话都开始了一个事务。
Assuming there are two sessions, that each started a transaction.
- 会话#1:插入具有PK'A'的行
- 会话#2:插入具有PK'B'的行
- 会话1:尝试插入带有PK'B'
的行=>将会话1置于等待会话2提交或rollbac的状态ks - 会话#2:尝试插入带有PK'A'的行
=>将会话#2置于等待会话#1的位置。
- Session #1: insert row with PK 'A'
- Session #2: insert row with PK 'B'
- Session #1: try to insert row with PK 'B'=> Session #1 is put to wait until Session #2 commits or rollbacks
- Session #2: try to insert row with PK 'A'=> Session #2 is put to wait for Session #1.
此后不久,死锁检测器意识到两个会话现在都在互相等待,并终止了其中一个致命的死锁错误。
Shortly thereafter, the deadlock detector gets aware that both sessions are now waiting for each other, and terminates one of them with a fatal deadlock detected error.
在这种情况下,最简单的解决方案是在插入新条目之后尝试将任何新行插入表中之前提交COMMIT。
If you're in this scenario, the simplest solution is to COMMIT after a new entry is inserted, before attempting to insert any new row into the table.
这篇关于Postgres 9.3:使用简单INSERT的Sharelock问题的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!