Update statement using a WHERE clause that contains columns with null values

Problem description

I am updating a column on one table using data from another table. The WHERE clause is based on multiple columns, and some of the columns are null. As far as I can tell, these nulls are what throw off the standard UPDATE TABLE SET X=Y WHERE A=B statement.

See this SQL Fiddle for the two tables; I am trying to update table_one based on data from table_two. My query currently looks like this:

UPDATE table_one SET table_one.x = table_two.y
FROM table_two
WHERE
table_one.invoice_number = table_two.invoice_number AND
table_one.submitted_by = table_two.submitted_by AND
table_one.passport_number = table_two.passport_number AND
table_one.driving_license_number = table_two.driving_license_number AND
table_one.national_id_number = table_two.national_id_number AND
table_one.tax_pin_identification_number = table_two.tax_pin_identification_number AND
table_one.vat_number = table_two.vat_number AND
table_one.ggcg_number = table_two.ggcg_number AND
table_one.national_association_number = table_two.national_association_number

The query fails for some rows: table_one.x is not updated when any of the columns in either table is null. That is, a row only gets updated when all of the columns contain data.

This question is related to my earlier one here on SO, where I was getting distinct values from a large data set using DISTINCT ON. What I want now is to populate the large data set with a value from the table that has unique fields.

Update

I used the first update statement provided by @binotenary. For small tables, it runs in a flash. For example, I had one table with 20,000 records and the update completed in about 20 seconds. But another table with more than 9 million records has been running for 20 hours so far! See the EXPLAIN output below:

Update on table_one  (cost=0.00..210634237338.87 rows=13615011125 width=1996)
  ->  Nested Loop  (cost=0.00..210634237338.87 rows=13615011125 width=1996)
    Join Filter: ((((my_update_statement_here))))
    ->  Seq Scan on table_one  (cost=0.00..610872.62 rows=9661262 width=1986)
    ->  Seq Scan on table_two  (cost=0.00..6051.98 rows=299998 width=148)

The EXPLAIN ANALYZE option also took forever, so I canceled it.

Any ideas on how to make this type of update faster? Even if it means using a different update statement, or even a custom function that loops through the rows and does the update.

Recommended answer

Since null = null evaluates to null rather than true, and WHERE only keeps rows for which the condition is true, you need to check whether both fields are null in addition to the equality check:

UPDATE table_one SET x = table_two.y
FROM table_two
WHERE
    (table_one.invoice_number = table_two.invoice_number
        OR (table_one.invoice_number is null AND table_two.invoice_number is null))
    AND
    (table_one.submitted_by = table_two.submitted_by
        OR (table_one.submitted_by is null AND table_two.submitted_by is null))
    AND
    -- etc
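
As a possible shorthand for the pattern above, PostgreSQL offers the IS NOT DISTINCT FROM comparison, which treats two nulls as equal. The following is a minimal sketch, assuming PostgreSQL and the table and column names from the question; only two of the columns are shown.

```sql
-- Plain equality on two NULLs yields NULL, so WHERE rejects the row;
-- IS NOT DISTINCT FROM yields true for the same pair.
SELECT NULL = NULL                    AS plain_equality,  -- NULL
       NULL IS NOT DISTINCT FROM NULL AS null_safe;       -- true

-- Same update as above, written with the null-safe comparison.
UPDATE table_one
SET x = table_two.y
FROM table_two
WHERE table_one.invoice_number IS NOT DISTINCT FROM table_two.invoice_number
  AND table_one.submitted_by   IS NOT DISTINCT FROM table_two.submitted_by
  -- repeat for the remaining columns
;
```

This is equivalent in meaning to the "equal, or both null" checks; it is not necessarily any faster.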

You could also use the coalesce function, which is more readable:

UPDATE table_one SET x = table_two.y
FROM table_two
WHERE
    coalesce(table_one.invoice_number, '') = coalesce(table_two.invoice_number, '')
    AND coalesce(table_one.submitted_by, '') = coalesce(table_two.submitted_by, '')
    AND -- etc

But you need to be careful about the default value (the last argument to coalesce).
Its data type should match the column type (so that you don't end up comparing dates with numbers, for example), and the default should be a value that does not appear in the data.
E.g. coalesce(null, 1) = coalesce(1, 1) is a situation you'd want to avoid.
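
To make the pitfall concrete, here is a small sketch, assuming an integer column and assuming that -1 never occurs in the real data (both are assumptions for illustration only):

```sql
-- The default 1 collides with real data: a NULL "matches" a row holding 1.
SELECT coalesce(NULL, 1) = coalesce(1, 1)   AS matches;  -- true, a false match

-- A default outside the column's real range avoids the collision
-- (assumes -1 never appears in the data).
SELECT coalesce(NULL, -1) = coalesce(1, -1) AS matches;  -- false
```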

Seq Scan on table_two - this suggests that you don't have any usable indexes on table_two.
So for each row of table_one being updated, the database basically has to scan through the rows of table_two one by one until it finds a match.
The matching rows could be found much faster if the relevant columns were indexed.
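
A sketch of what such an index might look like, assuming PostgreSQL. The index names are hypothetical, and whether the planner actually uses an index depends on how the join condition is written: a plain column index helps plain equality comparisons, while the coalesce variant would need an index on the same expressions.

```sql
-- Plain multi-column index on some of the join columns (hypothetical name).
CREATE INDEX table_two_match_idx
    ON table_two (invoice_number, submitted_by, passport_number);

-- Expression index matching the coalesce(...) form of the join condition
-- (assumes text columns, as implied by the '' default in the answer).
CREATE INDEX table_two_match_coalesce_idx
    ON table_two ((coalesce(invoice_number, '')), (coalesce(submitted_by, '')));

-- Refresh planner statistics after creating the indexes.
ANALYZE table_two;
```

Re-running EXPLAIN (without ANALYZE) afterwards shows whether the plan still contains a Seq Scan on table_two.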

On the flip side, if table_one has any indexes, they slow down the update.
According to this performance guide:

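Table constraints and indexes heavily delay every write. If possible, you should drop all indexes, triggers and foreign keys while the update runs and recreate them at the end.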
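
A rough sketch of that advice for a single index, assuming PostgreSQL. The index name and indexed column are hypothetical; only drop indexes that do not back a constraint, and recreate them with their original definitions.

```sql
-- Hypothetical index on table_one; list the real ones with \d table_one in psql.
DROP INDEX IF EXISTS table_one_invoice_number_idx;

-- ... run the UPDATE here ...

-- Recreate the index once the update has finished.
CREATE INDEX table_one_invoice_number_idx ON table_one (invoice_number);
```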

Another suggestion from the same guide that might be helpful is:

If you can segment your data using, for example, sequential IDs, you can update rows in batches.

So, for example, if table_one has an id column, you could add something like

and table_one.id between x and y

to the where condition and run the query several times, changing the values of x and y so that all rows are covered.
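
Spelled out, two consecutive batches might look like the sketch below. It assumes table_one has an integer id column, and the range boundaries are purely illustrative; only one join column is shown.

```sql
-- Batch 1: ids 1 to 100000.
UPDATE table_one
SET x = table_two.y
FROM table_two
WHERE table_one.id BETWEEN 1 AND 100000
  AND coalesce(table_one.invoice_number, '') = coalesce(table_two.invoice_number, '')
  -- repeat for the remaining columns
;

-- Batch 2: ids 100001 to 200000, and so on until all rows are covered.
UPDATE table_one
SET x = table_two.y
FROM table_two
WHERE table_one.id BETWEEN 100001 AND 200000
  AND coalesce(table_one.invoice_number, '') = coalesce(table_two.invoice_number, '')
  -- repeat for the remaining columns
;
```

Running each batch as its own statement means a failure or cancellation only loses the current batch rather than the whole update.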

The EXPLAIN ANALYZE option also took forever

You might want to be careful when using the ANALYZE option with EXPLAIN on statements that have side effects. According to the documentation:

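Keep in mind that the statement is actually executed when the ANALYZE option is used. Although EXPLAIN will discard any output that a SELECT would return, other side effects of the statement will happen as usual.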
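
One way to get real timings without keeping the changes is to wrap the statement in a transaction and roll it back. The sketch below assumes PostgreSQL and an integer id column on table_one; the small id range is only there so that the measurement finishes quickly.

```sql
BEGIN;

EXPLAIN ANALYZE
UPDATE table_one
SET x = table_two.y
FROM table_two
WHERE table_one.id BETWEEN 1 AND 1000   -- assumed id column, small sample range
  AND coalesce(table_one.invoice_number, '') = coalesce(table_two.invoice_number, '');

-- Discard the changes made while measuring.
ROLLBACK;
```

The update still runs in full for the selected rows, so the rollback undoes the data changes but does not save any execution time.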
