Removing duplicate rows from a BigQuery table


Problem description

I have a table with >1M rows of data and 20+ columns.

Within my table (tableX) I have identified duplicate records (~80k) in one particular column (troubleColumn).

If possible, I would like to retain the original table name and remove the duplicate records from my problematic column; otherwise, I could create a new table (tableXfinal) with the same schema but without the duplicates.

I am not proficient in SQL or any other programming language so please excuse my ignorance.

DELETE FROM Accidents.CleanedFilledCombined
WHERE Fixed_Accident_Index IN (
  SELECT Fixed_Accident_Index
  FROM Accidents.CleanedFilledCombined
  GROUP BY Fixed_Accident_Index
  HAVING COUNT(Fixed_Accident_Index) > 1
);

Solution

You can remove duplicates by running a query that rewrites your table (you can use the same table as the destination, or you can create a new table, verify that it has what you want, and then copy it over the old table).

A query that should work is here:

SELECT *
FROM (
  SELECT
      *,
      ROW_NUMBER()
          OVER (PARTITION BY Fixed_Accident_Index)
          row_number
  FROM Accidents.CleanedFilledCombined
)
WHERE row_number = 1
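
If you are using BigQuery standard SQL, the "create a new table, verify, then copy" route could look roughly like the sketch below. It is only an illustration under some assumptions: the destination name Accidents.CleanedFilledCombined_dedup and the helper column rn are made up for this example, and because the window has no ORDER BY, which copy of each duplicate survives is arbitrary. The outer SELECT * EXCEPT (rn) drops the helper column so the new table keeps the same schema as the source.

```sql
-- Sketch in BigQuery standard SQL.
-- Accidents.CleanedFilledCombined_dedup is a hypothetical destination table.
CREATE OR REPLACE TABLE Accidents.CleanedFilledCombined_dedup AS
SELECT * EXCEPT (rn)  -- drop the helper column so the schema matches the source
FROM (
  SELECT
    *,
    -- Number the rows within each Fixed_Accident_Index group.
    -- Without ORDER BY, the row that receives rn = 1 is arbitrary.
    ROW_NUMBER() OVER (PARTITION BY Fixed_Accident_Index) AS rn
  FROM Accidents.CleanedFilledCombined
)
WHERE rn = 1;

-- Verification: if no duplicates remain, total_rows equals distinct_keys.
-- COUNT(DISTINCT ...) ignores NULLs, so a single row with a NULL key
-- would make total_rows larger by one.
SELECT
  COUNT(*) AS total_rows,
  COUNT(DISTINCT Fixed_Accident_Index) AS distinct_keys
FROM Accidents.CleanedFilledCombined_dedup;
```

Once the counts look right, the new table can be copied over the original so that the original table name is retained. If it matters which duplicate is kept, adding an ORDER BY inside the OVER clause makes the choice deterministic.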
