This article describes how to transpose rows into columns in BigQuery / SQL when the table contains a large amount of data.

Problem description

I'm having trouble transposing a large BigQuery table (1.5 billion rows) from rows to columns. I can do it for a small amount of data by hardcoding the columns, but not at this scale. A snapshot of the table looks like this:

+----------------------------+
| CustomerID  Feature  Value |
+----------------------------+
| 1           A123     3     |
| 1           F213     7     |
| 1           F231     8     |
| 1           B789     9.1   |
| 2           A123     4     |
| 2           U123     4     |
| 2           B789     12    |
| ..          ..       ..    |
| ..          ..       ..    |
| 400000      A123     8     |
| 400000      U123     7     |
| 400000      R231     6     |
+----------------------------+

So there are roughly 400,000 distinct CustomerIDs and 3,000 distinct features. Not every CustomerID has the same features: some CustomerIDs may have 2,000 features while others have all 3,000. The result I want is a table with one row per distinct CustomerID and 3,000 columns, one for each feature, like this:

CustomerID Feature1 Feature2 ... Feature3000

So some of the cells may have missing values.

Does anyone have an idea how to do this in BigQuery or SQL?

Thanks in advance.
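To make the target shape concrete, here is a small in-memory Python sketch of the same long-to-wide pivot applied to the sample rows above. It only illustrates the semantics and is not a way to process 1.5 billion rows. The function name `pivot` is hypothetical. The sketch assumes the cell rule used by the solution below, MAX(IF(Feature = f, Value, NULL)) grouped by CustomerID, so a missing feature becomes None.

```python
from collections import defaultdict

# Sample rows from the snapshot above: (CustomerID, Feature, Value).
ROWS = [
    (1, "A123", 3), (1, "F213", 7), (1, "F231", 8), (1, "B789", 9.1),
    (2, "A123", 4), (2, "U123", 4), (2, "B789", 12),
    (400000, "A123", 8), (400000, "U123", 7), (400000, "R231", 6),
]


def pivot(rows):
    """Long-to-wide pivot: one output row per CustomerID, one column per Feature.

    Mirrors MAX(IF(Feature = f, Value, NULL)) ... GROUP BY CustomerID:
    NULL (None) values are ignored, duplicate (CustomerID, Feature) pairs
    keep the largest Value, and a missing pair becomes None.
    """
    features = sorted({feature for _, feature, _ in rows})
    cells = defaultdict(dict)  # CustomerID -> {Feature: max Value}
    for customer, feature, value in rows:
        row = cells[customer]  # create the group even if every Value is NULL
        if value is None:
            continue  # MAX ignores NULLs
        current = row.get(feature)
        row[feature] = value if current is None else max(current, value)
    header = ["CustomerID"] + features
    table = [[customer] + [cells[customer].get(f) for f in features]
             for customer in sorted(cells)]
    return header, table


header, table = pivot(ROWS)
print(header)  # ['CustomerID', 'A123', 'B789', 'F213', 'F231', 'R231', 'U123']
for row in table:
    print(row)
# [1, 3, 9.1, 7, 8, None, None]
# [2, 4, 12, None, None, None, 4]
# [400000, 8, None, None, None, 6, 7]
```

In the printed output, each None corresponds to a missing cell in the wide table.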

Solution
STEP #1

In the query below, replace yourTable with the real name of your table, then run it:

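#legacySQL
-- Step 1 (BigQuery Legacy SQL): build the pivot query as a single string,
-- with one MAX(IF(...)) column per distinct Feature value.
SELECT 'SELECT CustomerID, ' + 
   GROUP_CONCAT_UNQUOTED(
      'MAX(IF(Feature = "' + STRING(Feature) + '", Value, NULL))'
   ) 
   + ' FROM yourTable GROUP BY CustomerID'
FROM (SELECT Feature FROM yourTable GROUP BY Feature) 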

The result is a single string containing the full pivot query, which you will use in the next step.
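As a rough illustration of what Step 1 produces, the following Python sketch assembles the same query text from a list of distinct feature names. The helper name `build_pivot_query` and its `table` parameter are hypothetical. The sketch sorts the features so its output is deterministic, which GROUP_CONCAT_UNQUOTED does not promise. Like the original query, it does not escape feature names that contain a double quote.

```python
def build_pivot_query(features, table="yourTable"):
    """Return the pivot query text that Step 1 would generate for `features`.

    One MAX(IF(Feature = "<f>", Value, NULL)) expression is emitted per
    distinct feature, joined with ",", the default separator of
    GROUP_CONCAT_UNQUOTED. Feature names are not escaped (same as Step 1).
    """
    distinct = sorted(set(str(f) for f in features))  # deterministic column order
    columns = ",".join(
        'MAX(IF(Feature = "' + f + '", Value, NULL))' for f in distinct
    )
    return "SELECT CustomerID, " + columns + " FROM " + table + " GROUP BY CustomerID"


print(build_pivot_query(["B789", "A123", "A123"]))
# SELECT CustomerID, MAX(IF(Feature = "A123", Value, NULL)),MAX(IF(Feature = "B789", Value, NULL)) FROM yourTable GROUP BY CustomerID
```

With about 3,000 distinct features, the generated text contains about 3,000 such expressions. This is why the query is generated rather than hardcoded.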

STEP #2

Take the string you got from Step 1 and execute it as a query.
The output is the pivot table you asked for in the question.
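You can chain the two steps from a script instead of copying the string by hand. A minimal Python sketch follows. The `run_query` callable is a hypothetical stand-in for whatever executes SQL and returns rows. With the google-cloud-bigquery client library, it could be something like `lambda sql: list(client.query(sql, job_config=bigquery.QueryJobConfig(use_legacy_sql=True)).result())`, since Step 1 relies on Legacy SQL functions. Treat this as a sketch under those assumptions.

```python
from typing import Callable, Sequence

# Hypothetical runner type: takes SQL text, returns result rows (indexable by position).
Runner = Callable[[str], Sequence[Sequence[object]]]

# Step 1 query from the answer; replace yourTable with your real table name.
STEP1_SQL = """SELECT 'SELECT CustomerID, ' +
   GROUP_CONCAT_UNQUOTED(
      'MAX(IF(Feature = "' + STRING(Feature) + '", Value, NULL))'
   )
   + ' FROM yourTable GROUP BY CustomerID'
FROM (SELECT Feature FROM yourTable GROUP BY Feature)"""


def run_pivot(run_query: Runner, step1_sql: str = STEP1_SQL):
    """Run Step 1, then execute the query text it returns (Step 2)."""
    step1_rows = run_query(step1_sql)
    # Step 1 yields a single row whose only column is the pivot query text.
    pivot_sql = step1_rows[0][0]
    return run_query(pivot_sql)
```

Injecting the runner keeps the two-step logic independent of any particular client library, and it makes the logic easy to exercise with a fake runner.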


10-19 23:01