如何在没有手动硬编码的情况下在bigquery标准SQL中透视数据?

本文介绍了如何在没有手动硬编码的情况下在bigquery标准SQL中透视数据?的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有一个如下表:

| user_id | product_purchased |
-------------------------------
|    111  |        A           |
|    111  |        B           |
|    222  |        B           |
|    222  |        B           |
|    333  |        C           |
|    444  |        A           |

我想透视表以将用户ID作为行，将用户购买的每种产品的计数作为列.因此，对于上表，它看起来像:

I want to pivot the table to have user ids as rows and counts of each product purchased as by the user as columns.So for the above table, this would look like:

| user_id | product A | product B | product C |
-----------------------------------------------
|    111  |     1      |      1    |     0    |
|    222  |     0      |      2    |     0    |
|    333  |     0      |      0    |     1    |
|    444  |     1      |      0    |     0    |

我知道这可以使用countif语句手动完成:

I know this can be done manually using countif statements:

#standardsql
select user_id,
       countif(product_purchased = 'A') as 'A',
       countif(product_purchased = 'B') as 'B',
       etc,
group by user_id

但是，实际上该表有太多可能的产品，因此无法手动写出所有选项.有没有办法以一种更加自动化和优雅的方式来实现这一点?

However, in reality the table has too many possible products to make it feasible to write all of the options out manually. Is there a way to do this pivoting in a more automated and elegant way?

推荐答案

以下是适用于BigQuery标准SQL的

Below is for BigQuery Standard SQL

您可以分两个步骤进行操作-首先通过在下面运行

You can do this in two steps - first prepare dynamically pivot query by running below

#standardSQL
SELECT CONCAT('SELECT user_id, ',
  STRING_AGG(
    CONCAT('COUNTIF(product_purchased = "', product_purchased, '") AS product_', product_purchased)
  ),
  ' FROM `project.dataset.your_table` GROUP BY user_id')
FROM (
  SELECT product_purchased
  FROM `project.dataset.your_table`
  GROUP BY product_purchased
)

结果是，您将获得表示需要运行以获取所需结果的查询的字符串

as a result you will get string representing the query that you need to run to get desired result

例如，如果要应用于问题中的伪数据

As an example, if to apply to dummy data from your question

#standardSQL
WITH `project.dataset.your_table` AS (
  SELECT 111 user_id, 'A' product_purchased UNION ALL
  SELECT 111, 'B' UNION ALL
  SELECT 222, 'B' UNION ALL
  SELECT 222, 'B' UNION ALL
  SELECT 333, 'C' UNION ALL
  SELECT 444, 'A'
)
SELECT CONCAT('SELECT user_id, ',
  STRING_AGG(
    CONCAT('COUNTIF(product_purchased = "', product_purchased, '") AS product_', product_purchased)
  ),
  ' FROM `project.dataset.your_table` GROUP BY user_id')
FROM (
  SELECT product_purchased
  FROM `project.dataset.your_table`
  GROUP BY product_purchased
)

您将获得以下查询(在此处设置格式以便于查看)

you will get below query (formatted for better view here)

SELECT
  user_id,
  COUNTIF(product_purchased = "A") AS product_A,
  COUNTIF(product_purchased = "B") AS product_B,
  COUNTIF(product_purchased = "C") AS product_C
FROM `project.dataset.your_table`
GROUP BY user_id

现在，您只需运行此命令即可获得所需结果，而无需手动编码

Now, you can just run this to get desired result without manual coding

同样，如果要针对您问题中的伪数据运行它

Again, if to run it against dummy data from your question

#standardSQL
WITH `project.dataset.your_table` AS (
  SELECT 111 user_id, 'A' product_purchased UNION ALL
  SELECT 111, 'B' UNION ALL
  SELECT 222, 'B' UNION ALL
  SELECT 222, 'B' UNION ALL
  SELECT 333, 'C' UNION ALL
  SELECT 444, 'A'
)
SELECT
  user_id,
  COUNTIF(product_purchased = "A") AS product_A,
  COUNTIF(product_purchased = "B") AS product_B,
  COUNTIF(product_purchased = "C") AS product_C
FROM `project.dataset.your_table`
GROUP BY user_id
-- ORDER BY user_id

您获得了预期的结果

Row user_id product_A   product_B   product_C
1   111     1           1           0
2   222     0           2           0
3   333     0           0           1
4   444     1           0           0

您可以使用选择的任何 client 轻松地自动执行以上操作

You can easily automate above using any client of your choice

这篇关于如何在没有手动硬编码的情况下在bigquery标准SQL中透视数据?的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！