问题描述
我有一个如下表:
| user_id | product_purchased |
-------------------------------
| 111 | A |
| 111 | B |
| 222 | B |
| 222 | B |
| 333 | C |
| 444 | A |
我想透视表以将用户ID作为行,将用户购买的每种产品的计数作为列.因此,对于上表,它看起来像:
I want to pivot the table to have user ids as rows and counts of each product purchased as by the user as columns.So for the above table, this would look like:
| user_id | product A | product B | product C |
-----------------------------------------------
| 111 | 1 | 1 | 0 |
| 222 | 0 | 2 | 0 |
| 333 | 0 | 0 | 1 |
| 444 | 1 | 0 | 0 |
我知道这可以使用countif语句手动完成:
I know this can be done manually using countif statements:
#standardsql
select user_id,
countif(product_purchased = 'A') as 'A',
countif(product_purchased = 'B') as 'B',
etc,
group by user_id
但是,实际上该表有太多可能的产品,因此无法手动写出所有选项.有没有办法以一种更加自动化和优雅的方式来实现这一点?
However, in reality the table has too many possible products to make it feasible to write all of the options out manually. Is there a way to do this pivoting in a more automated and elegant way?
推荐答案
以下是适用于BigQuery标准SQL的
Below is for BigQuery Standard SQL
您可以分两个步骤进行操作-首先通过在下面运行
You can do this in two steps - first prepare dynamically pivot query by running below
#standardSQL
SELECT CONCAT('SELECT user_id, ',
STRING_AGG(
CONCAT('COUNTIF(product_purchased = "', product_purchased, '") AS product_', product_purchased)
),
' FROM `project.dataset.your_table` GROUP BY user_id')
FROM (
SELECT product_purchased
FROM `project.dataset.your_table`
GROUP BY product_purchased
)
结果是,您将获得表示需要运行以获取所需结果的查询的字符串
as a result you will get string representing the query that you need to run to get desired result
例如,如果要应用于问题中的伪数据
As an example, if to apply to dummy data from your question
#standardSQL
WITH `project.dataset.your_table` AS (
SELECT 111 user_id, 'A' product_purchased UNION ALL
SELECT 111, 'B' UNION ALL
SELECT 222, 'B' UNION ALL
SELECT 222, 'B' UNION ALL
SELECT 333, 'C' UNION ALL
SELECT 444, 'A'
)
SELECT CONCAT('SELECT user_id, ',
STRING_AGG(
CONCAT('COUNTIF(product_purchased = "', product_purchased, '") AS product_', product_purchased)
),
' FROM `project.dataset.your_table` GROUP BY user_id')
FROM (
SELECT product_purchased
FROM `project.dataset.your_table`
GROUP BY product_purchased
)
您将获得以下查询(在此处设置格式以便于查看)
you will get below query (formatted for better view here)
SELECT
user_id,
COUNTIF(product_purchased = "A") AS product_A,
COUNTIF(product_purchased = "B") AS product_B,
COUNTIF(product_purchased = "C") AS product_C
FROM `project.dataset.your_table`
GROUP BY user_id
现在,您只需运行此命令即可获得所需结果,而无需手动编码
Now, you can just run this to get desired result without manual coding
同样,如果要针对您问题中的伪数据运行它
Again, if to run it against dummy data from your question
#standardSQL
WITH `project.dataset.your_table` AS (
SELECT 111 user_id, 'A' product_purchased UNION ALL
SELECT 111, 'B' UNION ALL
SELECT 222, 'B' UNION ALL
SELECT 222, 'B' UNION ALL
SELECT 333, 'C' UNION ALL
SELECT 444, 'A'
)
SELECT
user_id,
COUNTIF(product_purchased = "A") AS product_A,
COUNTIF(product_purchased = "B") AS product_B,
COUNTIF(product_purchased = "C") AS product_C
FROM `project.dataset.your_table`
GROUP BY user_id
-- ORDER BY user_id
您获得了预期的结果
Row user_id product_A product_B product_C
1 111 1 1 0
2 222 0 2 0
3 333 0 0 1
4 444 1 0 0
您可以使用选择的任何 client
轻松地自动执行以上操作
You can easily automate above using any client
of your choice
这篇关于如何在没有手动硬编码的情况下在bigquery标准SQL中透视数据?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!