Problem description
I have a table like the following:
| user_id | product_purchased |
-------------------------------
| 111 | A |
| 111 | B |
| 222 | B |
| 222 | B |
| 333 | C |
| 444 | A |
I want to pivot the table so that user IDs are the rows and the count of each product purchased by the user is a column. For the above table, this would look like:
| user_id | product A | product B | product C |
-----------------------------------------------
| 111 | 1 | 1 | 0 |
| 222 | 0 | 2 | 0 |
| 333 | 0 | 0 | 1 |
| 444 | 1 | 0 | 0 |
I know this can be done manually using countif statements:
#standardSQL
SELECT user_id,
COUNTIF(product_purchased = 'A') AS A,
COUNTIF(product_purchased = 'B') AS B
-- ... and so on, one COUNTIF per product
FROM `project.dataset.your_table`
GROUP BY user_id
However, in reality the table has too many possible products to make it feasible to write all of the options out manually. Is there a way to do this pivoting in a more automated and elegant way?
The following is for BigQuery Standard SQL.
You can do this in two steps. First, generate the pivot query dynamically by running the query below:
#standardSQL
SELECT CONCAT('SELECT user_id, ',
STRING_AGG(
CONCAT('COUNTIF(product_purchased = "', product_purchased, '") AS product_', product_purchased)
),
' FROM `project.dataset.your_table` GROUP BY user_id')
FROM (
SELECT product_purchased
FROM `project.dataset.your_table`
GROUP BY product_purchased
)
As a result, you get a string containing the query that you need to run to get the desired result.
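The same string assembly can also be done on the client side. The Python sketch below mirrors the CONCAT / STRING_AGG step. It is only an illustration: the function name `build_pivot_query` and its parameters are hypothetical. It also assumes that product values are simple tokens (letters, digits, underscores) that are valid inside a column alias and contain no quotes.

```python
def build_pivot_query(products, table="project.dataset.your_table"):
    """Return the text of the pivot query for the given product values.

    Assumption: every product value is safe to embed both in a string
    literal and in a column alias (no quotes, spaces, or punctuation).
    """
    # One COUNTIF column per distinct product, joined with "," the way
    # STRING_AGG joins values with its default delimiter.
    columns = ",".join(
        f'COUNTIF(product_purchased = "{p}") AS product_{p}' for p in products
    )
    return f"SELECT user_id, {columns} FROM `{table}` GROUP BY user_id"


query = build_pivot_query(["A", "B", "C"])
print(query)
```

The list of products would normally come from a first query that selects the distinct values of product_purchased. It is hard-coded here only to keep the sketch self-contained.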
As an example, applying it to the dummy data from your question:
#standardSQL
WITH `project.dataset.your_table` AS (
SELECT 111 user_id, 'A' product_purchased UNION ALL
SELECT 111, 'B' UNION ALL
SELECT 222, 'B' UNION ALL
SELECT 222, 'B' UNION ALL
SELECT 333, 'C' UNION ALL
SELECT 444, 'A'
)
SELECT CONCAT('SELECT user_id, ',
STRING_AGG(
CONCAT('COUNTIF(product_purchased = "', product_purchased, '") AS product_', product_purchased)
),
' FROM `project.dataset.your_table` GROUP BY user_id')
FROM (
SELECT product_purchased
FROM `project.dataset.your_table`
GROUP BY product_purchased
)
you get the query below (formatted here for readability):
SELECT
user_id,
COUNTIF(product_purchased = "A") AS product_A,
COUNTIF(product_purchased = "B") AS product_B,
COUNTIF(product_purchased = "C") AS product_C
FROM `project.dataset.your_table`
GROUP BY user_id
Now you can just run this query to get the desired result, without writing the columns by hand.
Again, running it against the dummy data from your question:
#standardSQL
WITH `project.dataset.your_table` AS (
SELECT 111 user_id, 'A' product_purchased UNION ALL
SELECT 111, 'B' UNION ALL
SELECT 222, 'B' UNION ALL
SELECT 222, 'B' UNION ALL
SELECT 333, 'C' UNION ALL
SELECT 444, 'A'
)
SELECT
user_id,
COUNTIF(product_purchased = "A") AS product_A,
COUNTIF(product_purchased = "B") AS product_B,
COUNTIF(product_purchased = "C") AS product_C
FROM `project.dataset.your_table`
GROUP BY user_id
-- ORDER BY user_id
you get the expected result:
Row user_id product_A product_B product_C
1 111 1 1 0
2 222 0 2 0
3 333 0 0 1
4 444 1 0 0
You can easily automate the two steps above using any client of your choice.
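As a rough illustration of that automation, the sketch below runs both steps from Python. To stay self-contained and runnable it uses an in-memory SQLite database as a stand-in for BigQuery, which is an assumption of this example and not part of the answer. SQLite has no COUNTIF, so `SUM(product_purchased = '...')` takes its place. The helper name `pivot_counts` and the table name `your_table` are hypothetical, and product values are again assumed to be simple tokens that are safe to embed in SQL.

```python
import sqlite3


def pivot_counts(conn, table="your_table"):
    """Run the two-step dynamic pivot and return the result rows."""
    # Step 1: fetch the distinct product values that will become columns.
    products = [
        row[0]
        for row in conn.execute(
            f"SELECT product_purchased FROM {table} "
            "GROUP BY product_purchased ORDER BY product_purchased"
        )
    ]
    # Step 2: generate one counting column per product and run the query.
    # SUM over a 0/1 comparison is the SQLite stand-in for COUNTIF.
    columns = ", ".join(
        f"SUM(product_purchased = '{p}') AS product_{p}" for p in products
    )
    sql = (
        f"SELECT user_id, {columns} FROM {table} "
        "GROUP BY user_id ORDER BY user_id"
    )
    return conn.execute(sql).fetchall()


# Dummy data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (user_id INTEGER, product_purchased TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [(111, "A"), (111, "B"), (222, "B"), (222, "B"), (333, "C"), (444, "A")],
)

rows = pivot_counts(conn)
for row in rows:
    print(row)
```

With a real BigQuery client the shape would be the same: run the first query, build the second query from its result, then submit it.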