How to pivot data in BigQuery Standard SQL without manual hard-coding?


Problem description


I have a table like the following:

| user_id | product_purchased |
-------------------------------
|    111  |        A           |
|    111  |        B           |
|    222  |        B           |
|    222  |        B           |
|    333  |        C           |
|    444  |        A           |

I want to pivot the table so that user IDs are rows and the count of each product purchased by the user is a column. For the table above, the result would look like this:

| user_id | product A | product B | product C |
-----------------------------------------------
|    111  |     1      |      1    |     0    |
|    222  |     0      |      2    |     0    |
|    333  |     0      |      0    |     1    |
|    444  |     1      |      0    |     0    |

I know this can be done manually with COUNTIF expressions:

#standardSQL
SELECT user_id,
       COUNTIF(product_purchased = 'A') AS A,
       COUNTIF(product_purchased = 'B') AS B
       -- ... and so on, one COUNTIF per product
FROM `project.dataset.your_table`
GROUP BY user_id

However, in reality the table has too many possible products to make it feasible to write all of the options out manually. Is there a way to do this pivoting in a more automated and elegant way?

Solution

The following is for BigQuery Standard SQL.

You can do this in two steps. First, generate the pivot query dynamically by running the query below:

#standardSQL
SELECT CONCAT('SELECT user_id, ',
  STRING_AGG(
    CONCAT('COUNTIF(product_purchased = "', product_purchased, '") AS product_', product_purchased)
  ),
  ' FROM `project.dataset.your_table` GROUP BY user_id')
FROM (
  SELECT product_purchased
  FROM `project.dataset.your_table`
  GROUP BY product_purchased
)

As a result, you get a string containing the query that you need to run to get the desired result.

For example, applying the generator query to the dummy data from your question:
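The same string assembly can also be done on the client side. The following is a minimal Python sketch, not part of the original answer: `build_pivot_query` is a hypothetical helper that takes the distinct product values (however you fetch them) and mirrors the CONCAT/STRING_AGG logic, joining the columns with a comma as STRING_AGG does by default. It assumes product values consist only of letters, digits and underscores, and rejects anything else rather than guessing how to escape it.

```python
import re


def build_pivot_query(products, table="project.dataset.your_table"):
    """Build the pivot SQL text for the given product values (hypothetical helper)."""
    distinct = sorted(set(products))
    if not distinct:
        raise ValueError("at least one product value is required")

    columns = []
    for product in distinct:
        # Assumption: the value must be usable as a column-name suffix.
        if not re.fullmatch(r"[A-Za-z0-9_]+", product):
            raise ValueError(f"unsupported product value: {product!r}")
        columns.append(
            f'COUNTIF(product_purchased = "{product}") AS product_{product}'
        )

    return (
        "SELECT user_id, "
        + ",".join(columns)
        + f" FROM `{table}` GROUP BY user_id"
    )


# Duplicates are collapsed, like the GROUP BY in the generator query.
print(build_pivot_query(["A", "B", "B", "C", "A"]))
```

Validating the values up front is a design choice for this sketch: product names with spaces or punctuation would otherwise produce invalid column aliases.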

#standardSQL
WITH `project.dataset.your_table` AS (
  SELECT 111 user_id, 'A' product_purchased UNION ALL
  SELECT 111, 'B' UNION ALL
  SELECT 222, 'B' UNION ALL
  SELECT 222, 'B' UNION ALL
  SELECT 333, 'C' UNION ALL
  SELECT 444, 'A'
)
SELECT CONCAT('SELECT user_id, ',
  STRING_AGG(
    CONCAT('COUNTIF(product_purchased = "', product_purchased, '") AS product_', product_purchased)
  ),
  ' FROM `project.dataset.your_table` GROUP BY user_id')
FROM (
  SELECT product_purchased
  FROM `project.dataset.your_table`
  GROUP BY product_purchased
)

you get the query below (formatted here for readability):

SELECT
  user_id,
  COUNTIF(product_purchased = "A") AS product_A,
  COUNTIF(product_purchased = "B") AS product_B,
  COUNTIF(product_purchased = "C") AS product_C
FROM `project.dataset.your_table`
GROUP BY user_id

Now you can simply run this query to get the desired result without manual coding.

Again, run against the dummy data from your question:

#standardSQL
WITH `project.dataset.your_table` AS (
  SELECT 111 user_id, 'A' product_purchased UNION ALL
  SELECT 111, 'B' UNION ALL
  SELECT 222, 'B' UNION ALL
  SELECT 222, 'B' UNION ALL
  SELECT 333, 'C' UNION ALL
  SELECT 444, 'A'
)
SELECT
  user_id,
  COUNTIF(product_purchased = "A") AS product_A,
  COUNTIF(product_purchased = "B") AS product_B,
  COUNTIF(product_purchased = "C") AS product_C
FROM `project.dataset.your_table`
GROUP BY user_id
-- ORDER BY user_id

you get the expected result:

Row user_id product_A   product_B   product_C
1   111     1           1           0
2   222     0           2           0
3   333     0           0           1
4   444     1           0           0
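As a sanity check on this result, the same pivot can be reproduced in a few lines of plain Python over the dummy rows. This sketch is independent of BigQuery and only illustrates the counting logic; `pivot_counts` is a hypothetical helper name.

```python
from collections import Counter

# Dummy rows from the question: (user_id, product_purchased)
rows = [(111, "A"), (111, "B"), (222, "B"), (222, "B"), (333, "C"), (444, "A")]


def pivot_counts(rows):
    """Return one tuple per user: (user_id, count per product in sorted order)."""
    products = sorted({product for _, product in rows})
    users = sorted({user for user, _ in rows})
    counts = Counter(rows)  # (user_id, product) -> number of purchases
    return [(user, *[counts[(user, p)] for p in products]) for user in users]


for row in pivot_counts(rows):
    print(row)
```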

You can easily automate the two steps above using any client of your choice.
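One way to chain the two steps from a client is sketched below in Python. This is an illustration under assumptions, not the answer's own code: `run_query` is a hypothetical callable that you supply, which executes a SQL string and returns the result rows as a list of tuples (with the google-cloud-bigquery package it might wrap `client.query(sql).result()`), and `pivot_with_client` is a made-up helper name.

```python
TABLE = "project.dataset.your_table"  # placeholder table name used in the answer

# Step 1 query: asks BigQuery to assemble the text of the pivot query.
GENERATOR_SQL = f"""
SELECT CONCAT('SELECT user_id, ',
  STRING_AGG(
    CONCAT('COUNTIF(product_purchased = "', product_purchased, '") AS product_', product_purchased)
  ),
  ' FROM `{TABLE}` GROUP BY user_id')
FROM (
  SELECT product_purchased
  FROM `{TABLE}`
  GROUP BY product_purchased
)
"""


def pivot_with_client(run_query):
    """Run the generator query, then run the query text it returns.

    run_query is a hypothetical callable: SQL string in, list of row tuples out.
    """
    generated = run_query(GENERATOR_SQL)
    pivot_sql = generated[0][0]  # single row, single column: the query text
    return run_query(pivot_sql)
```

Passing the query runner in as a parameter keeps the sketch independent of any particular client library, and makes it easy to try out with a stand-in function before pointing it at a real project.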


08-03 21:34