Problem description
Let's say I have a dataset like this.
ID Item
1 A
2 B
3 A
1 C
2 B
2 B
1 A
3 B
The actual dataset has 50K rows and 8.5K distinct Item values.
Now I want to transpose the rows into columns for each ID, so that each column holds the count of one Item for each user, like below:
ID A B C
1 2 0 1
2 0 3 0
3 1 1 0
If I had a small, fixed number of Item values, I could hardcode them in a CASE-based aggregate statement, something like:
select ID, count(case when Item ='A' then 1 else Null end) A,
count(case when Item='B' then 1 else Null end) B... and so on.
But in this case I have 8.5K unique Item values, so writing them out by hand in SQL as above would be far too much work.
How can I achieve this in i) SQL and ii) Python? Remember that I have 50K rows and 8.5K unique items, so looping in Python might get a bit slow.
My first aim is to do this in SQL, since these 50K rows are only a sample. As the data grows, I expect SQL to keep coping, whereas I feel Python would become slow. I can try some code in Python too, but again, with more than 50K observations I expect it to slow down as well.
Please advise.
I am using HANA, not PostgreSQL, so Postgres-specific functions might not work. Please suggest a generic SQL approach.
Recommended answer
If you did want to do this in Python, you should look at the pandas library.
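import pandas as pd
# Sample data from the question
df = pd.DataFrame({
    'ID': [1, 2, 3, 1, 2, 2, 1, 3],
    'Item': ['A', 'B', 'A', 'C', 'B', 'B', 'A', 'B']
})
# Count how often each Item occurs for each ID (missing combinations become 0)
pd.crosstab(df['ID'], df['Item'])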
Output:
Item A B C
ID
1 2 0 1
2 0 3 0
3 1 1 0
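For the SQL side of the question, one possible approach is to generate the CASE-based aggregate statement from the distinct Item values rather than typing it by hand. The sketch below is only an illustration under some assumptions: the table name purchases and the helper build_pivot_sql are hypothetical, and SQLite (from the Python standard library) stands in for HANA so that the example runs anywhere. The generated statement uses only COUNT, CASE and GROUP BY, so it should carry over to other databases, but most engines cap the number of columns in a result, so 8.5K items may have to be split into batches.

```python
import sqlite3

def build_pivot_sql(items, table="purchases"):
    """Build one conditional-aggregation SELECT with a column per item.

    `table` is a hypothetical table name used only for this sketch.
    """
    cols = []
    for item in items:
        literal = item.replace("'", "''")   # escape single quotes in the value
        alias = item.replace('"', '""')     # escape double quotes in the alias
        cols.append(
            f"COUNT(CASE WHEN Item = '{literal}' THEN 1 END) AS \"{alias}\""
        )
    return f"SELECT ID, {', '.join(cols)} FROM {table} GROUP BY ID ORDER BY ID"

# Sample data from the question
rows = [(1, "A"), (2, "B"), (3, "A"), (1, "C"),
        (2, "B"), (2, "B"), (1, "A"), (3, "B")]

# In-memory SQLite database standing in for the real one (assumption)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (ID INTEGER, Item TEXT)")
conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)

# Read the distinct items once, then build and run the generated statement
items = [r[0] for r in conn.execute(
    "SELECT DISTINCT Item FROM purchases ORDER BY Item")]
sql = build_pivot_sql(items)
result = conn.execute(sql).fetchall()
print(result)  # [(1, 2, 0, 1), (2, 0, 3, 0), (3, 1, 1, 0)]
```

The only loop here runs over the distinct items to build a string, not over the 50K rows; the counting itself is done by the database in a single grouped query.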