Problem description
I'm trying to do a pivot of a table containing strings as results.
import pandas as pd
df1 = pd.DataFrame({'index' : range(8),
'variable1' : ["A","A","B","B","A","B","B","A"],
'variable2' : ["a","b","a","b","a","b","a","b"],
'variable3' : ["x","x","x","y","y","y","x","y"],
'result': ["on","off","off","on","on","off","off","on"]})
df1.pivot_table(values='result',rows='index',cols=['variable1','variable2','variable3'])
But I get: DataError: No numeric types to aggregate.
This works as intended when I change result values to numbers:
df2 = pd.DataFrame({'index' : range(8),
'variable1' : ["A","A","B","B","A","B","B","A"],
'variable2' : ["a","b","a","b","a","b","a","b"],
'variable3' : ["x","x","x","y","y","y","x","y"],
'result': [1,0,0,1,1,0,0,1]})
df2.pivot_table(values='result',rows='index',cols=['variable1','variable2','variable3'])
And I get what I need:
variable1    A                   B
variable2    a         b         a    b
variable3    x    y    x    y    x    y
index
0            1  NaN  NaN  NaN  NaN  NaN
1          NaN  NaN    0  NaN  NaN  NaN
2          NaN  NaN  NaN  NaN    0  NaN
3          NaN  NaN  NaN  NaN  NaN    1
4          NaN    1  NaN  NaN  NaN  NaN
5          NaN  NaN  NaN  NaN  NaN    0
6          NaN  NaN  NaN  NaN    0  NaN
7          NaN  NaN  NaN    1  NaN  NaN
I know I can map the strings to numerical values and then reverse the operation, but maybe there is a more elegant solution?
My original reply was based on pandas 0.14.1, and the pivot_table function has changed a lot since then (rows is now index, cols is now columns, and so on).
Additionally, the original lambda trick I posted no longer seems to work on pandas 0.18. You have to provide a reducing function (even if it is min, max or mean). But even that seemed improper, because we are not reducing the data set, just transforming it. So I looked harder at unstack.
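For reference, the reducing-function route mentioned above could look roughly like the sketch below. It uses the renamed keywords (index and columns) and assumes aggfunc="first" as the reducer, which is my choice for illustration and not something from the original reply. It only gives a faithful result because each combination of index and the three variables occurs exactly once in this data.

```python
import pandas as pd

df1 = pd.DataFrame({
    "index": range(8),
    "variable1": ["A", "A", "B", "B", "A", "B", "B", "A"],
    "variable2": ["a", "b", "a", "b", "a", "b", "a", "b"],
    "variable3": ["x", "x", "x", "y", "y", "y", "x", "y"],
    "result": ["on", "off", "off", "on", "on", "off", "off", "on"],
})

# Current keyword names: index/columns instead of rows/cols.
# aggfunc="first" (an assumption for this sketch) is a reducer that also
# accepts strings; each group holds a single row here, so the value is
# passed through unchanged.
table = df1.pivot_table(
    values="result",
    index="index",
    columns=["variable1", "variable2", "variable3"],
    aggfunc="first",
)
print(table)
```

If the data contained duplicate combinations, "first" would silently drop all but one value per cell, which is one more reason the unstack approach that follows may be the better fit.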
import pandas as pd
df1 = pd.DataFrame({'index' : range(8),
'variable1' : ["A","A","B","B","A","B","B","A"],
'variable2' : ["a","b","a","b","a","b","a","b"],
'variable3' : ["x","x","x","y","y","y","x","y"],
'result': ["on","off","off","on","on","off","off","on"]})
# these are the columns to end up in the multi-index columns.
unstack_cols = ['variable1', 'variable2', 'variable3']
First, set an index on the data using the index column plus the columns you want to unstack, then call unstack with the level argument.
df1.set_index(['index'] + unstack_cols).unstack(level=unstack_cols)
Resulting dataframe is below.
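To see the result for yourself, the steps above can be put together as the self-contained sketch below. The names wide and wide_flat are only illustrative. The second variant, which selects the result column before unstacking, is an addition of mine: it should give the same table without the extra top-level "result" entry in the column index.

```python
import pandas as pd

df1 = pd.DataFrame({
    "index": range(8),
    "variable1": ["A", "A", "B", "B", "A", "B", "B", "A"],
    "variable2": ["a", "b", "a", "b", "a", "b", "a", "b"],
    "variable3": ["x", "x", "x", "y", "y", "y", "x", "y"],
    "result": ["on", "off", "off", "on", "on", "off", "off", "on"],
})

# These are the columns to end up in the multi-index columns.
unstack_cols = ["variable1", "variable2", "variable3"]

# Move the key columns into the row index, then pivot them into the columns.
# No aggregation happens, so the string values are kept as they are.
wide = df1.set_index(["index"] + unstack_cols).unstack(level=unstack_cols)
print(wide)

# Illustrative variant: pick the single value column first, so the column
# index has only the three variable levels.
wide_flat = df1.set_index(["index"] + unstack_cols)["result"].unstack(level=unstack_cols)
print(wide_flat)
```

Note that unstack raises an error if the index built by set_index contains duplicate entries, so this relies on every row having a unique combination of index and the three variables.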