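I have a file that compares different pieces of information across different views of an underlying dataset. The goal is to list the pieces of information and compare the totals.
I have the following DataFrame: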
import pandas

df = pandas.DataFrame({
    "Measures": ['Country', 'State', 'County', 'City'],
    "Green": ['Included', 'Excluded', 'Included', 'Included'],
    "Orange": ['Excluded', 'Excluded', 'Excluded', 'Included'],
})
I have the following underlying dataset:
Location Green Orange
Country 1 6
State 3 10
County 2 15
City 5 20
I would like the final result to look like this:
Measures Green Orange
Country Included Excluded
State Excluded Excluded
County Included Excluded
City Included Included
Total 8 20
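Best answer
Before computing the sums, you can use df to mask the values of the underlying DataFrame. Cells flagged 'Included' keep their value; every other cell becomes NaN, which sum() skips by default.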
m = df.eq('Included')
# Assume df2 is your underlying DataFrame.
v = df2[m].sum()
# Assign the total back as a new row in df.
df.loc['Total', :] = v[df2.dtypes != object]
df
Measures Green Orange
0 Country Included Excluded
1 State Excluded Excluded
2 County Included Excluded
3 City Included Included
Total NaN 8 20
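The snippet above assumes df2 already exists. As a minimal self-contained sketch, df2 can be typed in from the table in the question; the names value_cols, mask and totals are only illustrative. It selects the numeric columns explicitly instead of filtering by dtype, and it assumes both tables list their rows in the same order, because the rows are matched through the default 0..3 index.

```python
import pandas as pd

# Flag table from the question.
df = pd.DataFrame({
    "Measures": ["Country", "State", "County", "City"],
    "Green": ["Included", "Excluded", "Included", "Included"],
    "Orange": ["Excluded", "Excluded", "Excluded", "Included"],
})

# Underlying values, copied by hand from the table in the question.
df2 = pd.DataFrame({
    "Location": ["Country", "State", "County", "City"],
    "Green": [1, 3, 2, 5],
    "Orange": [6, 10, 15, 20],
})

# Columns that hold numbers to be summed (an assumption of this sketch).
value_cols = ["Green", "Orange"]

# True where a cell is flagged as included.
mask = df[value_cols].eq("Included")

# Keep only the included values, then sum each column; NaN cells are skipped.
totals = df2[value_cols].where(mask).sum()
print(totals)
```

Here Green adds up 1 + 2 + 5 and Orange keeps only 20, which matches the Total row in the question.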
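If you want output that matches the desired result more closely, another approach is to set "Measures" and "Location" as the index of df and df2, respectively.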
df = df.set_index('Measures')
df2 = df2.set_index('Location')
m = df.eq('Included')
v = df2[m].sum()
df.loc['Total', :] = v
df
Green Orange
Measures
Country Included Excluded
State Excluded Excluded
County Included Excluded
City Included Included
Total 8 20
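One consequence of indexing both frames by name is that rows are matched by label rather than by position. The sketch below illustrates this under the assumption that the underlying data arrives in a different row order; the names flags, values, totals and result are hypothetical, and pd.concat is used here in place of the .loc assignment above simply to build the final table in one step.

```python
import pandas as pd

# Flag table from the question, indexed by measure name.
flags = pd.DataFrame({
    "Measures": ["Country", "State", "County", "City"],
    "Green": ["Included", "Excluded", "Included", "Included"],
    "Orange": ["Excluded", "Excluded", "Excluded", "Included"],
}).set_index("Measures")

# Same values as in the question, deliberately listed in a different order.
values = pd.DataFrame({
    "Location": ["City", "County", "State", "Country"],
    "Green": [5, 2, 3, 1],
    "Orange": [20, 15, 10, 6],
}).set_index("Location")

# where() aligns the boolean mask to `values` by index and column labels,
# so the row order of the two tables does not need to agree.
totals = values.where(flags.eq("Included")).sum()

# Append the totals as a final "Total" row below the flags.
total_row = totals.astype(int).to_frame("Total").T
result = pd.concat([flags.astype(object), total_row])
print(result)
```

The totals come out as 8 for Green and 20 for Orange even though the rows of values are shuffled.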
Regarding "python - Python pandas dataframe add total column with filter criteria", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53808834/