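I'm just starting out with Python and have run into a problem that must be common, but I can't find a direct solution. I have some made-up survey data and want a meaningful summary of it. Specifically, for each question I want to know how many times each answer ("Yes" / "Maybe" / "No") was given.
Input: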
         Question1  Question2  Question3
Answer1  Maybe      Yes        Yes
Answer2  No         Maybe      Yes
Answer3  Maybe      Maybe      No
Answer4  No         Yes        Maybe
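Now I'd like a clear overview of how often each answer was given to each question. The preferred output would look like this:
(Preferred) output: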
           Yes  Maybe  No
Question1    0      2   2
Question2    2      2   0
Question3    2      1   1
My own idea was that the solution must lie in the groupby command, but so far I haven't managed to get any meaningful output from it:
df.groupby(['Question1']).sum()
           Question2  Question3
Question1
Maybe      YesMaybe   YesNo
No         MaybeYes   YesMaybe
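For reference, groupby(['Question1']).sum() groups the rows by their Question1 answer and then sums the remaining columns; summing strings concatenates them, which is where values such as YesMaybe come from. If you want to stay with groupby, one possible approach (a sketch, not the only way) is to reshape the data to long format with melt first, so that the question and the answer are both ordinary columns, and then count each (question, answer) pair. The data is rebuilt inline so the snippet runs on its own; the column names Question and Answer are my own choice.

```python
import pandas as pd

# Same survey data as in the question, built directly from a dict
df = pd.DataFrame(
    {
        "Question1": ["Maybe", "No", "Maybe", "No"],
        "Question2": ["Yes", "Maybe", "Maybe", "Yes"],
        "Question3": ["Yes", "Yes", "No", "Maybe"],
    },
    index=["Answer1", "Answer2", "Answer3", "Answer4"],
)

# Reshape to long format: one row per (question, answer) pair
long_df = df.melt(var_name="Question", value_name="Answer")

# Count rows per (question, answer), then pivot the answers into columns;
# fill_value=0 covers answers that never occur for a given question
counts = (
    long_df.groupby(["Question", "Answer"])
    .size()
    .unstack(fill_value=0)
)

print(counts)
```

pd.crosstab(long_df["Question"], long_df["Answer"]) should produce the same table in a single call.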
I generated the dummy data like this:
import numpy as np
import pandas as pd

# Generate data
data = np.array([['','Question1','Question2','Question3'],['Answer1',"Maybe","Yes","Yes"],['Answer2',"No","Maybe","Yes"],['Answer3',"Maybe","Maybe","No"],['Answer4',"No","Yes","Maybe"]])
# convert to pandas dataframe: row 0 holds the column labels, column 0 the row labels
df = pd.DataFrame(data=data[1:,1:],index=data[1:,0],columns=data[0,1:])
I know this must be an easy one, but any help would be greatly appreciated.
Best answer
Simply:
df.apply(pd.value_counts).fillna(0)
       Question1  Question2  Question3
Maybe        2.0        2.0        1.0
No           2.0        0.0        1.0
Yes          0.0        2.0        2.0
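The 2.0 values appear because apply calls value_counts on each column separately and pandas aligns the resulting Series on the union of their labels. An answer that never occurs in a column (for example Yes in Question1) becomes NaN, which forces a float dtype that fillna(0) keeps. A minimal sketch that casts the counts back to integers, assuming a reasonably recent pandas; it uses pd.Series.value_counts instead of the top-level pd.value_counts, which newer pandas releases deprecate.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Question1": ["Maybe", "No", "Maybe", "No"],
        "Question2": ["Yes", "Maybe", "Maybe", "Yes"],
        "Question3": ["Yes", "Yes", "No", "Maybe"],
    },
    index=["Answer1", "Answer2", "Answer3", "Answer4"],
)

# Count answers per column; labels missing from a column become NaN
# after alignment, so fill them with 0 and cast from float back to int
counts = df.apply(pd.Series.value_counts).fillna(0).astype(int)

print(counts)
```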
If you want the questions as rows, transpose it:
df.apply(pd.value_counts).fillna(0).T
           Maybe   No  Yes
Question1    2.0  2.0  0.0
Question2    2.0  0.0  2.0
Question3    1.0  1.0  2.0
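To reproduce the preferred output exactly (questions as rows, columns in the order Yes, Maybe, No, integer counts), one option is to reindex each column's counts against a fixed label list before pandas combines them, so no NaN ever appears. The label list and its order are taken from the preferred output in the question and are an assumption of this sketch; any answer not in the list would be silently dropped.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Question1": ["Maybe", "No", "Maybe", "No"],
        "Question2": ["Yes", "Maybe", "Maybe", "Yes"],
        "Question3": ["Yes", "Yes", "No", "Maybe"],
    },
    index=["Answer1", "Answer2", "Answer3", "Answer4"],
)

# Answer labels in the order of the preferred output; answers outside
# this list would be dropped by reindex
order = ["Yes", "Maybe", "No"]

# Count each column, force every label to appear (0 if absent), then
# transpose so questions become rows and answers become columns
counts = df.apply(lambda s: s.value_counts().reindex(order, fill_value=0)).T

print(counts)
```

Because every column already has the same labels, the counts stay integers and the column order follows the order list rather than being sorted alphabetically.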