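I'm just starting out with Python and have run into a problem that must be common, but I can't find a straightforward solution for it. I have some fictional survey data and would like a meaningful summary of it. Specifically, for each question, I want to know how many times each particular answer ("Yes" / "Maybe" / "No") was given.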

Input:

         Question1   Question2   Question3
Answer1  Maybe       Yes         Yes
Answer2  No          Maybe       Yes
Answer3  Maybe       Maybe       No
Answer4  No          Yes         Maybe


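Now I would like a clear overview of how often each answer was given per question. The preferred output would look like this:

(Preferred) output: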

           Yes     Maybe    No
Question1  0       2        2
Question2  2       2        0
Question3  2       1        1


My own guess is that the solution must involve the "groupby" command. So far I haven't managed to get any meaningful output from it:

df.groupby(['Question1']).sum()
      Question2 Question3
Question1
Maybe      YesMaybe     YesNo
No         MaybeYes  YesMaybe


I generated the dummy data as follows:

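import numpy as np
import pandas as pd

# Generate data: the first row holds the column labels, the first column the row labels
data = np.array([
    ['', 'Question1', 'Question2', 'Question3'],
    ['Answer1', "Maybe", "Yes", "Yes"],
    ['Answer2', "No", "Maybe", "Yes"],
    ['Answer3', "Maybe", "Maybe", "No"],
    ['Answer4', "No", "Yes", "Maybe"],
])

# Convert to a pandas DataFrame, using the first row and column as labels
df = pd.DataFrame(data=data[1:, 1:], index=data[1:, 0], columns=data[0, 1:])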


I know this is probably an easy one, but any help would be greatly appreciated.

Best answer

Simply:

df.apply(pd.value_counts).fillna(0)


            Question1   Question2   Question3
Maybe       2.0         2.0         1.0
No          2.0         0.0         1.0
Yes         0.0         2.0         2.0
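
As a side note, newer pandas releases (2.1 and later, as far as I know) deprecate the top-level pd.value_counts function in favour of the Series.value_counts method. The following is a minimal sketch of the same idea using the method form, with the counts cast back to integers; the DataFrame is rebuilt from a dict here only so the snippet is self-contained.

```python
import pandas as pd

# Same dummy survey data as in the question, rebuilt from a dict for brevity
df = pd.DataFrame(
    {
        "Question1": ["Maybe", "No", "Maybe", "No"],
        "Question2": ["Yes", "Maybe", "Maybe", "Yes"],
        "Question3": ["Yes", "Yes", "No", "Maybe"],
    },
    index=["Answer1", "Answer2", "Answer3", "Answer4"],
)

# Count each answer per column; answers missing from a column become NaN,
# so fill those with 0 and cast the resulting floats back to integers
counts = df.apply(lambda col: col.value_counts()).fillna(0).astype(int)
print(counts)
```

The astype(int) step is optional; it only removes the ".0" that appears because NaN forces the columns to float before fillna is applied.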


If needed, you can transpose it:

df.apply(pd.value_counts).fillna(0).T

            Maybe   No    Yes
Question1   2.0     2.0   0.0
Question2   2.0     0.0   2.0
Question3   1.0     1.0   2.0
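
For completeness, here is one possible grouping-style alternative that gets closer to the preferred output in the question (integer counts, columns ordered Yes / Maybe / No). It is only a sketch: it reshapes the data to long form with melt and cross-tabulates with pd.crosstab. The names "question", "answer", long_df and table are illustrative choices, not anything from the original post.

```python
import pandas as pd

# Same dummy survey data as in the question, rebuilt from a dict for brevity
df = pd.DataFrame(
    {
        "Question1": ["Maybe", "No", "Maybe", "No"],
        "Question2": ["Yes", "Maybe", "Maybe", "Yes"],
        "Question3": ["Yes", "Yes", "No", "Maybe"],
    },
    index=["Answer1", "Answer2", "Answer3", "Answer4"],
)

# Reshape to long form: one row per (question, answer) pair
long_df = df.melt(var_name="question", value_name="answer")

# Cross-tabulate questions against answers; combinations that never occur count as 0
table = pd.crosstab(long_df["question"], long_df["answer"])

# Put the answer columns in the order used in the preferred output
table = table.reindex(columns=["Yes", "Maybe", "No"], fill_value=0)
print(table)
```

The reindex call also guarantees that an answer which was never given for any question still shows up as a column of zeros.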
