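I have a survey dataset in which every question is answered on a 7-point scale, and I want to get the value_counts of the shared answer values across all columns, with the DataFrame grouped by two columns. Here is a sample dataset and how far I have got.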
| col1 | col2 | col3 | Building | Levels_Name |
|---------------|---------------|---------------|---------------|------------------------|
| Not Satisfied | Not Satisfied | Not Satisfied | San Francisco | Individual Contributor |
| Satisfied | Satisfied | NA | Basingstoke | Individual Contributor |
| Not Satisfied | Satisfied | Not Satisfied | San Francisco | Middle Management |
| Not Satisfied | Satisfied | Not Satisfied | Miami | Senior Leadership |
| Not Satisfied | Not Satisfied | Not Satisfied | Foster City | Senior Leadership |
| NA | NA | NA | Foster City | Other |
| Not Satisfied | Not Satisfied | NA | Foster City | Senior Leadership |
| Not Satisfied | Satisfied | Not Satisfied | Austin | Middle Management |
| Satisfied | Satisfied | Satisfied | San Francisco | Senior Leadership |
| Not Satisfied | Not Satisfied | Not Satisfied | Foster City | Individual Contributor |
| Satisfied | Satisfied | NA | Miami | Middle Management |
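Now I want to group the dataset by `Building` and `Levels_Name`, add a new grouping level for "Satisfied", "Not Satisfied" and "NA", and get the value counts for each column.
So the result should look like this: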
| Building | Levels_Name | Sentiment | col1 | col2 | col3 |
|---------------|------------------------|---------------|------|------|------|
| Foster City | Individual Contributor | Not Satisfied | 1 | 1 | 1 |
| Foster City | Individual Contributor | NA | 0 | 0 | 0 |
| Foster City | Individual Contributor | Satisfied | 0 | 0 | 0 |
| Foster City | Senior Leadership | Not Satisfied | 2 | 2 | 1 |
| Foster City | Senior Leadership | NA | 0 | 0 | 1 |
| Foster City | Senior Leadership | Satisfied | 0 | 0 | 0 |
| San Francisco | Individual Contributor | Not Satisfied | 1 | 1 | 1 |
| San Francisco | Individual Contributor | NA | 0 | 0 | 0 |
| San Francisco | Individual Contributor | Satisfied | 0 | 0 | 0 |
Thanks!
Best answer
First melt the DataFrame into long form, then group it and count:
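import numpy as np
import pandas as pd

# Melt the question columns into long form: one row per answer.
# Missing answers become the string 'NaN' so that groupby does not drop them.
d1 = pd.melt(
    df, ['Building', 'Levels_Name'], value_name='Sentiment'
).replace(np.nan, 'NaN')

# Count every (Building, Levels_Name, variable, Sentiment) combination,
# then pivot the question names ('variable') back into columns.
d1.groupby(
    d1.columns.tolist()
).size().unstack('variable', fill_value=0).reset_index()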
variable Building Levels_Name Sentiment col1 col2 col3
0 Austin Middle Management Not Satisfied 1 0 1
1 Austin Middle Management Satisfied 0 1 0
2 Basingstoke Individual Contributor NaN 0 0 1
3 Basingstoke Individual Contributor Satisfied 1 1 0
4 Foster City Individual Contributor Not Satisfied 1 1 1
5 Foster City Other NaN 1 1 1
6 Foster City Senior Leadership NaN 0 0 1
7 Foster City Senior Leadership Not Satisfied 2 2 1
8 Miami Middle Management NaN 0 0 1
9 Miami Middle Management Satisfied 1 1 0
10 Miami Senior Leadership Not Satisfied 1 0 1
11 Miami Senior Leadership Satisfied 0 1 0
12 San Francisco Individual Contributor Not Satisfied 1 1 1
13 San Francisco Middle Management Not Satisfied 1 0 1
14 San Francisco Middle Management Satisfied 0 1 0
15 San Francisco Senior Leadership Satisfied 1 1 1
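
Note that the output above only lists the sentiments that actually occur in each group, while the question also asks for rows with a count of 0. One possible way to get those is to reindex the counts against every (Building, Levels_Name, Sentiment) combination. The following is a minimal sketch, not part of the original answer: it rebuilds the sample data by hand, assumes the "NA" cells are real missing values (`np.nan`), and the names `melted`, `sentiments` and `full_index` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; "NA" cells are assumed to be np.nan.
S, N, NA = "Satisfied", "Not Satisfied", np.nan
rows = [
    (N, N, N, "San Francisco", "Individual Contributor"),
    (S, S, NA, "Basingstoke", "Individual Contributor"),
    (N, S, N, "San Francisco", "Middle Management"),
    (N, S, N, "Miami", "Senior Leadership"),
    (N, N, N, "Foster City", "Senior Leadership"),
    (NA, NA, NA, "Foster City", "Other"),
    (N, N, NA, "Foster City", "Senior Leadership"),
    (N, S, N, "Austin", "Middle Management"),
    (S, S, S, "San Francisco", "Senior Leadership"),
    (N, N, N, "Foster City", "Individual Contributor"),
    (S, S, NA, "Miami", "Middle Management"),
]
df = pd.DataFrame(rows, columns=["col1", "col2", "col3", "Building", "Levels_Name"])

keys = ["Building", "Levels_Name"]
sentiments = ["Not Satisfied", "NA", "Satisfied"]

# Long form, with missing answers labelled "NA" so they are counted too.
melted = df.melt(id_vars=keys, value_name="Sentiment").fillna({"Sentiment": "NA"})

# Same counting step as in the answer: size per combination, questions as columns.
counts = (
    melted.groupby(keys + ["Sentiment", "variable"])
    .size()
    .unstack("variable", fill_value=0)
)

# Every sentiment for every (Building, Levels_Name) pair that occurs in the data.
groups = df[keys].drop_duplicates().sort_values(keys)
full_index = pd.MultiIndex.from_tuples(
    [(b, l, s) for b, l in groups.itertuples(index=False) for s in sentiments],
    names=keys + ["Sentiment"],
)

# Combinations that never occur get a count of 0 instead of being left out.
result = counts.reindex(full_index, fill_value=0).reset_index()
result.columns.name = None
print(result)
```

With the sample data this should give three rows per Building/Levels_Name pair, in the order listed in `sentiments`. If the sentiments were stored as a categorical column, passing `observed=False` to `groupby` might be an alternative, but that would also create rows for Building/Levels_Name pairs that never occur together.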
For the original discussion of this topic (python, pandas: value_counts of all columns in a grouped DataFrame), see the corresponding question on Stack Overflow: https://stackoverflow.com/questions/43829096/