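I have a survey dataset in which every question is answered on a 7-point scale. I want the value_counts of the answer values shared across all columns, with the DataFrame grouped by two columns. Here is a sample dataset and what I have so far.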

| col1          | col2          | col3          | Building      | Levels_Name            |
|---------------|---------------|---------------|---------------|------------------------|
| Not Satisfied | Not Satisfied | Not Satisfied | San Francisco | Individual Contributor |
| Satisfied     | Satisfied     | NA            | Basingstoke   | Individual Contributor |
| Not Satisfied | Satisfied     | Not Satisfied | San Francisco | Middle Management      |
| Not Satisfied | Satisfied     | Not Satisfied | Miami         | Senior Leadership      |
| Not Satisfied | Not Satisfied | Not Satisfied | Foster City   | Senior Leadership      |
| NA            | NA            | NA            | Foster City   | Other                  |
| Not Satisfied | Not Satisfied | NA            | Foster City   | Senior Leadership      |
| Not Satisfied | Satisfied     | Not Satisfied | Austin        | Middle Management      |
| Satisfied     | Satisfied     | Satisfied     | San Francisco | Senior Leadership      |
| Not Satisfied | Not Satisfied | Not Satisfied | Foster City   | Individual Contributor |
| Satisfied     | Satisfied     | NA            | Miami         | Middle Management      |
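The answer below operates on a DataFrame named `df`, which the question never constructs. A minimal way to load the sample, assuming the table is pasted as comma-separated text (the name `csv_text` is illustrative), is `pd.read_csv`. By default `read_csv` treats the literal string `NA` as missing, so those cells arrive as `NaN` rather than as text:

```python
import io

import pandas as pd

# The sample table as CSV text (illustrative); "NA" marks an unanswered question.
csv_text = """col1,col2,col3,Building,Levels_Name
Not Satisfied,Not Satisfied,Not Satisfied,San Francisco,Individual Contributor
Satisfied,Satisfied,NA,Basingstoke,Individual Contributor
Not Satisfied,Satisfied,Not Satisfied,San Francisco,Middle Management
Not Satisfied,Satisfied,Not Satisfied,Miami,Senior Leadership
Not Satisfied,Not Satisfied,Not Satisfied,Foster City,Senior Leadership
NA,NA,NA,Foster City,Other
Not Satisfied,Not Satisfied,NA,Foster City,Senior Leadership
Not Satisfied,Satisfied,Not Satisfied,Austin,Middle Management
Satisfied,Satisfied,Satisfied,San Francisco,Senior Leadership
Not Satisfied,Not Satisfied,Not Satisfied,Foster City,Individual Contributor
Satisfied,Satisfied,NA,Miami,Middle Management
"""

# read_csv's default na_values include "NA", so those cells become NaN.
df = pd.read_csv(io.StringIO(csv_text))

# col3 has 4 missing answers; col1 and col2 have 1 each.
print(df.isna().sum())
```

Because the missing answers are real `NaN` values, any grouping on the answer values has to account for them explicitly, or they are silently dropped.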


Now I want to group the dataset by "Building" and "Levels_Name", add a new grouping level for "Satisfied", "Not Satisfied", and "NA", and get the value counts for each column.

So the result should look like this:

| Building      | Levels_Name            | Sentiment     | col1 | col2 | col3 |
|---------------|------------------------|---------------|------|------|------|
| Foster City   | Individual Contributor | Not Satisfied | 1    | 1    | 1    |
| Foster City   | Individual Contributor | NA            | 0    | 0    | 0    |
| Foster City   | Individual Contributor | Satisfied     | 0    | 0    | 0    |
| Foster City   | Senior Leadership      | Not Satisfied | 2    | 2    | 1    |
| Foster City   | Senior Leadership      | NA            | 0    | 0    | 1    |
| Foster City   | Senior Leadership      | Satisfied     | 0    | 0    | 0    |
| San Francisco | Individual Contributor | Not Satisfied | 1    | 1    | 1    |
| San Francisco | Individual Contributor | NA            | 0    | 0    | 0    |
| San Francisco | Individual Contributor | Satisfied     | 0    | 0    | 0    |
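Taken literally, the request is one `value_counts` per answer column within each (`Building`, `Levels_Name`) group, with the per-column results lined up side by side. A sketch of that reading, assuming missing answers are `NaN` and relabelling them as the string `'NaN'` so they are counted rather than dropped. The shorthand `NS`/`S`/`NA` and the names `per_column` and `result` are illustrative, not from the original post:

```python
import numpy as np
import pandas as pd

# Illustrative shorthand for the answer labels; NA is a real missing value.
NS, S, NA = 'Not Satisfied', 'Satisfied', np.nan
df = pd.DataFrame(
    [
        (NS, NS, NS, 'San Francisco', 'Individual Contributor'),
        (S, S, NA, 'Basingstoke', 'Individual Contributor'),
        (NS, S, NS, 'San Francisco', 'Middle Management'),
        (NS, S, NS, 'Miami', 'Senior Leadership'),
        (NS, NS, NS, 'Foster City', 'Senior Leadership'),
        (NA, NA, NA, 'Foster City', 'Other'),
        (NS, NS, NA, 'Foster City', 'Senior Leadership'),
        (NS, S, NS, 'Austin', 'Middle Management'),
        (S, S, S, 'San Francisco', 'Senior Leadership'),
        (NS, NS, NS, 'Foster City', 'Individual Contributor'),
        (S, S, NA, 'Miami', 'Middle Management'),
    ],
    columns=['col1', 'col2', 'col3', 'Building', 'Levels_Name'],
)

keys = [df['Building'], df['Levels_Name']]
# One value_counts per answer column, computed within each (Building, Levels_Name) group.
# fillna('NaN') keeps unanswered questions as a countable label.
per_column = {
    col: df[col].fillna('NaN').groupby(keys).value_counts()
    for col in ['col1', 'col2', 'col3']
}

# Outer-align the per-column counts on their shared index; absent combinations become 0.
result = pd.concat(per_column, axis=1).fillna(0).astype(int)
result.index.names = ['Building', 'Levels_Name', 'Sentiment']
result = result.reset_index()
print(result)
```

This only lists label combinations that occur in at least one column, so it does not yet produce the all-zero rows shown in the expected output.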


Thanks!

Best answer

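First, melt the DataFrame into long format, then group it:

```python
import numpy as np
import pandas as pd

# Reshape to long format: one row per (Building, Levels_Name, question) answer.
# Missing answers become the string 'NaN' so groupby does not drop them.
d1 = pd.melt(
    df, ['Building', 'Levels_Name'], value_name='Sentiment'
).replace(np.nan, 'NaN')

# Count each (Building, Levels_Name, variable, Sentiment) combination,
# then pivot the question names ('variable') back into columns.
d1.groupby(
    d1.columns.tolist()
).size().unstack('variable', fill_value=0).reset_index()
```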
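As an alternative to the `groupby(...).size().unstack(...)` chain, `pd.crosstab` performs the count and the pivot in a single call. A sketch, assuming the same sample data (re-typed with the illustrative shorthand `NS`/`S`/`NA`) and the same `'NaN'` relabelling of missing answers; it should yield the same 16 rows as the answer's output, including the `variable` column-axis name:

```python
import numpy as np
import pandas as pd

# Illustrative shorthand for the answer labels; NA is a real missing value.
NS, S, NA = 'Not Satisfied', 'Satisfied', np.nan
df = pd.DataFrame(
    [
        (NS, NS, NS, 'San Francisco', 'Individual Contributor'),
        (S, S, NA, 'Basingstoke', 'Individual Contributor'),
        (NS, S, NS, 'San Francisco', 'Middle Management'),
        (NS, S, NS, 'Miami', 'Senior Leadership'),
        (NS, NS, NS, 'Foster City', 'Senior Leadership'),
        (NA, NA, NA, 'Foster City', 'Other'),
        (NS, NS, NA, 'Foster City', 'Senior Leadership'),
        (NS, S, NS, 'Austin', 'Middle Management'),
        (S, S, S, 'San Francisco', 'Senior Leadership'),
        (NS, NS, NS, 'Foster City', 'Individual Contributor'),
        (S, S, NA, 'Miami', 'Middle Management'),
    ],
    columns=['col1', 'col2', 'col3', 'Building', 'Levels_Name'],
)

# Same long format as the answer: one row per answer, missing answers labelled 'NaN'.
long = pd.melt(df, ['Building', 'Levels_Name'], value_name='Sentiment')
long['Sentiment'] = long['Sentiment'].fillna('NaN')

# crosstab counts each (Building, Levels_Name, Sentiment) row against each question column;
# combinations that never occur are filled with 0.
counts = pd.crosstab(
    [long['Building'], long['Levels_Name'], long['Sentiment']],
    long['variable'],
).reset_index()
print(counts)
```

`crosstab` drops rows whose keys are `NaN`, just like `groupby`, so the relabelling step is still required.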

```
variable       Building             Levels_Name      Sentiment  col1  col2  col3
0                Austin       Middle Management  Not Satisfied     1     0     1
1                Austin       Middle Management      Satisfied     0     1     0
2           Basingstoke  Individual Contributor            NaN     0     0     1
3           Basingstoke  Individual Contributor      Satisfied     1     1     0
4           Foster City  Individual Contributor  Not Satisfied     1     1     1
5           Foster City                   Other            NaN     1     1     1
6           Foster City       Senior Leadership            NaN     0     0     1
7           Foster City       Senior Leadership  Not Satisfied     2     2     1
8                 Miami       Middle Management            NaN     0     0     1
9                 Miami       Middle Management      Satisfied     1     1     0
10                Miami       Senior Leadership  Not Satisfied     1     0     1
11                Miami       Senior Leadership      Satisfied     0     1     0
12        San Francisco  Individual Contributor  Not Satisfied     1     1     1
13        San Francisco       Middle Management  Not Satisfied     1     0     1
14        San Francisco       Middle Management      Satisfied     0     1     0
15        San Francisco       Senior Leadership      Satisfied     1     1     1
```
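The output above lists only the sentiment labels that actually occur in each group, whereas the expected result in the question also shows all-zero rows (for example Foster City / Individual Contributor / NA). One way to add them is to reindex against every (`Building`, `Levels_Name`) pair crossed with all three labels. This sketch assumes only pairs present in the data should appear; `sentiments`, `pairs`, `full_index`, and `full` are illustrative names:

```python
import numpy as np
import pandas as pd

# Illustrative shorthand for the answer labels; NA is a real missing value.
NS, S, NA = 'Not Satisfied', 'Satisfied', np.nan
df = pd.DataFrame(
    [
        (NS, NS, NS, 'San Francisco', 'Individual Contributor'),
        (S, S, NA, 'Basingstoke', 'Individual Contributor'),
        (NS, S, NS, 'San Francisco', 'Middle Management'),
        (NS, S, NS, 'Miami', 'Senior Leadership'),
        (NS, NS, NS, 'Foster City', 'Senior Leadership'),
        (NA, NA, NA, 'Foster City', 'Other'),
        (NS, NS, NA, 'Foster City', 'Senior Leadership'),
        (NS, S, NS, 'Austin', 'Middle Management'),
        (S, S, S, 'San Francisco', 'Senior Leadership'),
        (NS, NS, NS, 'Foster City', 'Individual Contributor'),
        (S, S, NA, 'Miami', 'Middle Management'),
    ],
    columns=['col1', 'col2', 'col3', 'Building', 'Levels_Name'],
)

# The answer's pipeline, stopping before reset_index so the counts keep their MultiIndex.
d1 = pd.melt(df, ['Building', 'Levels_Name'], value_name='Sentiment').replace(np.nan, 'NaN')
counts = d1.groupby(d1.columns.tolist()).size().unstack('variable', fill_value=0)

# Every (Building, Levels_Name) pair present in the data, crossed with all three labels.
sentiments = ['Not Satisfied', 'NaN', 'Satisfied']
pairs = df[['Building', 'Levels_Name']].drop_duplicates()
full_index = pd.MultiIndex.from_tuples(
    [(b, lvl, s) for b, lvl in pairs.itertuples(index=False) for s in sentiments],
    names=['Building', 'Levels_Name', 'Sentiment'],
)

# Reindexing inserts the missing label rows with zero counts.
full = counts.reindex(full_index, fill_value=0).reset_index()
print(full)
```

Using `pd.MultiIndex.from_product` over all buildings, levels, and labels would instead also emit pairs that never occur in the data (for example Austin / Other), which the expected output does not show.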

A similar question, "python - Python - Pandas - value_counts of all columns in a grouped DataFrame", is on Stack Overflow: https://stackoverflow.com/questions/43829096/

10-12 21:46