python - How to select all rows for an ID only when another column contains a specific value

I have a CSV file with several hundred rows in which ids can repeat. Is there an easy way to select the rows for each id, but only if every customerCount value for that id equals 0?

The structure of my CSV file:

  report_date     id    customerCount    orderNr
  2020-02-20    123        12              10
  2020-02-19    123        18              11
  2020-02-18    123        0               12
  2020-02-20    321        0               0
  2020-02-19    321        0               0
  2020-02-18    321        0               0
  2020-02-20    456        17              13
  2020-02-19    456        0               0
  2020-02-18    456        15              14
  2020-02-20    654        0               0
  2020-02-19    654        0               0
  2020-02-18    654        0               0
  and so on...

Desired output CSV:

id    customerCount
321         0
654         0

My code so far (it throws TypeError: 'method' object is not subscriptable):

import pandas as pd

df = pd.read_csv('path/to/my/file.csv')
df1 = df.loc[(df.groupby['id'](['customerCount'] == 0)]
df1.to_csv('/path/to/my.output.csv')

Thanks in advance!

Best answer

The first idea is to build a boolean mask, group it by id and use GroupBy.all, then filter the resulting Series by itself and convert it to a DataFrame:

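# True for an id only if every one of its rows has customerCount == 0
s = (df['customerCount'] == 0).groupby(df['id']).all()

# keep the ids where the mask is True, then turn the index back into a column
out = s[s].reset_index()
out['customerCount'] = 0
print(out)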
    id  customerCount
0  321              0
1  654              0
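
For anyone who wants to run this end to end, here is a minimal sketch. It rebuilds the sample rows from the question as a DataFrame instead of reading the CSV, and the names all_zero, out and buffer are only illustrative. The comment in the middle also points at what probably triggers the TypeError in the question: groupby is a method, so it has to be called with parentheses rather than subscripted.

```python
import io
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    'report_date': ['2020-02-20', '2020-02-19', '2020-02-18'] * 4,
    'id': [123] * 3 + [321] * 3 + [456] * 3 + [654] * 3,
    'customerCount': [12, 18, 0, 0, 0, 0, 17, 0, 15, 0, 0, 0],
    'orderNr': [10, 11, 12, 0, 0, 0, 13, 0, 14, 0, 0, 0],
})

# groupby is a method, so it is called: df.groupby('id').
# Writing df.groupby['id'] subscripts the method object itself, which raises
# TypeError: 'method' object is not subscriptable.
all_zero = (df['customerCount'] == 0).groupby(df['id']).all()

# Keep only the ids whose mask is True for every row.
out = all_zero[all_zero].reset_index()
out['customerCount'] = 0

# index=False keeps the row index out of the CSV.
# A StringIO buffer stands in for the real output path here.
buffer = io.StringIO()
out.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

With a real file you would pass the output path to to_csv instead of the buffer; index=False is what keeps the file down to the two columns shown in the desired output.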

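Or collect the ids that have at least one non-zero value, use Series.isin with the mask inverted by ~ to keep only the ids whose values are all 0, and create the DataFrame with the constructor: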

# ids with at least one non-zero value are excluded by the inverted isin mask
ids = df.loc[~df['id'].isin(df.loc[df['customerCount'] != 0, 'id']), 'id'].unique()

out = pd.DataFrame({'id':ids, 'customerCount':0})
print(out)
    id  customerCount
0  321              0
1  654              0
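
Since the title asks about selecting the rows themselves, the same inverted isin mask can also be applied to the whole DataFrame rather than only to the id column. This is a sketch under the assumption that the sample data from the question is rebuilt in memory; nonzero_ids and rows are names made up for this example.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    'report_date': ['2020-02-20', '2020-02-19', '2020-02-18'] * 4,
    'id': [123] * 3 + [321] * 3 + [456] * 3 + [654] * 3,
    'customerCount': [12, 18, 0, 0, 0, 0, 17, 0, 15, 0, 0, 0],
    'orderNr': [10, 11, 12, 0, 0, 0, 13, 0, 14, 0, 0, 0],
})

# ids that have at least one non-zero customerCount
nonzero_ids = df.loc[df['customerCount'] != 0, 'id']

# keep every original row whose id never appears among them
rows = df[~df['id'].isin(nonzero_ids)]
print(rows)
```

This keeps all columns (report_date, orderNr and so on) for ids 321 and 654, which may be closer to what is wanted if the full rows are needed rather than one summary line per id.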

EDIT: To also get the remaining ids, store the mask once and use it both inverted with ~ and as-is:

mask = df['id'].isin(df.loc[df['customerCount'] != 0, 'id'])
ids1 = df.loc[~mask, 'id'].unique()
ids2 = df.loc[mask, 'id'].unique()

df1 = pd.DataFrame({'id':ids1, 'customerCount':0})
df2 = pd.DataFrame({'id':ids2, 'customerCount':'>0'})
print (df1)
    id  customerCount
0  321              0
1  654              0

print (df2)
    id customerCount
0  123            >0
1  456            >0
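
If a single table with both labels is more convenient than two separate DataFrames, one possible variation is to compute the per-id result once and label it with numpy.where. This is only a sketch: the sample data is rebuilt from the question, the names all_zero and summary are illustrative, and the labels are stored as strings so that '0' and '>0' can share one column.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    'report_date': ['2020-02-20', '2020-02-19', '2020-02-18'] * 4,
    'id': [123] * 3 + [321] * 3 + [456] * 3 + [654] * 3,
    'customerCount': [12, 18, 0, 0, 0, 0, 17, 0, 15, 0, 0, 0],
    'orderNr': [10, 11, 12, 0, 0, 0, 13, 0, 14, 0, 0, 0],
})

# One boolean per id: True when every customerCount for that id is 0.
all_zero = (df['customerCount'] == 0).groupby(df['id']).all()

# Label each id; groupby sorts the ids, so the rows come out in id order.
summary = pd.DataFrame({
    'id': all_zero.index,
    'customerCount': np.where(all_zero, '0', '>0'),
})
print(summary)
```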