python - Filter the top n values in Pandas

I have the following dataset:

ID   Group Name   Information
1    A            'Info type1'
1    A            'Info type2'
2    B            'Info type2'
2    B            'Info type3'
2    B            'Info type4'
3    A            'Info type2'
3    A            'Info type5'
3    A            'Info type2'

Ultimately, I want to count how many items a particular group has processed, split by a specific Info type.

As a first step, I defined a function that flags one specific info type:

def checkrejcted(strval):
    # Label 'Info type5' as rejected; every other info type is not rejected
    if strval == 'Info type5':
        return 'Rejected'
    else:
        return 'Not rejected'

Next, I applied this function to the Information column:

dataset['CheckRejected'] = dataset['Information'].apply(checkrejcted)
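Because `apply` calls the Python function once per row, one common vectorized alternative for a single-value check like this is `numpy.where`. The sketch below rebuilds the dataset by hand and assumes the values are stored without the quote characters shown in the table; it is an illustration, not the asker's code.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the dataset; assumes the values are stored
# without the surrounding quote characters shown in the table.
dataset = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2, 3, 3, 3],
    "Group Name": ["A", "A", "B", "B", "B", "A", "A", "A"],
    "Information": ["Info type1", "Info type2", "Info type2", "Info type3",
                    "Info type4", "Info type2", "Info type5", "Info type2"],
})

# Vectorized equivalent of checkrejcted: compare the whole column at once
# instead of calling a Python function for every row.
dataset["CheckRejected"] = np.where(
    dataset["Information"] == "Info type5", "Rejected", "Not rejected"
)
print(dataset["CheckRejected"].tolist())
```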

Finally, I dropped the Information column and then removed the duplicate rows.
As a result, the dataset looks like this:

ID   Group Name   CheckRejected
1    A            'Not rejected'
2    B            'Not rejected'
3    A            'Not rejected'
3    A            'Rejected'
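The question's steps can be reproduced end to end as the minimal sketch below. It assumes the same hand-built dataset as above, with values stored without literal quote characters, and it keeps the asker's function name `checkrejcted` unchanged.

```python
import pandas as pd

# Hypothetical reconstruction of the dataset (values without literal quotes)
dataset = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2, 3, 3, 3],
    "Group Name": ["A", "A", "B", "B", "B", "A", "A", "A"],
    "Information": ["Info type1", "Info type2", "Info type2", "Info type3",
                    "Info type4", "Info type2", "Info type5", "Info type2"],
})

def checkrejcted(strval):
    # Only 'Info type5' counts as rejected
    if strval == "Info type5":
        return "Rejected"
    return "Not rejected"

dataset["CheckRejected"] = dataset["Information"].apply(checkrejcted)

# Drop the raw column, then collapse rows whose (ID, Group Name, CheckRejected)
# values are identical; the first occurrence of each keeps its original index.
result = dataset.drop(columns="Information").drop_duplicates()
print(result)
```

The surviving index labels are 0, 2, 5 and 6, which matches the four rows in the table above.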

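I'm wondering whether there is a smarter way to count how often a particular group name occurs and split those counts into Not rejected and Rejected. A particular item can have both Rejected and Not rejected information. That's fine, because I assume both will be counted in the count plot.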
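One possible way to get the counts directly, without the separate drop and deduplication steps, is to group by group name and status and count distinct IDs with `nunique`. This is a sketch under the same assumption as before (values stored without literal quotes); an ID that has both statuses is counted once under each, as the question expects.

```python
import pandas as pd

# Hypothetical reconstruction of the dataset (values without literal quotes)
dataset = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2, 3, 3, 3],
    "Group Name": ["A", "A", "B", "B", "B", "A", "A", "A"],
    "Information": ["Info type1", "Info type2", "Info type2", "Info type3",
                    "Info type4", "Info type2", "Info type5", "Info type2"],
})

# Label each row: True -> "Rejected", False -> "Not rejected"
dataset["CheckRejected"] = dataset["Information"].eq("Info type5").map(
    {True: "Rejected", False: "Not rejected"}
)

# Count distinct IDs per (group, status); repeated rows for the same ID are
# counted once, so no drop_duplicates() is needed.
counts = dataset.groupby(["Group Name", "CheckRejected"])["ID"].nunique()
print(counts)
```

Combinations that never occur (here group B with Rejected) simply do not appear in the result.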

Best answer

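You can use map, with fillna supplying the default for values that have no match. The keys and values below include the single quotes because the question's table shows them as part of the data: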

maps = {"'Info type5'": "'Rejected'"}
or
maps = {"'Info type1'": "'Not Rejected'",
        "'Info type2'": "'Not Rejected'",
        "'Info type3'": "'Not Rejected'",
        "'Info type4'": "'Not Rejected'",
        "'Info type5'": "'Rejected'"}

df['Information'].map(maps).fillna("'Not Rejected'")

0    'Not Rejected'
1    'Not Rejected'
2    'Not Rejected'
3    'Not Rejected'
4    'Not Rejected'
5    'Not Rejected'
6        'Rejected'
7    'Not Rejected'
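If the values are actually stored without the quote characters (an assumption, since the question's table may only be showing them for display), the same map and fillna idea looks like this. Only the rejected type needs a dictionary entry, because `map` returns NaN for every value not in the dictionary and `fillna` replaces those NaNs with the default.

```python
import pandas as pd

# Hypothetical column; values assumed to be stored without literal quotes
df = pd.DataFrame({
    "Information": ["Info type1", "Info type2", "Info type2", "Info type3",
                    "Info type4", "Info type2", "Info type5", "Info type2"],
})

# Unmapped values become NaN, which fillna turns into the default label
maps = {"Info type5": "Rejected"}
df["CheckRejected"] = df["Information"].map(maps).fillna("Not Rejected")
print(df["CheckRejected"].tolist())
```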

df['CheckRejected'] = df['Information'].map(maps).fillna("'Not Rejected'")

   ID Group Name   Information   CheckRejected
0   1          A  'Info type1'  'Not Rejected'
1   1          A  'Info type2'  'Not Rejected'
2   2          B  'Info type2'  'Not Rejected'
3   2          B  'Info type3'  'Not Rejected'
4   2          B  'Info type4'  'Not Rejected'
5   3          A  'Info type2'  'Not Rejected'
6   3          A  'Info type5'      'Rejected'
7   3          A  'Info type2'  'Not Rejected'

df.drop(columns='Information').drop_duplicates()

   ID Group Name   CheckRejected
0   1          A  'Not Rejected'
2   2          B  'Not Rejected'
5   3          A  'Not Rejected'
6   3          A      'Rejected'
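To turn the deduplicated frame into the per-group counts the question asks for, `pd.crosstab` is one option: it produces a group-by-status table and fills combinations that never occur with 0. The frame below is a hand-built copy of the output above, assuming the values carry no literal quotes.

```python
import pandas as pd

# Hypothetical copy of the deduplicated output (values without literal quotes)
dedup = pd.DataFrame({
    "ID": [1, 2, 3, 3],
    "Group Name": ["A", "B", "A", "A"],
    "CheckRejected": ["Not Rejected", "Not Rejected", "Not Rejected", "Rejected"],
})

# Rows: group names; columns: status; cells: number of rows with that pair.
# Missing combinations (B, Rejected) are filled with 0.
table = pd.crosstab(dedup["Group Name"], dedup["CheckRejected"])
print(table)
```

A table in this shape can be passed to `table.plot.bar()` (which needs matplotlib installed) for a grouped count plot, where an ID with both statuses contributes to both bars of its group.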

For python - Filter the top n values in Pandas, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59329777/