How do I filter rows in a data frame based on the values in other columns?
I have a data frame,
ip_df:
  class name  marks          min_marks  min_subjects
0 I     tom   [89,85,80,74]  80         2
1 II    sam   [65,72,43,40]  85         1
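Each row should be filtered based on its "min_subjects" and "min_marks" values: a row qualifies when at least min_subjects of its marks exceed min_marks.
The final result should be
op_df: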
  class name  marks          min_marks  min_subjects  flag
0 I     tom   [89,85,80,74]  80         2             1
1 II    sam   [65,72,43,40]  85         1             0
Can anyone help me achieve this in a data frame?
Best answer
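Use a list comprehension with zip over the three columns. A generator compares each mark against min_marks and sum counts the matches; the count is then compared with min_subjects and the result is converted to an integer: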
df['flag'] = [1 if sum(x > c for x in a) >= b else 0
              for a, b, c in zip(df['marks'], df['min_subjects'], df['min_marks'])]
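To make the counting step concrete, here is a minimal self-contained sketch that rebuilds the sample data from the question and splits the comprehension into two steps. The intermediate name `counts` is introduced only for illustration and is not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question; `df` matches the name used in the answer.
df = pd.DataFrame({
    "class": ["I", "II"],
    "name": ["tom", "sam"],
    "marks": [[89, 85, 80, 74], [65, 72, 43, 40]],
    "min_marks": [80, 85],
    "min_subjects": [2, 1],
})

# Step 1: for each row, count the marks strictly greater than min_marks.
counts = [sum(x > c for x in a) for a, c in zip(df["marks"], df["min_marks"])]

# Step 2: flag the row when that count reaches min_subjects.
df["flag"] = [int(n >= b) for n, b in zip(counts, df["min_subjects"])]

print(counts)               # [2, 0]
print(df["flag"].tolist())  # [1, 0]
```

The comparison is strict, so tom's mark of 80 does not count toward his total. If a mark equal to min_marks should pass, `x >= c` would be the comparison to use instead.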
An alternative that converts the boolean to 0/1 with int:
df['flag'] = [int(sum(x > c for x in a) >= b)
              for a, b, c in zip(df['marks'], df['min_subjects'], df['min_marks'])]
Or a numpy solution:
import numpy as np
df['flag'] = [int(np.sum(np.array(a) > c) >= b)
              for a, b, c in zip(df['marks'], df['min_subjects'], df['min_marks'])]
print(df)
  class name  marks             min_marks  min_subjects  flag
0 I     tom   [89, 85, 80, 74]  80         2             1
1 II    sam   [65, 72, 43, 40]  85         1             0
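The answer stops at adding the flag column, while the question title asks about filtering. As a hedged sketch, assuming that "filtering" means keeping only the rows whose flag is 1, a boolean mask on the new column would do it. The name `filtered` is hypothetical.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "class": ["I", "II"],
    "name": ["tom", "sam"],
    "marks": [[89, 85, 80, 74], [65, 72, 43, 40]],
    "min_marks": [80, 85],
    "min_subjects": [2, 1],
})

# Same flag computation as in the answer.
df["flag"] = [int(sum(x > c for x in a) >= b)
              for a, b, c in zip(df["marks"], df["min_subjects"], df["min_marks"])]

# Keep only the rows that met their own threshold (assumed meaning of "filter").
filtered = df[df["flag"] == 1]

print(filtered["name"].tolist())  # ['tom']
```

If the flag column is not needed afterwards, it can be removed from the result with `filtered.drop(columns="flag")`.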
For "python - Pandas: filter the rows based on a column containing lists", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58817539/