This article covers how to drop rows from a dataframe based on values in other rows.
Problem description
I was looking for a way to drop rows from my dataframe based on a condition checked against values in other rows.
Here is my dataframe:
product product_id account_status
prod-A 100 active
prod-A 100 cancelled
prod-A 300 active
prod-A 400 cancelled
If a row with account_status='active' exists for a product and product_id combination, then keep that row and drop the other rows for that combination.
The desired output is:
product product_id account_status
prod-A 100 active
prod-A 300 active
prod-A 400 cancelled
I saw the solution mentioned here but couldn't replicate it for strings.
Please suggest an approach.
Recommended answer
For a more general solution that removes the other account_status values in a group only if that group contains at least one active value:
print (df)
product product_id account_status
0 prod-A 100 active
1 prod-A 100 cancelled <- needs to be removed
2 prod-A 300 active
3 prod-A 400 cancelled
4 prod-A 500 active
5 prod-A 500 active
6 prod-A 600 cancelled
7 prod-A 600 cancelled
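# True for rows whose status is 'active'
s = df['account_status'].eq('active')
# that flag, grouped by product and product_id
g = df.assign(A=s).groupby(['product','product_id'])['A']
# keep rows in groups with no active row, rows in all-active groups, and active rows
mask = ~g.transform('any') | g.transform('all') | s
df = df[mask]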
print (df)
product product_id account_status
0 prod-A 100 active
2 prod-A 300 active
3 prod-A 400 cancelled
4 prod-A 500 active
5 prod-A 500 active
6 prod-A 600 cancelled
7 prod-A 600 cancelled
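To make the mask easier to follow, here is a self-contained sketch of the same approach that can be run as-is. The dataframe is rebuilt from the printed sample above, and the intermediate names (no_active_in_group, all_active_in_group, result) are only illustrative, they are not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the printed frame above
df = pd.DataFrame({
    "product": ["prod-A"] * 8,
    "product_id": [100, 100, 300, 400, 500, 500, 600, 600],
    "account_status": ["active", "cancelled", "active", "cancelled",
                       "active", "active", "cancelled", "cancelled"],
})

# True where the row itself is active
s = df["account_status"].eq("active")

# The same flag, grouped by (product, product_id)
g = df.assign(A=s).groupby(["product", "product_id"])["A"]

# Broadcast group-level facts back to every row of the group
no_active_in_group = ~g.transform("any")   # group has no active row at all
all_active_in_group = g.transform("all")   # every row in the group is active

# A row survives if any of the three conditions holds
mask = no_active_in_group | all_active_in_group | s
result = df[mask]
print(result)
```

Only the cancelled row of product_id 100 is dropped, because it is the only non-active row sharing a group with an active one.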
This also works well with multiple categories:
print (df)
product product_id account_status
0 prod-A 100 active
1 prod-A 100 cancelled <- needs to be removed
2 prod-A 100 pending <- needs to be removed
3 prod-A 300 active
4 prod-A 300 pending <- needs to be removed
5 prod-A 400 cancelled
6 prod-A 500 active
7 prod-A 500 active
8 prod-A 600 pending
9 prod-A 600 cancelled
s = df['account_status'].eq('active')
g = df.assign(A=s).groupby(['product','product_id'])['A']
mask = ~g.transform('any') | g.transform('all') | s
df = df[mask]
print (df)
product product_id account_status
0 prod-A 100 active
3 prod-A 300 active
5 prod-A 400 cancelled
6 prod-A 500 active
7 prod-A 500 active
8 prod-A 600 pending
9 prod-A 600 cancelled
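If this filter is needed in more than one place, it could be wrapped in a small helper. The sketch below is an assumption-laden variant, not the answer's own code: the function name keep_active_rows and its parameters are hypothetical. It also uses a slightly shorter mask. When every row of a group is active, each of those rows already satisfies the "row is active" condition, so the g.transform('all') term can be left out without changing the result.

```python
import pandas as pd

def keep_active_rows(df, keys, status_col="account_status", keep="active"):
    """Hypothetical helper: within each group defined by `keys`, keep only
    the rows whose status equals `keep` if at least one such row exists;
    leave groups without any such row untouched."""
    is_keep = df[status_col].eq(keep)
    # True for every row whose group contains at least one `keep` row
    group_has_keep = is_keep.groupby([df[k] for k in keys]).transform("any")
    return df[is_keep | ~group_has_keep]

# Sample data rebuilt from the multi-category frame above
df = pd.DataFrame({
    "product": ["prod-A"] * 10,
    "product_id": [100, 100, 100, 300, 300, 400, 500, 500, 600, 600],
    "account_status": ["active", "cancelled", "pending", "active", "pending",
                       "cancelled", "active", "active", "pending", "cancelled"],
})

filtered = keep_active_rows(df, ["product", "product_id"])
print(filtered)

# Cross-check against the three-term mask from the answer
s = df["account_status"].eq("active")
g = df.assign(A=s).groupby(["product", "product_id"])["A"]
original_mask = ~g.transform("any") | g.transform("all") | s
same_rows = filtered.index.equals(df[original_mask].index)
```

Grouping the boolean Series directly avoids the temporary column A, at the cost of passing the key columns explicitly.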