Problem description
Here is my data frame:
Fruits      Person  Eat
Banana      Peter   Yes
Banana      Ashley  Yes
Strawberry  Peter   No
Strawberry  Ashley  Yes
Cherry      Peter   Yes
Orange      Peter   No
Orange      Ashley  No
Grape       Ashley  Yes
Pear        Ashley  Yes
Pear        Peter   Yes
There are duplicate fruits in my data frame, and I need to delete the duplicates according to the following logic. If a fruit is duplicated and both Peter and Ashley eat it, Peter's row is kept and Ashley's row is deleted. If a fruit is duplicated and Peter doesn't eat it but Ashley does, Peter's row is deleted and Ashley's row is kept. If a fruit is duplicated and neither Peter nor Ashley eats it, both rows are deleted.
With this logic, the data frame should end up as:
Fruits      Person  Eat
Banana      Peter   Yes
Strawberry  Ashley  Yes
Cherry      Peter   Yes
Grape       Ashley  Yes
Pear        Peter   Yes
I'm not sure how to iterate through a pandas data frame with these conditions to delete the duplicates. In plain Python, for the first condition, I would do something like this:
data = [
    {
        "fruit": "Apple",
        "person": "Ashley",
        "eats": True
    },
    {
        "fruit": "Apple",
        "person": "Peter",
        "eats": True
    }
]

eats = dict()   # person -> {fruit: row number}, only for fruits the person eats
drop = set()    # row numbers to delete once the loop is done

for i, row in enumerate(data):
    fruit = row["fruit"]
    person = row["person"]
    does_eat = row["eats"]

    # if person does eat, record row number for later deletion if needed
    if does_eat:
        eats.setdefault(person, dict())[fruit] = i

    # dedup: when both eat the fruit, Ashley's row is the one to drop
    if fruit in eats.get("Peter", {}) and fruit in eats.get("Ashley", {}):
        drop.add(eats["Ashley"][fruit])

# delete after iterating, so the recorded row numbers stay valid
data = [row for i, row in enumerate(data) if i not in drop]
Any help or tips on how to do this with my data frame would be much appreciated.
Recommended answer
Try this:
df1 = (df[df.Eat.eq('Yes')].sort_values('Person')
                           .drop_duplicates(subset='Fruits', keep='last'))
Out[14]:
       Fruits  Person  Eat
3  Strawberry  Ashley  Yes
7       Grape  Ashley  Yes
0      Banana   Peter  Yes
4      Cherry   Peter  Yes
9        Pear   Peter  Yes
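
The one-liner above can be unpacked into steps to show why it works. What follows is a sketch rather than part of the original answer: the data frame is rebuilt from the table in the question, and the intermediate names (`eaten`, `by_person`, `result`), the `kind="mergesort"` argument, and the final `sort_index()` are additions made here for illustration.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Fruits": ["Banana", "Banana", "Strawberry", "Strawberry", "Cherry",
               "Orange", "Orange", "Grape", "Pear", "Pear"],
    "Person": ["Peter", "Ashley", "Peter", "Ashley", "Peter",
               "Peter", "Ashley", "Ashley", "Ashley", "Peter"],
    "Eat": ["Yes", "Yes", "No", "Yes", "Yes", "No", "No", "Yes", "Yes", "Yes"],
})

# 1. Keep only the rows where the fruit is eaten. This removes every "No"
#    row, so a fruit that nobody eats (Orange) disappears entirely.
eaten = df[df["Eat"].eq("Yes")]

# 2. Sort by Person so that "Ashley" rows come before "Peter" rows
#    (plain alphabetical order). kind="mergesort" is an addition here:
#    it is a stable sort, so ties keep their original relative order.
by_person = eaten.sort_values("Person", kind="mergesort")

# 3. For each fruit keep the last row, which is Peter's whenever he eats it
#    and Ashley's otherwise.
df1 = by_person.drop_duplicates(subset="Fruits", keep="last")

# Optional addition: restore the original row order.
result = df1.sort_index()
print(result)
```

Note that this approach relies on "Peter" sorting after "Ashley" alphabetically; with other names the sort key would need to encode the preferred person explicitly. It also drops any "No" row, including a fruit that appears only once, which is a case the question does not cover.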