Question
I am new to Pandas and to this field in general, and I have a challenge: I have a dataframe like the one below, and I need to search the 'mood' column for the value 'allergies', then select the rows containing 'allergies' together with the rows that precede them based on date. In this example, that means the rows up to 2 days before each 'allergies' row.
My dataframe looks like this:
id     food      date        mood
id 1   nuts      2018-11-12  high
id 2   potatoes  2018-11-13  low
id 3   fish      2018-11-14  high
id 4   bread     2018-11-14  high
id 5   fish      2018-11-14  high
id 6   nuts      2018-11-14  high
id 7   fish      2018-11-15  allergies
id 8   beer      2018-11-16  low
id 9   bread     2018-11-17  high
id 10  fish      2018-11-18  high
id 11  pasta     2018-11-19  allergies
What I would like to achieve is code that delivers something like this:
id     food      date        mood
id 2   potatoes  2018-11-13  low
id 3   fish      2018-11-14  high
id 4   bread     2018-11-14  high
id 5   fish      2018-11-14  high
id 6   nuts      2018-11-14  high
id 7   fish      2018-11-15  allergies
id 9   bread     2018-11-17  high
id 10  fish      2018-11-18  high
id 11  pasta     2018-11-19  allergies
So, it should return the 'food' entries from the 2 prior days whenever mood is 'allergies'.
I hope this eventually leads to an outcome where the common food item is identified as 'fish' and this information is presented back to the user, such as:
"Did you realize that when you eat fish you get allergies"
Could someone please advise me on the correct approach to this using Pandas?
Thanks
micdoher
Recommended answer
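Create a helper Series by comparing the mood column with 'allergies', reverse its order and take the cumulative sum with Series.cumsum, then pass it to GroupBy.cumcount, counting backwards within each group, and use isin to keep the rows counted 1 and 2, i.e. the two rows before each 'allergies' row: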
s = df['mood'].eq('allergies').iloc[::-1].cumsum()
df = df[df.groupby(s).cumcount(ascending=False).isin([1,2])]
print(df)
     id      food        date  mood
1  id 2  potatoes  2018-11-13   low
2  id 3      fish  2018-11-14  high
4  id 5     bread  2018-11-16  high
5  id 6      fish  2018-11-17  high
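Details (the helper Series s; note that the printed outputs in this answer come from a 7-row sample, not the 11-row table in the question):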
print(s)
6    1
5    1
4    1
3    2
2    2
1    2
0    2
Name: mood, dtype: int32
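To make the helper Series easier to follow, here is a minimal sketch that reproduces the values printed above step by step. The 7-row frame is hypothetical: only the positions of 'allergies' (rows 3 and 6) are inferred from the printed Series, and the other moods are placeholders.

```python
import pandas as pd

# Hypothetical 7-row frame; only the 'allergies' positions (rows 3 and 6)
# are inferred from the printed helper Series, the rest are placeholders.
df = pd.DataFrame(
    {"mood": ["high", "low", "high", "allergies", "low", "high", "allergies"]}
)

# True where the mood is 'allergies'.
is_allergy = df["mood"].eq("allergies")

# Reverse the rows and take the cumulative sum: every row gets the number
# of the allergy episode it leads up to, counted from the end of the frame.
s = is_allergy.iloc[::-1].cumsum()

# groupby aligns s on the index, so the reversed order does not matter.
# Counting backwards inside each group gives 0 for the 'allergies' row
# and 1, 2, ... for the rows before it.
rows_back = df.groupby(s).cumcount(ascending=False)

# Keep the two rows immediately before each 'allergies' row.
selected = df[rows_back.isin([1, 2])]

print(rows_back.tolist())       # [3, 2, 1, 0, 2, 1, 0]
print(selected.index.tolist())  # [1, 2, 4, 5]
```

Note that this counts rows, not calendar days, and it leaves out the 'allergies' rows themselves; adding 0 to the isin list would keep them.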
Another solution:
s = df['mood'].eq('allergies').iloc[::-1].cumsum().sort_index()
df = df[(df.groupby(s).cumcount(ascending=False) < 3) & s.duplicated(keep='last')]
print(df)
     id      food        date  mood
1  id 2  potatoes  2018-11-13   low
2  id 3      fish  2018-11-14  high
4  id 5     bread  2018-11-16  high
5  id 6      fish  2018-11-17  high
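Both solutions above pick a fixed number of rows before each 'allergies' row. Since the question asks for a selection based on date, a date-window variant may be closer to the requested result. The following is only a sketch under some assumptions: the date column is parsed as datetime, the window is 2 calendar days and includes the 'allergies' day itself, and the names window, mask, result and top_food are illustrative rather than part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [f"id {i}" for i in range(1, 12)],
    "food": ["nuts", "potatoes", "fish", "bread", "fish", "nuts",
             "fish", "beer", "bread", "fish", "pasta"],
    "date": pd.to_datetime([
        "2018-11-12", "2018-11-13", "2018-11-14", "2018-11-14",
        "2018-11-14", "2018-11-14", "2018-11-15", "2018-11-16",
        "2018-11-17", "2018-11-18", "2018-11-19",
    ]),
    "mood": ["high", "low", "high", "high", "high", "high",
             "allergies", "low", "high", "high", "allergies"],
})

# Assumption: "2 days prior" means a 2-calendar-day look-back window.
window = pd.Timedelta(days=2)

# Distinct dates on which 'allergies' was recorded.
allergy_dates = df.loc[df["mood"].eq("allergies"), "date"].drop_duplicates()

# Mark every row whose date falls in [allergy date - 2 days, allergy date].
mask = pd.Series(False, index=df.index)
for day in allergy_dates:
    mask |= df["date"].between(day - window, day)

result = df[mask]
print(result["id"].tolist())
# ['id 2', 'id 3', 'id 4', 'id 5', 'id 6', 'id 7', 'id 9', 'id 10', 'id 11']

# Most frequent food inside the selected windows.
food_counts = result["food"].value_counts()
top_food = food_counts.idxmax()
print(f"Did you realize that when you eat {top_food} you get allergies")
```

On the question's data this keeps ids 2 to 7 and 9 to 11, and the most frequent food in those rows is fish (4 entries). A raw count is a crude signal, though: bread also appears before both 'allergies' days, so counting per episode rather than per row would give a tie.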