Selecting rows by date relative to a column value


Problem description

I am new to Pandas and to this space in general, and I have the following challenge. Given a dataframe like the one below, I need to find the rows where the 'mood' column is 'allergies', then select those rows together with the rows that precede them by date. In this example, that means the rows dated up to 2 days before each 'allergies' entry.

My dataframe looks like this:

id    food     date        mood
id 1  nuts     2018-11-12  high
id 2  potatoes 2018-11-13  low
id 3  fish     2018-11-14  high
id 4  bread    2018-11-14  high
id 5  fish     2018-11-14  high
id 6  nuts     2018-11-14  high
id 7  fish     2018-11-15  allergies
id 8  beer     2018-11-16  low
id 9  bread    2018-11-17  high
id 10 fish     2018-11-18  high
id 11 pasta    2018-11-19  allergies
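
For anyone who wants to reproduce this, the table above can be built roughly as follows. This is only a sketch: parsing 'date' with pd.to_datetime is an assumption, since the question does not say how the column is stored.

```python
import pandas as pd

# Rebuild the sample table from the question.
# Assumption: 'date' is stored as datetime64, not as plain strings.
df = pd.DataFrame({
    "id": [f"id {i}" for i in range(1, 12)],
    "food": ["nuts", "potatoes", "fish", "bread", "fish", "nuts",
             "fish", "beer", "bread", "fish", "pasta"],
    "date": pd.to_datetime([
        "2018-11-12", "2018-11-13", "2018-11-14", "2018-11-14",
        "2018-11-14", "2018-11-14", "2018-11-15", "2018-11-16",
        "2018-11-17", "2018-11-18", "2018-11-19",
    ]),
    "mood": ["high", "low", "high", "high", "high", "high",
             "allergies", "low", "high", "high", "allergies"],
})
print(df)
```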

What I would like to achieve is code that delivers something like this:

id    food     date        mood
id 2  potatoes 2018-11-13  low
id 3  fish     2018-11-14  high
id 4  bread    2018-11-14  high
id 5  fish     2018-11-14  high
id 6  nuts     2018-11-14  high
id 7  fish     2018-11-15  allergies
id 9  bread    2018-11-17  high
id 10 fish     2018-11-18  high
id 11 pasta    2018-11-19  allergies

So, when mood is 'allergies', return the 'food' entries from the 2 prior days.

I hope this eventually leads to an outcome where the common food item is identified as 'fish' and that information is presented back to the user, for example:

"Did you realize that when you eat fish you get allergies"

Could someone please advise me on the correct approach to this using Pandas?

Thanks

micdoher

Recommended answer

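Build a helper Series by comparing 'mood' with 'allergies', reverse its order, and take the cumulative sum with Series.cumsum, so that every 'allergies' row shares a group id with the rows above it. Pass that Series to GroupBy.cumcount with ascending=False to count backwards from each 'allergies' row, then use isin to keep the first and second rows before it. Note that this picks rows by position rather than by calendar date, and that the printed output comes from a shorter 7-row sample, not the table in the question: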

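# Walk the frame bottom-up so each 'allergies' row shares a group id with the rows above it
s = df['mood'].eq('allergies').iloc[::-1].cumsum()
# Count backwards within each group (the 'allergies' row is 0) and keep positions 1 and 2
df = df[df.groupby(s).cumcount(ascending=False).isin([1, 2])]
print(df)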
     id      food        date  mood
1  id 2  potatoes  2018-11-13   low
2  id 3      fish  2018-11-14  high
4  id 5     bread  2018-11-16  high
5  id 6      fish  2018-11-17  high

Details:

print(s)
6    1
5    1
4    1
3    2
2    2
1    2
0    2
Name: mood, dtype: int32
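
To see how the group ids turn into a row filter, here is a small sketch. The seven mood values are hypothetical, chosen only so that the group ids match the Series printed above; 'pos' and 'keep' are names introduced for illustration.

```python
import pandas as pd

# Hypothetical moods, chosen so the group ids match the printed Series
df = pd.DataFrame({"mood": ["high", "low", "high", "allergies",
                            "low", "high", "allergies"]})

# Group ids, sorted back into row order so they line up with df
s = df["mood"].eq("allergies").iloc[::-1].cumsum().sort_index()

# Position counted backwards inside each group: 0 is the 'allergies' row
pos = df.groupby(s).cumcount(ascending=False)

# Positions 1 and 2 are the two rows just before each 'allergies' row
keep = pos.isin([1, 2])

print(pd.DataFrame({"mood": df["mood"], "group": s, "pos": pos, "keep": keep}))
```

Rows whose 'pos' is 1 or 2 are the ones that survive the filter, which is why the 'allergies' rows themselves are not part of the answer's output.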

Another solution:

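# Same group ids as before, sorted back into the original row order
s = df['mood'].eq('allergies').iloc[::-1].cumsum().sort_index()
# Keep the last three rows of each group, minus the final ('allergies') row itself
df = df[(df.groupby(s).cumcount(ascending=False) < 3) & s.duplicated(keep='last')]
print(df)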
     id      food        date  mood
1  id 2  potatoes  2018-11-13   low
2  id 3      fish  2018-11-14  high
4  id 5     bread  2018-11-16  high
5  id 6      fish  2018-11-17  high
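
Both solutions above select a fixed number of rows before each 'allergies' row. If the selection should instead follow calendar dates, as in the expected output of the question, something like the following sketch might work. It is not part of the original answer. It assumes 'date' is parsed as datetime, treats the 2-day window as inclusive on both ends, and keeps the 'allergies' rows themselves; the names 'window', 'near', 'result' and 'top_food' are introduced here for illustration.

```python
import pandas as pd

# Sample table from the question (assumption: 'date' parsed as datetime)
df = pd.DataFrame({
    "id": [f"id {i}" for i in range(1, 12)],
    "food": ["nuts", "potatoes", "fish", "bread", "fish", "nuts",
             "fish", "beer", "bread", "fish", "pasta"],
    "date": pd.to_datetime([
        "2018-11-12", "2018-11-13", "2018-11-14", "2018-11-14",
        "2018-11-14", "2018-11-14", "2018-11-15", "2018-11-16",
        "2018-11-17", "2018-11-18", "2018-11-19",
    ]),
    "mood": ["high", "low", "high", "high", "high", "high",
             "allergies", "low", "high", "high", "allergies"],
})

window = pd.Timedelta(days=2)
allergy_dates = df.loc[df["mood"].eq("allergies"), "date"].drop_duplicates()

# Mark every row dated from 2 days before an allergy date up to that date
near = pd.Series(False, index=df.index)
for d in allergy_dates:
    near |= df["date"].between(d - window, d)

result = df[near]
print(result)

# Most frequent food inside those windows
top_food = result["food"].value_counts().idxmax()
print(f"Did you realize that when you eat {top_food} you get allergies?")
```

The loop runs once per distinct allergy date, which is fine for small frames; for many allergy dates a merge-based approach may scale better.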
