Question
I'm a newbie still trying to figure out Pandas - split/apply/combine is just starting to make sense but I'm not there yet.
I'm trying to get the historical high and low sales call numbers from the same day in previous years (2013-2015), and then have another column that tells me what days the 2016 calls were higher than previous highs or lower than previous lows and what the values were.
Here's my attempt so far:
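```python
import pandas as pd

df = pd.read_csv('filename.csv')
df['Date'] = pd.to_datetime(df['Date'])
# Keep 2013-2015 only; .copy() avoids a SettingWithCopyWarning when adding columns below
df = df[(df['Date'] >= '01/01/2013') & (df['Date'] <= '12/31/2015')].copy()
df['Month'] = df.Date.dt.month
df['Day'] = df.Date.dt.day
# Highest MAX reading per (Month, Day) across 2013-2015, grouped on the columns created above
dfMAX = df[df['Element'] == "MAX"]
dfMAX = (dfMAX.groupby(['Month', 'Day', 'Element'])
         [['Data_Value']]
         .agg(['max']))
```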
This gets me the max values per day and I know I could repeat this for the min values. I'm struggling to know how to put it back together with the multi-index and then how to get the 2016 values that were either higher or lower than the min/max already recorded.
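For the "repeat this for the min values" part, one groupby can compute both aggregations at once, which avoids building separate MAX and MIN frames and stitching the MultiIndex back together. The sketch below uses a tiny hypothetical frame in place of the posted CSV and, like the answer further down, takes the min and max over all readings for each calendar day.

```python
import pandas as pd

# Hypothetical sample rows standing in for the posted CSV
df = pd.DataFrame({
    "Date": pd.to_datetime(["2013-01-01", "2014-01-01", "2013-01-02", "2014-01-02"]),
    "Element": ["MAX", "MIN", "MAX", "MIN"],
    "Data_Value": [1900, 50, 2100, 23],
})

# Group by calendar day; rename the keys so the index levels are "Month" and "Day"
# rather than two levels that would both be called "Date".
month = df["Date"].dt.month.rename("Month")
day = df["Date"].dt.day.rename("Day")

# Selecting a single column and passing a list to .agg gives flat "min"/"max" columns
records = df.groupby([month, day])["Data_Value"].agg(["min", "max"])
print(records)
```

Because the result already has one row per (Month, Day) with both columns side by side, there is nothing left to recombine.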
I'm not worried about leap year data, Feb 29th can be dropped/ignored for this.
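Dropping 29 February takes a single boolean mask on the datetime column. The helper name `drop_feb_29` and the sample rows below are illustrative only.

```python
import pandas as pd

def drop_feb_29(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without 29 February rows (assumes a datetime 'Date' column)."""
    is_leap_day = (df["Date"].dt.month == 2) & (df["Date"].dt.day == 29)
    return df.loc[~is_leap_day].copy()

# Hypothetical rows around the 2016 leap day
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2016-02-28", "2016-02-29", "2016-03-01"]),
    "Data_Value": [10, 20, 30],
})
cleaned = drop_feb_29(sample)
print(cleaned)
```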
The output would look something like this:
```
Day of Year  Min   Max  2016
1/1           50  1900
1/2           23  2100  2102
1/3           90  1800    85
1/4           89  1750
1/5           50  2309    45
1/6           44  5600  5649
```
I've posted a csv file here: https://drive.google.com/open?id=0B4xdnV0LFZI1dUE3ZFBxdWFQOGc
Thanks for your help.
Recommended answer
I would tackle this using the following steps:
- Make sure `Date` is a datetime dtype column. Then create a new column in your dataframe called `Day_Of_Year` using `.dt.dayofyear`.
- Split your dataframe into two dataframes: one for 2013 through 2015 and one for 2016.
- Take your 2013-2015 dataframe and use `groupby` with `.agg` of `min` and `max` to get the min and max values for 2013 through 2015, creating a dataframe of daily records.
- Next, merge this new daily-records dataframe with the 2016 dataframe using `pd.merge`. You can set the index of the 2016 dataframe to `Day_Of_Year` and then merge on the indexes using `left_index=True` and `right_index=True`.
- Lastly, use boolean indexing to select only those records in the merged dataframe where the 2016 value is outside the `min` or `max` columns.
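A minimal sketch of those five steps, assuming the question's column names (`Date`, `Element`, `Data_Value`); the function name `record_breakers` and the sample rows are hypothetical. One deliberate swap: it keys each row by an `MM-DD` string instead of `.dt.dayofyear`, because in a leap year such as 2016 every `dayofyear` value after 28 February is one higher than the same calendar date in 2013-2015, so 1 March 2016 would otherwise be compared with 2 March of earlier years.

```python
import pandas as pd

def record_breakers(df: pd.DataFrame) -> pd.DataFrame:
    """2016 readings outside the 2013-2015 min/max for the same calendar day."""
    df = df.copy()
    # Step 1: datetime dtype, then a calendar-day key. Swapped in for .dt.dayofyear
    # so that leap-year dates line up with the same month/day in other years.
    df["Date"] = pd.to_datetime(df["Date"])
    df["Day_Of_Year"] = df["Date"].dt.strftime("%m-%d")
    df = df[df["Day_Of_Year"] != "02-29"]  # the question allows ignoring 29 Feb

    # Step 2: split into the 2013-2015 history and the 2016 data
    year = df["Date"].dt.year
    history = df[year.between(2013, 2015)]
    current = df[year == 2016]

    # Step 3: daily records (min and max per calendar day) from the history
    daily = history.groupby("Day_Of_Year")["Data_Value"].agg(["min", "max"])

    # Step 4: merge on the indexes; both sides are indexed by Day_Of_Year
    merged = pd.merge(daily, current.set_index("Day_Of_Year"),
                      left_index=True, right_index=True)

    # Step 5: boolean indexing keeps only values outside the historical range
    outside = (merged["Data_Value"] < merged["min"]) | (merged["Data_Value"] > merged["max"])
    return merged[outside]

# Hypothetical sample data, not the poster's CSV
sample = pd.DataFrame({
    "Date": ["2013-01-01", "2014-01-01", "2015-01-01", "2016-01-01",
             "2013-03-01", "2015-03-01", "2016-03-01", "2016-02-29"],
    "Element": ["MAX", "MIN", "MAX", "MAX", "MIN", "MAX", "MIN", "MAX"],
    "Data_Value": [1812, 545, 1700, 1887, 400, 900, 450, 9999],
})
result = record_breakers(sample)
print(result)
```

With this sample, only 1 January survives: 1887 is above the 2013-2015 maximum of 1812. The 1 March reading (450) lies inside its 400-900 range, and the leap-day row is dropped before any comparison.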
You should get something like this with your data:
```
           min   max       Date Element  Value
DayofYear
1          545  1812 2016-01-01     MAX   1887
3          108  1815 2016-01-03     MAX   1906
4          496  1618 2016-01-04     MAX   1701
6          455  1864 2016-01-06     MIN    169
8          511  1771 2016-01-08     MIN    232
```