Problem description
I have hourly CSV data like this, sorted day by day, covering hundreds of days:
2011.05.16,00:00,1.40893
2011.05.16,01:00,1.40760
2011.05.16,02:00,1.40750
2011.05.16,03:00,1.40649
I want to count how many times each hour has held the daily maximum. For example, if the maximum value for 2011.05.16 occurred at 00:00, I add 1 to the 00:00 count, and so on. To do this I used a loop that counts hours like indexes, in this way:
def graph():
    Date, Time, High = np.genfromtxt(myPath, delimiter=",",
                                     unpack=True, converters={0: date_converter})
    numList = [""] * 24
    index = 0
    hour = 0
    count = [0] * 24
    for eachHour in Time:
        numList[hour] += str(High[index])
        index += 1
        hour += 1
        if hour == 24:
            higher = (numList.index(max(numList)))
            count[higher] += 1
            hour = 0
            numList = [""] * 24
The problem is that my data often has gaps where some hours are missing, but the loop cannot detect them and keeps putting values into the next hour index. I have searched everywhere, but I am new to programming and this is my first "complex" project, so I need an answer specific to my case to understand how it works. How do you make an hourly frequency count like the one described? The final result should look like:
00:00 n time max of the day
01:00 n time max of the day
02:00 n time max of the day
etc
Recommended answer
First read in the csv:
In [11]: df = pd.read_csv('foo.csv', sep=',', header=None, parse_dates=[[0, 1]])
In [12]: df.columns = ['date', 'val']
In [13]: df.set_index('date', inplace=True)
In [14]: df
Out[14]:
                         val
date
2011-05-16 00:00:00  1.40893
2011-05-16 01:00:00  1.40760
2011-05-16 02:00:00  1.40750
2011-05-16 03:00:00  1.40649
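The same loading step can also be written with an explicit date format, which avoids relying on automatic date parsing. This is only a sketch: the function name read_hourly and the column labels day, time and val are made up for illustration, and io.StringIO stands in for the real file.

```python
import io
import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for the real file.
csv_text = """2011.05.16,00:00,1.40893
2011.05.16,01:00,1.40760
2011.05.16,02:00,1.40750
2011.05.16,03:00,1.40649
"""

def read_hourly(source):
    # Hypothetical helper: the column names are labels chosen for this sketch.
    raw = pd.read_csv(source, header=None, names=["day", "time", "val"],
                      dtype={"day": str, "time": str})
    # Join the two text columns and parse them with an explicit format.
    stamps = pd.to_datetime(raw["day"] + " " + raw["time"],
                            format="%Y.%m.%d %H:%M")
    return pd.DataFrame({"val": raw["val"].to_numpy()},
                        index=pd.DatetimeIndex(stamps, name="date"))

df = read_hourly(io.StringIO(csv_text))
print(df)
```

Because every row carries its own timestamp, a missing hour simply means a missing row. Nothing shifts into the wrong slot.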
Use resample to get each day's maximum:
In [15]: day_max = df.resample('D').max()
Check whether each value is the day max:
In [16]: df['is_day_max'] = day_max['val'].reindex(df.index.normalize()).values == df['val'].values
In [17]: df
Out[17]:
                         val is_day_max
date
2011-05-16 00:00:00  1.40893       True
2011-05-16 01:00:00  1.40760      False
2011-05-16 02:00:00  1.40750      False
2011-05-16 03:00:00  1.40649      False
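The daily-maximum flag can also be computed in one step with groupby(...).transform('max'), which broadcasts each day's maximum back onto that day's rows. The sketch below is not the answer's exact method. The second day's values are invented for illustration, and its 02:00 row is deliberately missing to mimic a gap.

```python
import pandas as pd

# First day: values from the question. Second day: invented values,
# with the 02:00 row deliberately missing to mimic a gap in the data.
index = pd.to_datetime([
    "2011-05-16 00:00", "2011-05-16 01:00", "2011-05-16 02:00",
    "2011-05-17 00:00", "2011-05-17 01:00", "2011-05-17 03:00",
])
df = pd.DataFrame(
    {"val": [1.40893, 1.40760, 1.40750, 1.40100, 1.40500, 1.40300]},
    index=index,
)

# Broadcast each calendar day's maximum back onto every row of that day.
day_max = df.groupby(df.index.normalize())["val"].transform("max")

# A row is flagged when its value equals the maximum of its own day.
df["is_day_max"] = df["val"] == day_max
print(df)
```

Grouping by the normalized timestamp means each row is compared only with rows from the same calendar date, however many hours that date actually contains.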
And then sum these over each hour:
In [18]: df.groupby(df.index.time)['is_day_max'].sum()
Out[18]:
00:00:00 1
01:00:00 0
02:00:00 0
03:00:00 0
Name: is_day_max, dtype: float64
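To get a row for all 24 hours, including hours that never held a maximum, the steps can be wrapped in a small function. This is a rough sketch under a few assumptions: hourly_max_counts is a made-up name, the second day's values are invented, and idxmax counts only the first occurrence when a day has tied maxima, which differs slightly from summing the is_day_max flags.

```python
import numpy as np
import pandas as pd

def hourly_max_counts(values):
    """Count, for each hour 0..23, how many days peaked in that hour.

    `values` is a Series indexed by timestamp. Hypothetical helper.
    """
    # Timestamp of the first maximum within each calendar day.
    peak_times = values.groupby(values.index.normalize()).idxmax()
    # Hour of day of each peak, as a plain integer array.
    hours = np.asarray(pd.DatetimeIndex(peak_times).hour)
    # Tally peaks per hour; minlength=24 keeps hours with no peaks as 0.
    return pd.Series(np.bincount(hours, minlength=24), index=range(24))

# First day from the question; second day invented, with 02:00 missing.
index = pd.to_datetime([
    "2011-05-16 00:00", "2011-05-16 01:00", "2011-05-16 02:00",
    "2011-05-17 00:00", "2011-05-17 01:00", "2011-05-17 03:00",
])
values = pd.Series([1.40893, 1.40760, 1.40750, 1.40100, 1.40500, 1.40300],
                   index=index)

counts = hourly_max_counts(values)
print(counts)
```

Since the hour is read from each timestamp rather than from a running counter, missing hours cannot push later values into the wrong bucket.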