Hourly frequency count with Python

Problem description

I have hourly CSV data like this, sorted day by day and covering hundreds of days:

2011.05.16,00:00,1.40893
2011.05.16,01:00,1.40760
2011.05.16,02:00,1.40750
2011.05.16,03:00,1.40649

I want to count how many times each hour has held the daily maximum value. For example, if the maximum for 2011.05.16 occurred at 00:00, I add 1 to 00:00, and so on. To do this I used a loop that treats the hours as indexes:

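import numpy as np

def graph():
    # myPath and date_converter are defined elsewhere in the asker's script.
    Date, Time, High = np.genfromtxt(myPath, delimiter=",",
                                     unpack=True, converters={0: date_converter})
    numList = [""] * 24   # one slot per hour of the current day
    index = 0
    hour = 0
    count = [0] * 24      # how often each hour held the daily maximum

    for eachHour in Time:
        numList[hour] += str(High[index])
        index += 1
        hour += 1

        # Assumes every day has exactly 24 rows; a missing hour shifts all later slots.
        if hour == 24:
            higher = numList.index(max(numList))
            count[higher] += 1
            hour = 0
            numList = [""] * 24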

The problem is that my data often has gaps where some hours are missing, but the loop can't detect them and keeps putting values into the next hour index. I've searched everywhere, but I'm new to programming and this is my first "complex" project, so I need an answer specific to my case to understand how it works. How can I make an hourly frequency count as described? The final result should look like:

00:00 n time max of the day
01:00 n time max of the day
02:00 n time max of the day
etc

Recommended answer

First read in the csv:

In [11]: df = pd.read_csv('foo.csv', sep=',', header=None, parse_dates=[[0, 1]])

In [12]: df.columns = ['date', 'val']

In [13]: df.set_index('date', inplace=True)

In [14]: df
Out[14]:
                         val
date
2011-05-16 00:00:00  1.40893
2011-05-16 01:00:00  1.40760
2011-05-16 02:00:00  1.40750
2011-05-16 03:00:00  1.40649
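As far as I know, newer pandas releases discourage combining columns inside `read_csv` via a nested `parse_dates` list, so here is a minimal sketch of the same loading step that builds the timestamp explicitly. The column names `day`, `time` and `val` and the inline sample text are assumptions made for illustration; in practice you would pass your file path instead of the `StringIO` object.

```python
import io
import pandas as pd

# Sample rows in the question's layout: date, time, value (no header row).
raw = """2011.05.16,00:00,1.40893
2011.05.16,01:00,1.40760
2011.05.16,02:00,1.40750
2011.05.16,03:00,1.40649
"""

# Read the date and time columns as plain strings first.
df = pd.read_csv(io.StringIO(raw), header=None, names=["day", "time", "val"],
                 dtype={"day": str, "time": str})

# Combine them into one timestamp with an explicit format, then index by it.
df["date"] = pd.to_datetime(df["day"] + " " + df["time"], format="%Y.%m.%d %H:%M")
df = df.set_index("date")[["val"]]
print(df)
```

Spelling out the format avoids any guessing about the dotted date layout.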

Use resample to get each day's maximum:

In [15]: day_max = df.resample('D').max()

Check whether each value is the day's maximum:

In [16]: df['is_day_max'] = day_max['val'].reindex(df.index.normalize()).values == df['val'].values

In [17]: df
Out[17]:
                         val is_day_max
date
2011-05-16 00:00:00  1.40893       True
2011-05-16 01:00:00  1.40760      False
2011-05-16 02:00:00  1.40750      False
2011-05-16 03:00:00  1.40649      False
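The same flag can also be computed without a separate daily table, by grouping on the calendar day and broadcasting each day's maximum back onto its rows. The sketch below is one possible way to do it, not the only one. The sample values are made up, and the second day deliberately has no 01:00 row to show that a gap does not shift anything, because rows are matched by timestamp rather than by position.

```python
import pandas as pd

# Hypothetical sample: the second day is missing its 01:00 row (a gap).
idx = pd.to_datetime([
    "2011-05-16 00:00", "2011-05-16 01:00", "2011-05-16 02:00",
    "2011-05-17 00:00", "2011-05-17 02:00",
])
df = pd.DataFrame({"val": [1.40893, 1.40760, 1.40750, 1.41000, 1.41200]}, index=idx)

# Broadcast each calendar day's maximum back onto that day's rows.
day_max = df.groupby(df.index.normalize())["val"].transform("max")

# A row is flagged when its value equals the maximum of its own day.
df["is_day_max"] = df["val"] == day_max
print(df)
```

Note that if two hours tie for a day's maximum, both rows are flagged.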

Then sum these over each hour:

In [18]: df.groupby(df.index.time)['is_day_max'].sum()
Out[18]:
00:00:00    1
01:00:00    0
02:00:00    0
03:00:00    0
Name: is_day_max, dtype: float64
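To get a table in the shape asked for in the question, with every hour listed even if it never held a maximum, something along these lines might work. It is a sketch under a few assumptions: the sample values are invented, exactly one hour is counted per day (`idxmax` returns the first occurrence on ties), and hours that never appear are filled with 0.

```python
import pandas as pd

# Hypothetical sample covering three days; the second day has a gap at 01:00.
idx = pd.to_datetime([
    "2011-05-16 00:00", "2011-05-16 01:00", "2011-05-16 02:00",
    "2011-05-17 00:00", "2011-05-17 02:00",
    "2011-05-18 00:00", "2011-05-18 01:00",
])
val = pd.Series([1.40893, 1.40760, 1.40750, 1.41000, 1.41200, 1.42000, 1.41500],
                index=idx)

# Timestamp at which each day's maximum occurs (first occurrence on ties).
peak_times = val.groupby(val.index.normalize()).idxmax()

# Count the peaks by hour of day and show all 24 hours, using 0 where none occurred.
peak_hours = pd.Series(pd.DatetimeIndex(peak_times).hour)
counts = peak_hours.value_counts().reindex(range(24), fill_value=0)

for hour, n in counts.items():
    print(f"{hour:02d}:00 {n}")
```

Because each day contributes exactly one peak, the counts add up to the number of days in the data.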

