Problem description
I have a dataframe df1 that contains three columns:
No.  Start Time         End Time
1    07/28/15 08:03 AM  07/28/15 08:09 AM
2    07/28/15 08:06 AM  07/28/15 08:12 AM
The start and end times mark when a given job starts and ends. I want to construct a new dataframe that counts the number of active jobs at each time of a specific day. Like this:
Hours Number of tasks
0:00
0:01
..
..
11:59
This dataframe should show, for every minute of the day, how many jobs are active. A job that starts at 8:03 and ends at 8:09 should be counted for the following times (it ends at 8:09, so it is no longer active at 8:09):
8:03
8:04
8:05
8:06
8:07
8:08
How can I do this in a simple way?
Recommended answer
Not a pandas solution, but you could loop and filter.
Quick example based on the hour:
import datetime

from pandas import DataFrame

# Each job is a (start, end) pair
jobs = [
    (datetime.datetime(2015, 7, 28, 8, 3), datetime.datetime(2015, 7, 28, 8, 9)),
    (datetime.datetime(2015, 7, 28, 8, 3), datetime.datetime(2015, 7, 28, 8, 58)),
    (datetime.datetime(2015, 7, 28, 8, 3), datetime.datetime(2015, 7, 28, 10, 3)),
    (datetime.datetime(2015, 7, 28, 8, 3), datetime.datetime(2015, 7, 28, 9, 3)),
    # End precedes start here, so this job is counted only in its start hour
    (datetime.datetime(2015, 7, 28, 10, 3), datetime.datetime(2015, 7, 28, 8, 3)),
]

data = {'hour': [], 'active_jobs': []}
for hour in range(24):
    current_active_jobs = 0
    for job in jobs:
        if job[0].hour == hour:
            # Job starts during this hour
            current_active_jobs += 1
        elif job[0].hour < hour and job[1].hour >= hour:
            # Job started earlier and is still running in this hour
            current_active_jobs += 1
    data['hour'].append(hour)
    data['active_jobs'].append(current_active_jobs)

print(DataFrame(data))
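The question asks for per-minute counts rather than per-hour counts, so here is a minimal sketch of the same idea at minute resolution. It is not part of the original answer. The function name active_jobs_per_minute and its day parameter are made up for illustration, and it assumes all timestamps have minute precision and that a job is active from its start up to, but not including, its end. Instead of testing every job against every minute, it records +1 at each start minute and -1 at each end minute, then keeps a running total.

```python
from datetime import datetime, timedelta


def active_jobs_per_minute(jobs, day):
    """Return {"HH:MM": count} for every minute of `day`.

    Hypothetical helper: a job counts as active on [start, end).
    """
    day_start = datetime(day.year, day.month, day.day)
    minutes_in_day = 24 * 60
    # delta[i] is the net change in active jobs at minute i;
    # the extra slot absorbs jobs that end at or after midnight.
    delta = [0] * (minutes_in_day + 1)
    for start, end in jobs:
        # Convert to minute offsets and clip to the requested day.
        first = max(0, int((start - day_start).total_seconds() // 60))
        last = min(minutes_in_day, int((end - day_start).total_seconds() // 60))
        if first < last:  # skips empty, reversed, or out-of-day intervals
            delta[first] += 1
            delta[last] -= 1

    counts = {}
    running = 0
    for i in range(minutes_in_day):
        running += delta[i]
        label = (day_start + timedelta(minutes=i)).strftime("%H:%M")
        counts[label] = running
    return counts


# The two jobs from the question
jobs = [
    (datetime(2015, 7, 28, 8, 3), datetime(2015, 7, 28, 8, 9)),
    (datetime(2015, 7, 28, 8, 6), datetime(2015, 7, 28, 8, 12)),
]
counts = active_jobs_per_minute(jobs, datetime(2015, 7, 28))
for minute in ("08:02", "08:03", "08:06", "08:09", "08:12"):
    print(minute, counts[minute])
```

If a dataframe is wanted, the resulting dictionary can be wrapped with pandas, for example DataFrame(list(counts.items()), columns=['Hours', 'Number of tasks']).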