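I have hourly data on bike rental demand and weather. I want to plot the average demand for each hour of the day, separately for good weather and bad weather.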

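When I plot the average demand for a given hour (ignoring weather), I compute the total rental demand for that hour and divide it by the number of observations for that hour: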

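hour_count = np.bincount(hour)           # number of observations per hour
hour_sums = np.zeros(len(hour_count))    # running rental total per hour
for i in range(number_of_observations):
    hour_sums[hour[i]] = hour_sums[hour[i]] + rentals[i]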

av_rentals = [x/y for x,y in zip(hour_sums,hour_count)]


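Now I want to do the same thing, but separately for good and bad weather. The cumulative sum is easy: I just add an "if" clause. What I don't know is how to count the number of hours with good or bad weather. I'd rather avoid a big loop like the one for the sum... Is there any function that does the same as bincount but with a condition? Something like: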

good_weather_hour_count = np.bincount(hour, weather == 1 or weather == 2)


Any ideas?
P.S. Maybe someone knows how to sum the rentals for a given hour without a loop? I tried doing it with a 2D histogram, but it didn't work.

label_sums = np.histogram2d(hour, rentals, bins=24)[0]

Best answer

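np.bincount has a weights parameter, which you can use to compute a bincount of the hours weighted by the number of rentals, that is, the total rentals per hour. For example,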

In [39]: np.bincount([1,2,3,1], weights=[20,10,40,10])
Out[39]: array([  0.,  30.,  10.,  40.])


So you can replace the for-loop

for i in range(number_of_observations):
    hour_sums[hour[i]] = hour_sums[hour[i]] + rentals[i]

with

hour_sums = np.bincount(hour, weights=rentals, minlength=24)
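As a quick sanity check, the sketch below runs both versions on a few made-up observations and compares the results. The sample arrays are assumptions for illustration only, not the asker's data.

```python
import numpy as np

# Made-up sample data: hour of day and rental count for each observation.
hour = np.array([0, 1, 1, 23, 0, 5])
rentals = np.array([3, 4, 6, 2, 7, 1])

# Loop version from the question, with the accumulator initialised to zeros.
loop_sums = np.zeros(24)
for i in range(len(hour)):
    loop_sums[hour[i]] += rentals[i]

# Vectorised version: a single weighted bincount.
hour_sums = np.bincount(hour, weights=rentals, minlength=24)

# Both approaches should give the same per-hour totals.
print(np.array_equal(loop_sums, hour_sums))
```

Passing minlength=24 keeps the result at 24 entries even when the late hours have no observations.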




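To handle good/bad weather, you can mask the hour and rentals data to select only the applicable subset (here w is the weather code you want to keep):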

mask = (weather == w)
masked_hour = hour[mask]
masked_rentals = rentals[mask]
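The pseudo-call in the question combined two weather codes with `or`, which does not work element-wise on NumPy arrays; the element-wise operator is `|`. Below is a minimal sketch of the conditional hour count, assuming (as in the question) that codes 1 and 2 both mean good weather. The sample arrays and the use of code 3 for bad weather are made up for illustration.

```python
import numpy as np

# Made-up sample data; code 3 is assumed to mean bad weather here.
hour = np.array([0, 0, 1, 1, 1, 2])
weather = np.array([1, 3, 2, 1, 3, 3])

# Element-wise "or"; np.isin(weather, [1, 2]) would build the same mask.
good = (weather == 1) | (weather == 2)

# Count only the observations that fall in good weather.
good_weather_hour_count = np.bincount(hour[good], minlength=24)

# Alternative: keep every observation but weight it by 1.0 or 0.0.
alt_count = np.bincount(hour, weights=good.astype(float), minlength=24)

print(good_weather_hour_count[:3])
```

The first form returns integer counts, while the weighted form returns floats; both describe the same per-hour counts.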


Then run the computation on masked_hour and masked_rentals. A complete example:

import numpy as np

np.random.seed(2016)
N = 2
hour = np.tile(np.arange(24), N)
rentals = np.random.randint(10, size=(len(hour),))
# say, weather=1 means good weather, 2 means bad weather
weather = np.random.randint(1, 3, size=(len(hour),))

average_rentals = dict()
for kind, w in zip(['good', 'bad', 'all'], [1, 2, None]):
    if w is None:
        mask = slice(None)
    else:
        mask = (weather == w)
    masked_hour = hour[mask]
    masked_rentals = rentals[mask]
    total_rentals = np.bincount(masked_hour, weights=masked_rentals, minlength=24)
    total_hours = np.bincount(masked_hour, minlength=24)
    average_rentals[kind] = (total_rentals / total_hours)

for kind, result in average_rentals.items():
    print('\n{}: {}'.format(kind, result))


yields

bad: [ 4.   6.   2.   5.5  nan  4.   4.   8.   nan  3.   nan  2.5  4.   nan  9.
  nan  3.   5.5  8.   nan  8.   5.   9.   4. ]

good: [ 3.   nan  4.   nan  8.   4.   nan  7.   5.5  2.   4.   nan  nan  0.5  9.
  0.5  nan  nan  5.   7.   1.   7.   8.   0. ]

all: [ 3.5  6.   3.   5.5  8.   4.   4.   7.5  5.5  2.5  4.   2.5  4.   0.5  9.
  0.5  3.   5.5  6.5  7.   4.5  6.   8.5  2. ]
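The nan entries are hours that have no observations for that kind of weather, so the average is 0/0; NumPy typically also emits a RuntimeWarning for that division. If the warning is unwanted, one option (a sketch, not part of the original answer) is to divide only where the hour count is positive. The small arrays below are made up for illustration.

```python
import numpy as np

# Made-up per-hour totals; the middle hour has no observations.
total_rentals = np.array([12.0, 0.0, 5.0])
total_hours = np.array([3, 0, 2])

# Start from nan, then fill in only the entries with at least one observation.
average = np.full(total_rentals.shape, np.nan)
np.divide(total_rentals, total_hours, out=average, where=total_hours > 0)

print(average)
```

Hours without data stay nan, which most plotting libraries simply skip.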
