This article shows how to extend and fill time series data in pandas with zeros or a constant value, using Python 3.x.

Problem description

I have a problem extending my time series data. I have the following dataframe:

date_first = df1['date'].min()  # is 2016-08-08
date_last = df1['date'].max()  # is 2016-08-20

>>> df1
         date         customer     qty
149481   2016-08-08   A            400
161933   2016-08-10   A            200
167172   2016-08-13   B            900
170296   2016-08-15   A            300
178221   2016-08-20   B            150

Now I set the date column as the index and get the following frame:

df1.set_index('date', inplace=True)

>>> df1
             customer     qty
date
2016-08-08   A            400
2016-08-10   A            200
2016-08-13   B            900
2016-08-15   A            300
2016-08-20   B            150

Now I am trying to extend the time series for every single customer so that it spans the earliest date to the latest date, like this:

ix = pd.DataFrame({on_column: pd.Series([date_first, date_last]), 'qty': 0})
result = df1.reindex(ix)

This does not give the result I expect. I want the result to look like the following frame:

    >>> df1
    date         customer     qty
0   2016-08-08   A            400
1   2016-08-08   B            0
2   2016-08-09   A            0
3   2016-08-09   B            0
4   2016-08-10   A            200
5   2016-08-10   B            0
...
24  2016-08-20   A            0
25  2016-08-20   B            150

Recommended answer

Use MultiIndex.from_product to build every combination of date and customer, then reindex the MultiIndex created by calling set_index on both columns:

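import pandas as pd

# df1 is the original frame, with 'date' still a regular column
date_first = df1['date'].min()
date_last = df1['date'].max()

# every calendar day in the range, paired with every customer
mux = pd.MultiIndex.from_product([pd.date_range(date_first, date_last, freq='D'),
                                  df1['customer'].unique()], names=['date','customer'])
print(mux)
# (date, customer) pairs missing from df1 get qty 0
result = df1.set_index(['date', 'customer']).reindex(mux, fill_value=0).reset_index()
print(result)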
         date customer  qty
0  2016-08-08        A  400
1  2016-08-08        B    0
2  2016-08-09        A    0
3  2016-08-09        B    0
4  2016-08-10        A  200
5  2016-08-10        B    0
6  2016-08-11        A    0
7  2016-08-11        B    0
8  2016-08-12        A    0
9  2016-08-12        B    0
10 2016-08-13        A    0
11 2016-08-13        B  900
12 2016-08-14        A    0
13 2016-08-14        B    0
14 2016-08-15        A  300
15 2016-08-15        B    0
16 2016-08-16        A    0
17 2016-08-16        B    0
18 2016-08-17        A    0
19 2016-08-17        B    0
20 2016-08-18        A    0
21 2016-08-18        B    0
22 2016-08-19        A    0
23 2016-08-19        B    0
24 2016-08-20        A    0
25 2016-08-20        B  150
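
For readers who want to run the answer end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and applies the same from_product and reindex steps. The DataFrame constructor call, the variable name all_days, and the assumption that the date column holds datetime64 values (via pd.to_datetime) are mine and are not shown in the original post.

```python
import pandas as pd

# Sample data reconstructed from the question (original row labels kept).
# Assumption: 'date' is stored as datetime64, not as strings.
df1 = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2016-08-08", "2016-08-10", "2016-08-13", "2016-08-15", "2016-08-20"]
        ),
        "customer": ["A", "A", "B", "A", "B"],
        "qty": [400, 200, 900, 300, 150],
    },
    index=[149481, 161933, 167172, 170296, 178221],
)

# Every calendar day between the first and last date (hypothetical helper name).
all_days = pd.date_range(df1["date"].min(), df1["date"].max(), freq="D")

# Cartesian product of days and customers: one entry per (date, customer) pair.
mux = pd.MultiIndex.from_product(
    [all_days, df1["customer"].unique()], names=["date", "customer"]
)

# Pairs that are absent from df1 are created with qty filled as 0.
result = (
    df1.set_index(["date", "customer"])
    .reindex(mux, fill_value=0)
    .reset_index()
)
print(result.shape)
```

If the date column holds strings instead of datetimes, its labels will not match the Timestamp values produced by pd.date_range, and every row of the result will receive the fill value. Converting with pd.to_datetime first, as in the sketch, avoids that.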


09-03 09:25