我下面总结了许多csv文件中的数据。

+----------+-----+------+------+------+
|   Date   | Loc | Hour | Rain | Wind |
+----------+-----+------+------+------+
| 1-Sep-19 |   1 |    1 |  184 |   65 |
| 1-Sep-19 |   1 |    3 |  126 |   64 |
| 2-Sep-19 |   1 |    1 |  112 |   63 |
| 2-Sep-19 |   1 |    2 |  155 |    0 |
| 2-Sep-19 |   1 |    3 |  186 |   50 |
| 3-Sep-19 |   1 |    2 |  154 |   79 |
| 3-Sep-19 |   1 |    3 |  143 |   61 |
| 1-Sep-19 |   2 |    1 |  187 |   73 |
| 1-Sep-19 |   2 |    2 |  173 |   63 |
| 1-Sep-19 |   2 |    3 |  186 |   63 |
| 3-Sep-19 |   2 |    1 |  172 |   56 |
| 3-Sep-19 |   2 |    2 |  156 |   56 |
| 3-Sep-19 |   2 |    3 |  176 |   79 |
+----------+-----+------+------+------+


收集特定RainWindDayLocationHour详细信息的随机观测值。

请注意DayLocationHour中缺少的行。

我想用0值填充丢失的数据。我想要的最终输出如下。

+----------+-----+------+------+------+
|   Date   | Loc | Hour | Rain | Wind |
+----------+-----+------+------+------+
| 1-Sep-19 |   1 |    1 |  184 |   65 |
| 1-Sep-19 |   1 |    2 |    0 |    0 |
| 1-Sep-19 |   1 |    3 |  126 |   64 |
| 2-Sep-19 |   1 |    1 |  112 |   63 |
| 2-Sep-19 |   1 |    2 |  155 |    0 |
| 2-Sep-19 |   1 |    3 |  186 |   50 |
| 3-Sep-19 |   1 |    1 |    0 |    0 |
| 3-Sep-19 |   1 |    2 |  154 |   79 |
| 3-Sep-19 |   1 |    3 |  143 |   61 |
| 1-Sep-19 |   2 |    1 |  187 |   73 |
| 1-Sep-19 |   2 |    2 |  173 |   63 |
| 1-Sep-19 |   2 |    3 |  186 |   63 |
| 2-Sep-19 |   2 |    1 |    0 |    0 |
| 2-Sep-19 |   2 |    2 |    0 |    0 |
| 2-Sep-19 |   2 |    3 |    0 |    0 |
| 3-Sep-19 |   2 |    1 |  172 |   56 |
| 3-Sep-19 |   2 |    2 |  156 |   56 |
| 3-Sep-19 |   2 |    3 |  176 |   79 |
+----------+-----+------+------+------+


到目前为止,我已经尝试了以下代码

    import pandas as pd
    import numpy as np

    df = pd.read_csv('data.csv')

    d1 = date(2019, 9, 1)
    d2 = date(2019, 9, 3)
    delta = d2 - d1

    # below indexes were created to make the code more dynamic & scalable easily
    idx_date = [(d1 + timedelta(days=i)) for i in range(delta.days + 1)]
    idx_loc = np.arange(1,3)
    idx_hour = np.arange(1,4)

    cols =['Rain', 'Wind']

    df_filled = df.reindex(index=[idx_date,idx_loc,idx_hour], columns=cols, fill_value=0)


但是,我得到一个错误。如何解决这个问题

最佳答案

首先有必要创建MultiIndex,因此可以通过所有可能的值使用reindex

df = df.set_index(['Date','Loc','Hour'])
mux = pd.MultiIndex.from_product(df.index.levels, names=df.index.names)
df = df.reindex(mux, fill_value=0).reset_index()
print (df)
      Date  Loc  Hour  Rain  Wind
0   1Sep19    1     1   184    65
1   1Sep19    1     2     0     0
2   1Sep19    1     3   126    64
3   1Sep19    2     1   187    73
4   1Sep19    2     2   173    63
5   1Sep19    2     3   186    63
6   2Sep19    1     1   112    63
7   2Sep19    1     2   155     0
8   2Sep19    1     3   186    50
9   2Sep19    2     1     0     0
10  2Sep19    2     2     0     0
11  2Sep19    2     3     0     0
12  3Sep19    1     1     0     0
13  3Sep19    1     2   154    79
14  3Sep19    1     3   143    61
15  3Sep19    2     1   172    56
16  3Sep19    2     2   156    56
17  3Sep19    2     3   176    79


您的解决方案应更改:

df['Date'] = pd.to_datetime(df['Date'])

idx_date = pd.date_range('2019-09-01','2019-09-03')
idx_loc = np.arange(1,3)
idx_hour = np.arange(1,4)

cols =['Rain', 'Wind']
mux = pd.MultiIndex.from_product([idx_date,idx_loc,idx_hour],
                                 names=['Date','Loc','Hour'])
df_filled = (df.set_index(['Date','Loc','Hour'])
               .reindex(index=mux, columns=cols, fill_value=0)
               .reset_index())
print (df_filled)
         Date  Loc  Hour  Rain  Wind
0  2019-09-01    1     1   184    65
1  2019-09-01    1     2     0     0
2  2019-09-01    1     3   126    64
3  2019-09-01    2     1   187    73
4  2019-09-01    2     2   173    63
5  2019-09-01    2     3   186    63
6  2019-09-02    1     1   112    63
7  2019-09-02    1     2   155     0
8  2019-09-02    1     3   186    50
9  2019-09-02    2     1     0     0
10 2019-09-02    2     2     0     0
11 2019-09-02    2     3     0     0
12 2019-09-03    1     1     0     0
13 2019-09-03    1     2   154    79
14 2019-09-03    1     3   143    61
15 2019-09-03    2     1   172    56
16 2019-09-03    2     2   156    56
17 2019-09-03    2     3   176    79

关于python - Pandas 重新索引多列,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/57883965/

10-12 16:38