Problem description
I have a DataFrame consisting of dates, other columns, and a numerical value. Some value combinations in the "other columns" may be missing for a given date, and I want to populate them from previous dates.
Example. Say the DataFrame looks like the one below. On 2016-01-01 we have data for (LN, A), (LN, B), (NY, A) and (NY, B) in columns (location, band).
         date location band  value
0  2016-01-01       LN    A   10.0
1  2016-01-01       LN    B    5.0
2  2016-01-01       NY    A    9.0
3  2016-01-01       NY    B    6.0
4  2016-01-02       LN    A   11.0
5  2016-01-02       NY    B    7.0
6  2016-01-03       NY    A   10.0
On 2016-01-02 we only have (LN, A) and (NY, B); (LN, B) and (NY, A) are missing. Likewise, on 2016-01-03 only (NY, A) is available and the other three combinations are missing.
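The gaps described above can be listed programmatically. The following is a minimal sketch, assuming the sample data is held in a DataFrame named df with a datetime date column; the names all_pairs and missing are illustrative and do not come from the original question.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2016-01-01", "2016-01-01", "2016-01-01", "2016-01-01",
        "2016-01-02", "2016-01-02", "2016-01-03",
    ]),
    "location": ["LN", "LN", "NY", "NY", "LN", "NY", "NY"],
    "band": ["A", "B", "A", "B", "A", "B", "A"],
    "value": [10.0, 5.0, 9.0, 6.0, 11.0, 7.0, 10.0],
})

# Every (location, band) pair that appears anywhere in the data.
all_pairs = set(zip(df["location"], df["band"]))

# For each date, the pairs that have no row on that date.
missing = {
    date.strftime("%Y-%m-%d"): sorted(all_pairs - set(zip(group["location"], group["band"])))
    for date, group in df.groupby("date")
}
```

For the sample data, missing is empty for 2016-01-01, holds (LN, B) and (NY, A) for 2016-01-02, and holds (LN, A), (LN, B) and (NY, B) for 2016-01-03.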
What I want to do is populate the missing combinations of each date from its predecessor. For 2016-01-02, I would like to add two more rows "rolled over" from 2016-01-01: (LN, B, 5.0) and (NY, A, 9.0) for columns (location, band, value). The same applies to 2016-01-03, so that the whole thing looks like this:
          date location band  value
0   2016-01-01       LN    A   10.0
1   2016-01-01       LN    B    5.0
2   2016-01-01       NY    A    9.0
3   2016-01-01       NY    B    6.0
4   2016-01-02       LN    A   11.0
5   2016-01-02       NY    B    7.0
6   2016-01-03       NY    A   10.0
7   2016-01-02       LN    B    5.0
8   2016-01-02       NY    A    9.0
9   2016-01-03       LN    A   11.0
10  2016-01-03       LN    B    5.0
11  2016-01-03       NY    B    7.0
Note that rows 7-11 are populated from rows 1, 2, 4, 7 and 5, respectively. The order is not really important, since I can always sort afterwards as long as all the data I need is present.
Can anyone help? Thanks a lot!
Recommended answer
You can use unstack/stack to expose every missing combination as a NaN row, followed by a forward fill:
# Use unstack/stack to add the missing (location, band) combinations for each date.
# dropna=False keeps the all-NaN rows created by unstack so they can be filled.
df = (
    df.set_index(['date', 'location', 'band'])
      .unstack(level=['location', 'band'])
      .stack(level=['location', 'band'], dropna=False)
)
# Forward fill NaN values within ['location', 'band'] groups.
df = df.groupby(level=['location', 'band']).ffill().reset_index()
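To see why the unstack step exposes the gaps, it can help to look at the intermediate wide table on its own. The sketch below is an illustration rather than part of the answer: it assumes the sample data from the question, selects only the value column, and uses the hypothetical names wide and wide_filled.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2016-01-01", "2016-01-01", "2016-01-01", "2016-01-01",
        "2016-01-02", "2016-01-02", "2016-01-03",
    ]),
    "location": ["LN", "LN", "NY", "NY", "LN", "NY", "NY"],
    "band": ["A", "B", "A", "B", "A", "B", "A"],
    "value": [10.0, 5.0, 9.0, 6.0, 11.0, 7.0, 10.0],
})

# One row per date, one column per (location, band) pair.
# Combinations with no row on a given date show up as NaN.
wide = (
    df.set_index(["date", "location", "band"])["value"]
      .unstack(["location", "band"])
)

# Filling down the date axis carries each pair's last known value forward.
wide_filled = wide.sort_index().ffill()
```

Here wide has three rows and four columns with five NaN cells, one for each missing combination. After the forward fill, no NaN remains because every pair has a value on the first date. A pair that first appears on a later date would keep NaN on the dates before it.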
Or you can directly build a MultiIndex containing all combinations:
import pandas as pd

# Build the full MultiIndex, set the partial MultiIndex, and reindex.
levels = ['date', 'location', 'band']
full_idx = pd.MultiIndex.from_product([df[col].unique() for col in levels], names=levels)
df = df.set_index(levels).reindex(full_idx)
# Forward fill NaN values within ['location', 'band'] groups.
df = df.groupby(level=['location', 'band']).ffill().reset_index()
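The reindex approach can be wrapped into a reusable helper. The following is a minimal sketch, not the answer author's code: the function name fill_missing_combinations and its parameters are hypothetical, it assumes each (date, location, band) combination occurs at most once (reindex would fail on duplicates), and it sorts the unique values so that the forward fill runs in chronological order even if the input rows are not sorted.

```python
import pandas as pd

def fill_missing_combinations(df, date_col="date", key_cols=("location", "band")):
    """Add a row for every (date, key) combination and forward fill per key."""
    key_cols = list(key_cols)
    levels = [date_col] + key_cols
    # Sorted unique values per column, so dates are in chronological order.
    full_idx = pd.MultiIndex.from_product(
        [df[col].drop_duplicates().sort_values() for col in levels],
        names=levels,
    )
    out = df.set_index(levels).reindex(full_idx)
    # Carry the last known value forward within each key group.
    out = out.groupby(level=key_cols).ffill()
    return out.reset_index()

# Sample data from the question.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2016-01-01", "2016-01-01", "2016-01-01", "2016-01-01",
        "2016-01-02", "2016-01-02", "2016-01-03",
    ]),
    "location": ["LN", "LN", "NY", "NY", "LN", "NY", "NY"],
    "band": ["A", "B", "A", "B", "A", "B", "A"],
    "value": [10.0, 5.0, 9.0, 6.0, 11.0, 7.0, 10.0],
})

result = fill_missing_combinations(df)
```

Combinations that have no earlier value to roll over keep NaN in this sketch; append .dropna() if such rows are not wanted.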
Resulting output from either method:
          date location band  value
0   2016-01-01       LN    A   10.0
1   2016-01-01       LN    B    5.0
2   2016-01-01       NY    A    9.0
3   2016-01-01       NY    B    6.0
4   2016-01-02       LN    A   11.0
5   2016-01-02       LN    B    5.0
6   2016-01-02       NY    A    9.0
7   2016-01-02       NY    B    7.0
8   2016-01-03       LN    A   11.0
9   2016-01-03       LN    B    5.0
10  2016-01-03       NY    A   10.0
11  2016-01-03       NY    B    7.0