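For each (city, district) group in the following DataFrame, I want to use the 2019-03 value of price as the base value and calculate the percentage change of the 2019-06 and 2019-12 values relative to it.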
city district date price
0 a c 2019-01 9.99
1 a c 2019-02 10.66
2 a c 2019-03 10.56
3 a c 2019-04 10.06
4 a c 2019-05 10.69
5 a c 2019-06 10.77
6 a c 2019-07 10.67
7 a c 2019-08 10.51
8 a c 2019-09 10.28
9 a c 2019-10 10.05
10 a c 2019-11 9.72
11 a c 2019-12 9.98
12 b d 2019-01 6.32
13 b d 2019-02 6.32
14 b d 2019-03 6.32
15 b d 2019-04 6.32
16 b d 2019-05 6.32
17 b d 2019-06 6.00
18 b d 2019-07 6.00
19 b d 2019-08 6.00
20 b d 2019-09 6.00
21 b d 2019-10 6.00
22 b d 2019-11 6.00
23 b d 2019-12 5.65
How can I get an expected result like this? Thanks.
city district date price pct
0 a c 2019-01 9.99 NaN
1 a c 2019-02 10.66 NaN
2 a c 2019-03 10.56 NaN
3 a c 2019-04 10.06 NaN
4 a c 2019-05 10.69 NaN
5 a c 2019-06 10.77 0.019886
6 a c 2019-07 10.67 NaN
7 a c 2019-08 10.51 NaN
8 a c 2019-09 10.28 NaN
9 a c 2019-10 10.05 NaN
10 a c 2019-11 9.72 NaN
11 a c 2019-12 9.98 -0.054924
12 b d 2019-01 6.32 NaN
13 b d 2019-02 6.32 NaN
14 b d 2019-03 6.32 NaN
15 b d 2019-04 6.32 NaN
16 b d 2019-05 6.32 NaN
17 b d 2019-06 6.00 -0.050633
18 b d 2019-07 6.00 NaN
19 b d 2019-08 6.00 NaN
20 b d 2019-09 6.00 NaN
21 b d 2019-10 6.00 NaN
22 b d 2019-11 6.00 NaN
23 b d 2019-12 5.65 -0.106013
So far I have only tried 2019-12, and clearly I did not get what I need. Output of my current code:
city district date price pct1 pct2
0 a c 2019-01 9.99 NaN NaN
1 a c 2019-02 10.66 NaN NaN
2 a c 2019-03 10.56 NaN NaN
3 a c 2019-04 10.06 NaN NaN
4 a c 2019-05 10.69 NaN NaN
5 a c 2019-06 10.77 0.078078 NaN
6 a c 2019-07 10.67 0.000938 NaN
7 a c 2019-08 10.51 -0.004735 NaN
8 a c 2019-09 10.28 0.021869 NaN
9 a c 2019-10 10.05 -0.059869 NaN
10 a c 2019-11 9.72 -0.097493 NaN
11 a c 2019-12 9.98 -0.064667 -0.001001
12 b d 2019-01 6.32 NaN NaN
13 b d 2019-02 6.32 NaN NaN
14 b d 2019-03 6.32 NaN NaN
15 b d 2019-04 6.32 NaN NaN
16 b d 2019-05 6.32 NaN NaN
17 b d 2019-06 6.00 -0.050633 NaN
18 b d 2019-07 6.00 -0.050633 NaN
19 b d 2019-08 6.00 -0.050633 NaN
20 b d 2019-09 6.00 -0.050633 NaN
21 b d 2019-10 6.00 -0.050633 NaN
22 b d 2019-11 6.00 0.000000 NaN
23 b d 2019-12 5.65 -0.058333 -0.106013
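The code that produced this output is not shown. As an assumption only, the numbers are consistent with a fixed row shift inside each group: `pct1` matches a comparison with the row 5 positions earlier and `pct2` with the row 11 positions earlier (for example, 10.77 / 9.99 - 1 for row 5). A minimal sketch that reproduces the table under that assumption follows; the rebuilt `df` and the shift sizes are illustrative, not the asker's actual code.

```python
import pandas as pd

# Rebuild the sample data from the question.
dates = [f"2019-{month:02d}" for month in range(1, 13)]
df = pd.DataFrame({
    "city": ["a"] * 12 + ["b"] * 12,
    "district": ["c"] * 12 + ["d"] * 12,
    "date": dates * 2,
    "price": [9.99, 10.66, 10.56, 10.06, 10.69, 10.77,
              10.67, 10.51, 10.28, 10.05, 9.72, 9.98,
              6.32, 6.32, 6.32, 6.32, 6.32, 6.00,
              6.00, 6.00, 6.00, 6.00, 6.00, 5.65],
})

grouped = df.groupby(["city", "district"])["price"]

# ASSUMPTION: a fixed shift of 5 and 11 rows within each group.
# Each row is compared with the row that many positions earlier,
# not with one chosen base month.
df["pct1"] = df["price"] / grouped.shift(5) - 1
df["pct2"] = df["price"] / grouped.shift(11) - 1
print(df)
```

Under this reading, every row from the sixth position onward gets a `pct1` value, and 2019-06 is compared with 2019-01 rather than 2019-03, which is why the result differs from the expected one.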
Best answer
You can filter the rows with isin and then use groupby; to divide by the first value of each group, use transform:
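# Rows for the base month and the months to compare
m = df["date"].isin(['2019-01', '2019-06', '2019-12'])
# First price among the kept rows of each (city, district) group, here 2019-01
s = df[m].groupby(["city","district"])['price'].transform('first')
# Change relative to that first value, written only to the kept rows
df.loc[m, 'pct1'] = df.loc[m, 'price'].div(s).sub(1)
print(df)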
city district date price pct1
0 a c 2019-01 9.99 0.000000
1 a c 2019-02 10.66 NaN
2 a c 2019-03 10.56 NaN
3 a c 2019-04 10.06 NaN
4 a c 2019-05 10.69 NaN
5 a c 2019-06 10.77 0.078078
6 a c 2019-07 10.67 NaN
7 a c 2019-08 10.51 NaN
8 a c 2019-09 10.28 NaN
9 a c 2019-10 10.05 NaN
10 a c 2019-11 9.72 NaN
11 a c 2019-12 9.98 -0.001001
12 b d 2019-01 6.32 0.000000
13 b d 2019-02 6.32 NaN
14 b d 2019-03 6.32 NaN
15 b d 2019-04 6.32 NaN
16 b d 2019-05 6.32 NaN
17 b d 2019-06 6.00 -0.050633
18 b d 2019-07 6.00 NaN
19 b d 2019-08 6.00 NaN
20 b d 2019-09 6.00 NaN
21 b d 2019-10 6.00 NaN
22 b d 2019-11 6.00 NaN
23 b d 2019-12 5.65 -0.106013
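Note that the example above keeps 2019-01 among the filtered rows, so 2019-01 acts as the base, while the question asks for 2019-03. The following is a minimal sketch of a variant of the same isin / groupby / transform idea that looks up the base month explicitly with `where`, so it does not depend on row order. The names `base_month`, `target_months`, `base`, and `targets` are illustrative, and the `df` is rebuilt from the question's sample data.

```python
import pandas as pd

# Rebuild the sample data from the question.
dates = [f"2019-{month:02d}" for month in range(1, 13)]
df = pd.DataFrame({
    "city": ["a"] * 12 + ["b"] * 12,
    "district": ["c"] * 12 + ["d"] * 12,
    "date": dates * 2,
    "price": [9.99, 10.66, 10.56, 10.06, 10.69, 10.77,
              10.67, 10.51, 10.28, 10.05, 9.72, 9.98,
              6.32, 6.32, 6.32, 6.32, 6.32, 6.00,
              6.00, 6.00, 6.00, 6.00, 6.00, 5.65],
})

base_month = "2019-03"                   # base month named in the question
target_months = ["2019-06", "2019-12"]   # months to compare against the base

# Blank out every price except the base month, then broadcast the first
# non-null value of each (city, district) group to all rows of that group.
base = (
    df["price"]
    .where(df["date"].eq(base_month))
    .groupby([df["city"], df["district"]])
    .transform("first")
)

# Write the relative change only for the target months; other rows stay NaN.
targets = df["date"].isin(target_months)
df.loc[targets, "pct"] = df.loc[targets, "price"].div(base[targets]).sub(1)
print(df)
```

With the sample data, the four non-null `pct` values should match the expected result in the question, and the base row itself stays NaN instead of 0.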