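I have some data that represents results over time at a number of different sites. I want to find the quartile breakdown of my results, as well as the maximum and minimum dates for each site.
Finding each of these on its own is easy: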
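# quartiles of the numeric 'result' column (selecting it avoids errors on non-numeric columns in recent pandas)
q = df.groupby(['site_id', 'datum'])[['result']].quantile([0.25, 0.5, 0.75])
# max and min values
d_max = df.groupby(['site_id', 'datum']).max()
d_min = df.groupby(['site_id', 'datum']).min()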
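The three objects above do not share the same row index, which is why they cannot simply be placed side by side: `d_max` and `d_min` have one row per (site_id, datum), while the quantile result adds a third index level holding 0.25 / 0.5 / 0.75. A minimal sketch using a few made-up rows in the question's column layout (the `q_wide` name is just for illustration) shows the mismatch and how `unstack()` moves the quantile level into the columns:

```python
import pandas as pd

# Made-up rows in the question's column layout (illustration only)
df = pd.DataFrame({
    "date": pd.to_datetime(["1968-01-10", "1977-06-07", "1971-04-24", "1971-09-29"]),
    "site_id": ["RN004481", "RN004481", "RN004482", "RN004482"],
    "datum": ["SWL", "SWL", "SWL", "SWL"],
    "result": [61.23, 60.16, 79.93, 79.97],
})

g = df.groupby(["site_id", "datum"])
d_max = g.max()  # one row per (site_id, datum)
d_min = g.min()  # one row per (site_id, datum)
# Quantiles add a third index level holding 0.25 / 0.5 / 0.75
q = g[["result"]].quantile([0.25, 0.5, 0.75])

print(d_max.index.nlevels)  # 2
print(q.index.nlevels)      # 3

# unstack() moves the innermost (quantile) level into the columns,
# so q_wide has the same two-level row index as d_max and d_min
q_wide = q.unstack()
print(q_wide.index.equals(d_max.index))  # True
```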
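The results are MultiIndex DataFrames. How can I combine them again so that I get all three sets of values for each combination of site_id and datum?
Some sample data: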
from io import StringIO
import pandas as pd
TESTDATA=StringIO(u'''date site_id datum result
1968-01-10 RN004481 SWL 61.23
1977-06-07 RN004481 SWL 60.16
1979-12-12 RN004481 SWL 58.76
1971-04-24 RN004482 SWL 79.93
1971-09-29 RN004482 SWL 79.97
1995-09-19 RN004482 SWL 92.91
1996-02-08 RN004482 SWL 93.15
1964-10-29 RN00448411 SWL 67.87
1965-03-04 RN004687 SWL 74.90
1993-03-16 RN02528611 SWL 7.50
2011-10-24 RN029429 SWL 2.59
2011-11-05 RN029429 SWL 2.68
1992-06-24 RN004464 SWL 52.24
1986-08-11 RN004482 SWL 86.84
1998-01-29 RN004482 SWL 94.33
1966-11-24 RN004687 DTW 75.16
1978-08-30 RN004687 SWL 78.24
1983-02-22 RN004687 DTW 81.00
1984-07-24 RN004687 SWL 81.26
1993-07-07 RN004687 SWL 87.18
1994-04-08 RN004687 DTW 87.53
1994-08-11 RN004687 SWL 87.41
2001-01-10 RN004687 SWL 92.04
2010-11-15 RN004687 SWL 97.06
1964-10-01 RN004693 SWL 59.56
1965-06-03 RN004693 SWL 59.74
1967-05-19 RN004693 SWL 59.58
1967-06-23 RN004693 RSWL 59.61
1967-09-22 RN004693 RSWL 59.69
1970-12-16 RN004693 DTW 59.54
''')
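# sep=r'\s+' replaces the deprecated delim_whitespace=True; parse_dates makes min/max operate on real dates
df = pd.read_csv(TESTDATA, sep=r'\s+', parse_dates=['date'])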
Best answer
Here is one way:
pd.concat([d_max, d_min, q.unstack().result], axis=1, keys=['max', 'min', 'quantiles'])
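A different technique from the concat above, offered only as an alternative: pandas' named aggregation (`groupby(...).agg(new_name=(column, func))`) can build the min/max dates and the quartiles in a single groupby call with flat column names. A minimal sketch on a subset of the sample data; the output column names (`date_min`, `q25`, ...) and the helper `q` are hypothetical choices, not part of the original answer:

```python
from io import StringIO

import pandas as pd

# Subset of the question's sample data
data = StringIO("""date site_id datum result
1968-01-10 RN004481 SWL 61.23
1977-06-07 RN004481 SWL 60.16
1979-12-12 RN004481 SWL 58.76
1966-11-24 RN004687 DTW 75.16
1983-02-22 RN004687 DTW 81.00
1994-04-08 RN004687 DTW 87.53
""")
df = pd.read_csv(data, sep=r"\s+", parse_dates=["date"])


def q(p):
    """Hypothetical helper: return a function computing the p-th quantile of a Series."""
    def _q(s):
        return s.quantile(p)
    # Distinct names keep pandas from seeing several identical functions on one column
    _q.__name__ = f"q{int(p * 100)}"
    return _q


# One groupby, one flat column per statistic
summary = df.groupby(["site_id", "datum"]).agg(
    date_min=("date", "min"),
    date_max=("date", "max"),
    q25=("result", q(0.25)),
    q50=("result", q(0.50)),
    q75=("result", q(0.75)),
)
print(summary)
```

Compared with `pd.concat(..., keys=[...])`, this gives a single column level that is easier to export or filter, at the cost of listing each statistic explicitly.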
Regarding "python - python (pandas): recombining groupby statements", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38238215/