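I have some data representing time-series results across many different sites. I want to find the quartile breakdown of my results, as well as the maximum and minimum dates for each site.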

Finding each of these individually is easy:

# quartiles (select the numeric column so the string 'date' column is not passed to quantile)
q = df.groupby(['site_id', 'datum'])[['result']].quantile([0.25, 0.5, 0.75])
# max and min values
d_max = df.groupby(['site_id', 'datum']).max()
d_min = df.groupby(['site_id', 'datum']).min()


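The results are MultiIndex DataFrames. How can I join them back together so that I get all three sets of values for each combination of site_id and datum?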
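To make the mismatch concrete, here is a minimal sketch of how the index shapes differ. The small frame below is made up purely for illustration (it is not the sample data), and `q_wide` is a name introduced here: the quantile result carries an extra index level for the quantile value, which is why it does not line up with the max/min frames until that level is moved into the columns.

```python
import pandas as pd

# Hypothetical toy frame, used only to show the index shapes
df = pd.DataFrame({
    "site_id": ["A", "A", "A", "B", "B"],
    "datum": ["SWL", "SWL", "DTW", "SWL", "SWL"],
    "result": [1.0, 3.0, 5.0, 2.0, 4.0],
})

grouped = df.groupby(["site_id", "datum"])[["result"]]
q = grouped.quantile([0.25, 0.5, 0.75])
d_max = grouped.max()

# max/min are indexed by (site_id, datum);
# quantile adds a third level holding the quantile value
print(d_max.index.nlevels)
print(q.index.nlevels)

# unstack() moves the quantile level into the columns,
# leaving one row per (site_id, datum) combination
q_wide = q.unstack()
print(q_wide.index.nlevels)
print(q_wide.columns.tolist())
```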

Some sample data:

from io import StringIO
import pandas as pd

TESTDATA=StringIO(u'''date  site_id datum   result
1968-01-10  RN004481    SWL     61.23
1977-06-07  RN004481    SWL     60.16
1979-12-12  RN004481    SWL     58.76
1971-04-24  RN004482    SWL     79.93
1971-09-29  RN004482    SWL     79.97
1995-09-19  RN004482    SWL     92.91
1996-02-08  RN004482    SWL     93.15
1964-10-29  RN00448411  SWL     67.87
1965-03-04  RN004687    SWL     74.90
1993-03-16  RN02528611  SWL     7.50
2011-10-24  RN029429    SWL     2.59
2011-11-05  RN029429    SWL     2.68
1992-06-24  RN004464    SWL     52.24
1986-08-11  RN004482    SWL     86.84
1998-01-29  RN004482    SWL     94.33
1966-11-24  RN004687    DTW     75.16
1978-08-30  RN004687    SWL     78.24
1983-02-22  RN004687    DTW     81.00
1984-07-24  RN004687    SWL     81.26
1993-07-07  RN004687    SWL     87.18
1994-04-08  RN004687    DTW     87.53
1994-08-11  RN004687    SWL     87.41
2001-01-10  RN004687    SWL     92.04
2010-11-15  RN004687    SWL     97.06
1964-10-01  RN004693    SWL     59.56
1965-06-03  RN004693    SWL     59.74
1967-05-19  RN004693    SWL     59.58
1967-06-23  RN004693    RSWL    59.61
1967-09-22  RN004693    RSWL    59.69
1970-12-16  RN004693    DTW     59.54
''')

# sep=r'\s+' splits on any whitespace (delim_whitespace is deprecated in recent pandas)
df = pd.read_csv(TESTDATA, sep=r'\s+')

Best answer

Here is one approach:

pd.concat([d_max, d_min, q.unstack().result], axis=1, keys=['max', 'min', 'quantiles'])
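For reference, the approach above can be sketched end to end as follows. This is a minimal, self-contained version that uses only five rows copied from the sample data; the names `grouped` and `summary` are introduced here for illustration, and selecting `[["result"]]` before `quantile` is an assumption made so that the string `date` column is not fed into the quantile computation.

```python
import pandas as pd

# Five rows taken from the sample data in the question
df = pd.DataFrame({
    "date": ["1968-01-10", "1977-06-07", "1979-12-12",
             "1971-04-24", "1995-09-19"],
    "site_id": ["RN004481", "RN004481", "RN004481",
                "RN004482", "RN004482"],
    "datum": ["SWL"] * 5,
    "result": [61.23, 60.16, 58.76, 79.93, 92.91],
})

grouped = df.groupby(["site_id", "datum"])

# Quantiles of the numeric column only
q = grouped[["result"]].quantile([0.25, 0.5, 0.75])
# Column-wise max and min (ISO-formatted date strings sort chronologically)
d_max = grouped.max()
d_min = grouped.min()

# unstack() turns the quantile level into columns so all three frames
# share the same (site_id, datum) index; keys label the column blocks
summary = pd.concat(
    [d_max, d_min, q.unstack()["result"]],
    axis=1,
    keys=["max", "min", "quantiles"],
)
print(summary)
```

Note that `max()` and `min()` are taken per column, so the latest date and the largest result in a group can come from different rows. A single block can be pulled back out with, for example, `summary["quantiles"]`.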



Regarding "python (pandas): recombining groupby statements", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38238215/
