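python - Selecting a column before nth breaks the group index

I am trying to extract column c from the first row of each group, but I am having trouble understanding why g['c'].nth(0) does not preserve the group index. Any ideas?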

>>> df = pd.DataFrame({'a': [1, 1, 2, 2], 'b': ['b', 'b', 'b', 'a'], 'c': [1, 2, 3, 4]})
>>> g = df.groupby(['a', 'b'])
>>> g.nth(0)
     c
a b
1 b  1
2 a  4
  b  3
>>> g['c'].nth(0)
0    1
2    3
3    4
Name: c, dtype: int64
>>> g.nth(0)['c']
a  b
1  b    1
2  a    4
   b    3
Name: c, dtype: int64
>>>

Why don't g.nth(0)['c'] and g['c'].nth(0) return the same Series (including the index)?

Update

An interesting observation:

>>> g['c'].first()
a  b
1  b    1
2  a    4
   b    3
Name: c, dtype: int64

This is exactly what I want, and it behaves differently from g['c'].nth(0).

Best answer

I added a new column d for better testing:
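Besides the index, first() and nth(0) can also differ in the values they pick: first() takes the first non-null entry of each group, while nth(0) takes the first row as it is. The following is a minimal sketch of that difference; the frame df_nan and its missing value are made up for illustration and are not part of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first row of group a=1 has a missing value in c.
df_nan = pd.DataFrame({'a': [1, 1, 2, 2], 'c': [np.nan, 2.0, 3.0, 4.0]})
g_nan = df_nan.groupby('a')

# first() skips missing values and returns the first non-null entry per group.
first_vals = g_nan['c'].first().tolist()

# nth(0) returns the first row of each group unchanged, missing value included.
nth_vals = g_nan['c'].nth(0).tolist()

print(first_vals)
print(nth_vals)
```

Only the values are compared here, because the index that nth returns is exactly the point in question.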

import pandas as pd


df = pd.DataFrame({'a': [1, 1, 2, 2], 'b': ['b', 'b', 'b', 'a'], 'c': [1, 2, 3, 4], 'd': [1, 2, 3, 4]})
print(df)
#   a  b  c  d
#0  1  b  1  1
#1  1  b  2  2
#2  2  b  3  3
#3  2  a  4  4
g = df.groupby(['a', 'b'])

# first approach: select column c (returns a SeriesGroupBy object), then apply nth
print(g['c'])
#<pandas.core.groupby.SeriesGroupBy object at 0x0000000014ED4EF0>
print(g['c'].head())
#0    1
#1    2
#2    3
#3    4
#Name: c, dtype: int64
print(g['c'].nth(0))
#0    1
#2    3
#3    4
#Name: c, dtype: int64

# second approach: apply nth to all groups (returns a DataFrame), then select column c
print(g.nth(0))
#     c  d
#a b
#1 b  1  1
#2 a  4  4
#  b  3  3
print(g.nth(0)['c'])
#a  b
#1  b    1
#2  a    4
#   b    3
#Name: c, dtype: int64

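Edit:

Why do I need to apply nth to the whole grouped DataFrame?

Because you first need to apply the function nth to all groups and only then take the first row of each group. That is what I did in the second approach.

In the first approach, you only pass column c, together with the already computed grouping, to a SeriesGroupBy object (see "New: Column selection" in the linked documentation).
Taken together this is df.groupby(['a', 'b'])['c'], and the function nth is then applied to that object. It is not applied to all groups of df.groupby(['a', 'b']).

I think this is a matter of chaining functions, where the result depends on the order in which they are called.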
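To make the two objects concrete, here is a small sketch using the question's frame. It prints the type of each groupby object and then takes the first value of c per group with an explicit aggregation. The lambda and the name first_c are my own choices for illustration, not something from the question; the idea is that an aggregation returns one row per group, labelled by the group keys.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2], 'b': ['b', 'b', 'b', 'a'], 'c': [1, 2, 3, 4]})
g = df.groupby(['a', 'b'])

# Selecting a column changes the kind of groupby object that nth is called on.
print(type(g).__name__)       # DataFrameGroupBy
print(type(g['c']).__name__)  # SeriesGroupBy

# An explicit aggregation: take the first element of c in every group.
# The result is indexed by the group keys (a, b).
first_c = g['c'].agg(lambda s: s.iloc[0])
print(first_c)
```

This gives the same group-indexed Series as g['c'].first() for this data, since c has no missing values.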

Edit 1:

In the end I reported it; it looks like a bug.
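The behavior of nth has changed between pandas releases, so the outputs shown above may not match what you see locally. A quick way to check whether your installed version still gives different results for the two call orders is sketched below; the variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2], 'b': ['b', 'b', 'b', 'a'], 'c': [1, 2, 3, 4]})
g = df.groupby(['a', 'b'])

# Column first, then nth.
col_then_nth = g['c'].nth(0)
# nth first, then column.
nth_then_col = g.nth(0)['c']

# equals() compares both values and index, so it is False whenever the index differs.
same = bool(col_then_nth.equals(nth_then_col))
print(pd.__version__, same)
```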

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/34237462/