python - Subset a DataFrame based on a field

mukey   cokey     hzdept_r  hzdepb_r
422927  11090397    0        20
422927  11090397    20       71
422927  11090397    71       152
422927  11090398    0        18
422927  11090398    18       117
422927  11090398    117      152

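I want to subset the DataFrame above so that only the first set of cokey values is selected (11090397 in this case). Since this is only a sample dataset, the solution needs to scale to larger DataFrames of this kind.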

In this case, the resulting dataset should be:

mukey   cokey     hzdept_r  hzdepb_r
422927  11090397    0        20
422927  11090397    20       71
422927  11090397    71       152

I tried using groupby, but I'm not sure how to select only the first cokey value from it.

Best answer

Another approach is to take only the first unique value:

In [97]:

df[df['cokey'] == df['cokey'].unique()[0]]
Out[97]:
    mukey     cokey  hzdept_r  hzdepb_r
0  422927  11090397         0        20
1  422927  11090397        20        71
2  422927  11090397        71       152
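To try the snippet above outside an interactive session, the example frame can be rebuilt by hand. This is a minimal sketch that assumes the four columns are plain integers. The names first_cokey and subset are only illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question (integer columns assumed).
df = pd.DataFrame({
    "mukey": [422927] * 6,
    "cokey": [11090397] * 3 + [11090398] * 3,
    "hzdept_r": [0, 20, 71, 0, 18, 117],
    "hzdepb_r": [20, 71, 152, 18, 117, 152],
})

# Series.unique() returns values in order of first appearance,
# so element 0 is the cokey found in the first row.
first_cokey = df["cokey"].unique()[0]

# Boolean mask: keep only the rows whose cokey matches that value.
subset = df[df["cokey"] == first_cokey]
print(subset)
```

Because unique() preserves order of appearance rather than sorting, this picks the cokey that occurs first in the frame, not the smallest one.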

You can also use integer-based indexing to get the first value to filter on:

In [99]:

df[df['cokey'] == df['cokey'].iloc[0]]
Out[99]:
    mukey     cokey  hzdept_r  hzdepb_r
0  422927  11090397         0        20
1  422927  11090397        20        71
2  422927  11090397        71       152
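Both snippets above keep a single cokey for the whole frame. If a larger frame holds several mukey values and the goal is the first cokey within each mukey (an assumption, since the question does not say so), the groupby idea from the question could be sketched as follows. The second mukey and its cokey values are made up for illustration.

```python
import pandas as pd

# Hypothetical larger frame: mukey 422928 and its cokeys are invented.
df = pd.DataFrame({
    "mukey": [422927] * 4 + [422928] * 3,
    "cokey": [11090397, 11090397, 11090398, 11090398,
              11090500, 11090501, 11090501],
    "hzdept_r": [0, 20, 0, 18, 0, 0, 25],
    "hzdepb_r": [20, 71, 18, 117, 30, 25, 90],
})

# For every row, look up the first (non-null) cokey of that row's mukey group.
first_per_mukey = df.groupby("mukey")["cokey"].transform("first")

# Keep only the rows whose cokey equals the first cokey of their group.
subset = df[df["cokey"] == first_per_mukey]
print(subset)
```

transform returns a Series aligned with the original index, so it can be compared with df["cokey"] row by row without a merge.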

This question, python - Subset a DataFrame based on a field, comes from Stack Overflow: https://stackoverflow.com/questions/29378242/