How to select duplicate rows with pandas?

Problem description

I have a dataframe like this:

import pandas as pd
dic = {'A':[100,200,250,300],
       'B':['ci','ci','po','pa'],
       'C':['s','t','p','w']}
df = pd.DataFrame(dic)

My goal is to separate the rows into 2 dataframes:

df1 = contains all the rows whose value in column B is not repeated (unique rows).
df2 = contains only the rows whose value in column B is repeated.

The result should look like this:

df1 =      A  B C         df2 =     A  B C
      0  250 po p               0  100 ci s
      1  300 pa w               1  200 ci t

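Note:

- The dataframe may in general be very large and have many repeated values in column B, so the answer should be as generic as possible.
- If there are no duplicates, df2 should be empty! All the results should be in df1.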

Recommended answer

You can use Series.duplicated with the parameter keep=False to create a mask that marks every duplicated value, then use boolean indexing to select those rows, and ~ to invert the mask for the rest:
```
# keep=False flags every occurrence of a repeated value in column B
mask = df.B.duplicated(keep=False)
print(mask)
0     True
1     True
2    False
3    False
Name: B, dtype: bool

print(df[mask])
     A   B  C
0  100  ci  s
1  200  ci  t

print(df[~mask])
     A   B  C
2  250  po  p
3  300  pa  w
```
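
To get the two dataframes in the shape the question asks for, the mask can be wrapped in a small helper. This is only a sketch: the name `split_by_duplicates` and the `no_dups` sample frame are made up for illustration, and `reset_index(drop=True)` is added on the assumption that the renumbered 0, 1 index shown in the expected result is wanted.

```python
import pandas as pd


def split_by_duplicates(df, column):
    """Return (unique_rows, repeated_rows) based on the values in `column`.

    Hypothetical helper, not part of pandas.
    """
    # keep=False marks every occurrence of a repeated value, not only the later ones
    mask = df[column].duplicated(keep=False)
    # reset_index(drop=True) renumbers the rows from 0 (an assumption about the desired output)
    unique_rows = df[~mask].reset_index(drop=True)
    repeated_rows = df[mask].reset_index(drop=True)
    return unique_rows, repeated_rows


df = pd.DataFrame({'A': [100, 200, 250, 300],
                   'B': ['ci', 'ci', 'po', 'pa'],
                   'C': ['s', 't', 'p', 'w']})

df1, df2 = split_by_duplicates(df, 'B')
print(df1)
print(df2)

# A frame with no repeated values in B: every row ends up in the first result
no_dups = pd.DataFrame({'A': [1, 2], 'B': ['x', 'y'], 'C': ['a', 'b']})
all_rows, empty = split_by_duplicates(no_dups, 'B')
print(empty.empty)
```

When column B has no repeated values the mask is all False, so the second dataframe comes back empty and all rows stay in the first, which matches the requirement in the note.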

08-24 08:51