Problem description
I have a dataframe with 3 columns in Python:
Name1 Name2 Value
Juan Ale 1
Ale Juan 1
I would like to eliminate the duplicates based on the combination of columns Name1 and Name2.
In my example, both rows contain the same pair of names (in a different order), so I would like to delete the second row and keep only the first. The end result should be:
Name1 Name2 Value
Juan Ale 1
Any idea will be really appreciated!
Recommended answer
You can convert each row's pair of names to a frozenset and use pd.Series.duplicated:
res = df[~df[['Name1', 'Name2']].apply(frozenset, axis=1).duplicated()]
print(res)
Name1 Name2 Value
0 Juan Ale 1
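For readers who want to run the one-liner above, here is a minimal self-contained sketch. The dataframe is rebuilt from the sample in the question, and the intermediate name pair_key is an illustrative assumption, not part of the original answer.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Name1": ["Juan", "Ale"],
    "Name2": ["Ale", "Juan"],
    "Value": [1, 1],
})

# Build an order-insensitive key per row, e.g. frozenset({'Juan', 'Ale'})
# (pair_key is a hypothetical helper name used only for readability)
pair_key = df[["Name1", "Name2"]].apply(frozenset, axis=1)

# duplicated() flags the second and later occurrences of each key;
# negating the mask keeps only the first occurrence
res = df[~pair_key.duplicated()]
print(res)
```

Splitting the key out into its own variable makes it easier to inspect which rows are treated as the same pair before filtering.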
frozenset is necessary instead of set, since duplicated uses hashing to check for duplicates and a set is not hashable.
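The hashing point can be checked in plain Python, without pandas. This is a small sketch; the variable names set_hashable and same_hash are illustrative assumptions.

```python
# A set is mutable, so Python refuses to hash it
try:
    hash({"Juan", "Ale"})
    set_hashable = True
except TypeError:
    set_hashable = False

# A frozenset is immutable and hashable, and its hash ignores element order
same_hash = hash(frozenset(["Juan", "Ale"])) == hash(frozenset(["Ale", "Juan"]))

print(set_hashable, same_hash)
```

Because the two frozensets compare equal and hash identically, a hash-based duplicate check treats (Juan, Ale) and (Ale, Juan) as the same key.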
This approach scales better with the number of columns than with the number of rows, because apply with axis=1 makes one Python-level call per row. For a large number of rows, use @Wen's sort-based algorithm.
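@Wen's answer is not reproduced on this page, so the following is only a sketch of one common sort-based approach and not necessarily their exact code. It assumes the two name columns hold mutually comparable values (here, strings); the column labels "first" and "second" are hypothetical.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Name1": ["Juan", "Ale"],
    "Name2": ["Ale", "Juan"],
    "Value": [1, 1],
})

# Sort the two names within each row so that (Juan, Ale) and (Ale, Juan)
# both become (Ale, Juan); this runs in NumPy rather than once per row in Python
sorted_pairs = pd.DataFrame(
    np.sort(df[["Name1", "Name2"]].to_numpy(), axis=1),
    index=df.index,
    columns=["first", "second"],  # hypothetical labels for the sorted pair
)

# Rows whose sorted pair has already been seen are duplicates; keep the first
res = df[~sorted_pairs.duplicated()]
print(res)
```

The original row order and values in df are left untouched; the sorted copy is used only to build the duplicate mask.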