Problem description
Let's say this is my DataFrame:
import pandas as pd

df = pd.DataFrame({'bio': ['1', '1', '1', '4'],
                   'center': ['one', 'one', 'two', 'three'],
                   'outcome': ['f', 't', 'f', 'f']})
It looks like this:
bio center outcome
0 1 one f
1 1 one t
2 1 two f
3 4 three f
I want to drop row 1 because it has the same bio and center as row 0. I want to keep row 2 because it has the same bio as row 0 but a different center.
Something like this won't work, given how drop_duplicates takes its arguments, but it shows what I am trying to do:
df.drop_duplicates(subset = 'bio' & subset = 'center' )
Any suggestions?
Edit: changed df a bit to fit the example in the correct answer.
Recommended answer
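Your syntax is wrong. Pass the column labels to subset as a single list:
df.drop_duplicates(subset=['bio', 'center'])
By contrast, the no-argument form compares every column, so with this data it keeps all four rows (rows 0 and 1 differ in outcome):
df.drop_duplicates()
The subset call returns the following: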
bio center outcome
0 1 one f
2 1 two f
3 4 three f
Take a look at the df.drop_duplicates documentation for syntax details. subset should be a sequence of column labels.
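As a minimal, self-contained sketch, here is the call applied to the question's data, with duplicates judged only on the two columns the question asks about. The keep argument is not part of the original answer; it is shown only as an optional extra for choosing which row of each duplicate group survives.

```python
import pandas as pd

df = pd.DataFrame({
    'bio': ['1', '1', '1', '4'],
    'center': ['one', 'one', 'two', 'three'],
    'outcome': ['f', 't', 'f', 'f'],
})

# Rows count as duplicates when they match on both 'bio' and 'center'.
# The default keep='first' retains the first row of each duplicate group.
deduped = df.drop_duplicates(subset=['bio', 'center'])
print(deduped)

# keep='last' retains the last row of each duplicate group instead.
keep_last = df.drop_duplicates(subset=['bio', 'center'], keep='last')

# keep=False drops every row that has a duplicate on those columns.
no_dupes = df.drop_duplicates(subset=['bio', 'center'], keep=False)
```

drop_duplicates returns a new DataFrame and leaves df unchanged, so assign the result if you want to keep it.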