Problem description
I'd like to extract the rows of df1 that do not exist in df2 (identity is the index). For the example below, I would expect the last row of df1 to be returned, since its index has no match in df2. Unfortunately, the result is empty.
import pandas as pd
df1 = pd.DataFrame({
'level-0': ['a', 'a', 'a', 'a', 'a', 'a'],
'level-1': ['s2', 's2', 's2', 's2', 's2', 's2'],
'level-2': ['1', '1', '1', '1', '1', '1'],
'level-3': ['19', '20', '21', '22', '23', '24'],
'level-4': ['HRB', 'HRB', 'HRB', 'HRB', 'HRB', 'HRB'],
'name': ['a', 'b', 'c', 'd', 'e', 'f']
})
df1 = df1.set_index(['level-0', 'level-1', 'level-2', 'level-3', 'level-4'], drop=False)
df2 = pd.DataFrame({
'level-0': ['a', 'a', 'a', 'a', 'a', 'b'],
'level-1': ['s2', 's2', 's2', 's2', 's2', 's2'],
'level-2': ['1', '1', '1', '1', '1', '1'],
'level-3': ['19', '20', '21', '22', '23', '24'],
'level-4': ['HRB', 'HRB', 'HRB', 'HRB', 'HRB', 'HRB']
})
df2 = df2.set_index(['level-0', 'level-1', 'level-2', 'level-3', 'level-4'], drop=False)
# all indices that are in df1 but not in df2
df_unknown = df1[~df1.index.isin(df2.index)]
print(df_unknown)
What's wrong with the selection?
Update
I figured out what went wrong. The dataframes were read from an Excel file and some Series were interpreted as int, while the dataframe to compare with had its columns already converted to str. This resulted in different indices.
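The dtype mismatch described above can be reproduced with a minimal sketch. The frames left and right and their values are made up for illustration and are not the asker's Excel data; the point is only that an int key and a str key never compare equal, so isin reports every row as missing until both sides are cast to the same type.

```python
import pandas as pd

# Hypothetical frames: 'level-3' is int on one side and str on the other,
# mimicking a column read from Excel versus one already converted to str.
left = pd.DataFrame({'level-0': ['a', 'a'], 'level-3': [19, 20], 'name': ['a', 'b']})
right = pd.DataFrame({'level-0': ['a', 'a'], 'level-3': ['19', '20']})

keys = ['level-0', 'level-3']

# With mismatched dtypes no index tuple matches, so every row looks "missing".
mismatched = left[~left.set_index(keys).index.isin(right.set_index(keys).index)]

# Casting the key columns to str on both sides makes the tuples comparable.
left_str = left.astype({k: str for k in keys})
right_str = right.astype({k: str for k in keys})
matched = left_str[~left_str.set_index(keys).index.isin(right_str.set_index(keys).index)]

print(len(mismatched))  # 2
print(len(matched))     # 0
```

Checking df.dtypes on both frames right after reading them is a quick way to spot this kind of mismatch before building the index.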
Solution
set_index does not operate in place by default, so if its result is not assigned back, df1 and df2 still have their numeric index after the call. Do either
df2.set_index(..., inplace=True)
or
df2 = df2.set_index(...)
You will see that most methods in pandas work that way: they return a new object instead of modifying the original.
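As a small illustration of that behaviour, here is a sketch with a made-up two-row frame (the column names key and value are assumptions for the example only). Calling set_index without keeping the result leaves the original frame untouched; reassigning keeps the new index.

```python
import pandas as pd

df = pd.DataFrame({'key': ['x', 'y'], 'value': [1, 2]})

# The returned frame is discarded here, so df keeps its default numeric index.
df.set_index('key')
before = list(df.index)
print(before)  # [0, 1]

# Assigning the result back is what actually replaces the index.
df = df.set_index('key')
after = list(df.index)
print(after)  # ['x', 'y']
```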