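Apologies if this is a fairly basic question. I am trying to find which rows two DataFrames have in common. The result should be the row indices of df2 whose rows also appear in df1. My clumsy attempt: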

import pandas as pd

df1 = pd.DataFrame({'col1':['cx','cx','cx2'], 'col2':[1,4,12]})
df1['col2'] = df1['col2'].map(str)
df2 = pd.DataFrame({'col1':['cx','cx','cx','cx','cx2','cx2'], 'col2':[1,3,5,10,12,12]})
df2['col2'] = df2['col2'].map(str)

# build a single string key per row, e.g. 'cx_1'
df1['idx'] = df1[['col1','col2']].apply(lambda x: '_'.join(x), axis=1)
df2['idx'] = df2[['col1','col2']].apply(lambda x: '_'.join(x), axis=1)

# keep the original row positions so they survive the merge
df1['idx_values'] = df1.index.values
df2['idx_values'] = df2.index.values

df3 = pd.merge(df1, df2, on='idx')
myindexes = df3['idx_values_y']

# idir is an output directory prefix defined elsewhere
myindexes.to_csv(idir + 'test.txt', sep='\t', index=False)


The result should be [0,4,5]. Doing this efficiently would be great, because both DataFrames will have several million rows.

Thanks!

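Best answer

There is no need for a new column of joined values: merge performs an inner join by default, and here it can join on both columns directly. If you need the df2.index values, add reset_index: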

import pandas as pd

df1 = pd.DataFrame({'col1':['cx','cx','cx2'], 'col2':[1,4,12]})
df2 = pd.DataFrame({'col1':['cx','cx','cx','cx','cx2','cx2'], 'col2':[1,3,5,10,12,12]})

# reset_index turns df2's index into a regular column named 'index'
df3 = pd.merge(df1, df2.reset_index(), on=['col1','col2'])
print(df3)
  col1  col2  index
0   cx     1      0
1  cx2    12      4
2  cx2    12      5
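
To get the plain list asked for in the question, the index column of the merged frame can be pulled out directly. This is a minimal sketch using the same sample data; the names merged and common_idx are only illustrative. Note that if df1 contained duplicate rows, the matching df2 indices would be repeated, so dropping duplicates from df1 first may be needed in that case.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': ['cx', 'cx', 'cx2'], 'col2': [1, 4, 12]})
df2 = pd.DataFrame({'col1': ['cx', 'cx', 'cx', 'cx', 'cx2', 'cx2'],
                    'col2': [1, 3, 5, 10, 12, 12]})

# inner merge on both columns; df2's original index travels along as 'index'
merged = pd.merge(df1, df2.reset_index(), on=['col1', 'col2'])

# indices of the df2 rows that also occur in df1
common_idx = merged['index'].tolist()
print(common_idx)  # [0, 4, 5]
```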


If the indices of both DataFrames are needed:

df4 = pd.merge(df1.reset_index(),df2.reset_index(), on = ['col1','col2'])
print (df4)

   index_x col1  col2  index_y
0        0   cx     1        0
1        2  cx2    12        4
2        2  cx2    12        5


For just the intersection of the two DataFrames:

df5 = pd.merge(df1,df2, on = ['col1','col2'])
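# if both DataFrames contain only these two columns, `on` can be omitted,
# because merge joins on all shared column names by default
#df5 = pd.merge(df1,df2)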
print (df5)

  col1  col2
0   cx     1
1  cx2    12
2  cx2    12
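
As an alternative that avoids building a merged frame, the row keys can be compared as a MultiIndex. This is not part of the answer above, only a sketch of another approach on the same sample data; the names keys1, keys2, mask and common_idx are illustrative, and whether it is faster than merge on several million rows would need to be measured.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': ['cx', 'cx', 'cx2'], 'col2': [1, 4, 12]})
df2 = pd.DataFrame({'col1': ['cx', 'cx', 'cx', 'cx', 'cx2', 'cx2'],
                    'col2': [1, 3, 5, 10, 12, 12]})

# treat each (col1, col2) pair as one key
keys1 = pd.MultiIndex.from_frame(df1[['col1', 'col2']])
keys2 = pd.MultiIndex.from_frame(df2[['col1', 'col2']])

# boolean mask: True where a df2 row's key also occurs in df1
mask = keys2.isin(keys1)

# select the matching labels from df2's own index
common_idx = df2.index[mask].tolist()
print(common_idx)  # [0, 4, 5]
```

Unlike the merge, this returns each matching df2 index once even when df1 contains duplicate rows.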

This question, "python - Python pandas: return the indices of common rows", comes from Stack Overflow: https://stackoverflow.com/questions/50309108/
