Problem description
I'm new to pandas and want to merge two datasets that have similar columns. Each column shares many values with its counterpart but also has some values of its own, and each contains some duplicates that I'd like to keep. My desired output is shown below. Passing how='inner' or how='outer' does not give the desired result.
import pandas as pd
dict1 = {'A':[2,2,3,4,5]}
dict2 = {'A':[2,2,3,4,5]}
df1 = pd.DataFrame(dict1)
df2 = pd.DataFrame(dict2)
print(pd.merge(df1,df2))
output:
A
0 2
1 2
2 2
3 2
4 3
5 4
6 5
desired/expected output:
A
0 2
1 2
2 3
3 4
4 5
Please let me know whether, and how, I can get the desired output using merge. Thank you!
EDIT: To clarify why this behavior confuses me: if I simply add another column, the result has only two 2's rather than four, so I would expect my first example to have two 2's as well. Why does the behavior seem to change? What is pandas doing?
import pandas as pd
dict1 = {'A':[2,2,3,4,5],
'B':['red','orange','yellow','green','blue'],
}
dict2 = {'A':[2,2,3,4,5],
'B':['red','orange','yellow','green','blue'],
}
df1 = pd.DataFrame(dict1)
df2 = pd.DataFrame(dict2)
print(pd.merge(df1,df2))
output:
A B
0 2 red
1 2 orange
2 3 yellow
3 4 green
4 5 blue
However, based on the first example I would expect:
A B
0 2 red
1 2 orange
2 2 red
3 2 orange
4 3 yellow
5 4 green
6 5 blue
Recommended answer
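Some background on the behavior first, as a minimal sketch. With no `on` argument, `pd.merge` joins on every column the two frames share, and when a key value repeats on both sides, each left row is paired with each matching right row. In the first example the key 2 appears twice in each frame, which gives 2 × 2 = 4 rows. In the second example the key is the pair (A, B), and (2, 'red') and (2, 'orange') each occur once per frame, so nothing multiplies. The variable names below are illustrative and not from the original post.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [2, 2, 3, 4, 5]})
df2 = pd.DataFrame({'A': [2, 2, 3, 4, 5]})

# Only 'A' is shared, so it is the join key; each left 2 matches both right 2s.
one_key = pd.merge(df1, df2)
count_of_twos = int((one_key['A'] == 2).sum())

colors = ['red', 'orange', 'yellow', 'green', 'blue']
df1_ab = df1.assign(B=colors)
df2_ab = df2.assign(B=colors)

# 'A' and 'B' are both shared, so the join key is the (A, B) pair, which is unique.
two_keys = pd.merge(df1_ab, df2_ab)

print(len(one_key), count_of_twos)  # 7 4
print(len(two_keys))                # 5
```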
import pandas as pd
dict1 = {'A':[2,2,3,4,5]}
dict2 = {'A':[2,2,3,4,5]}
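# reset_index() keeps each row's original position in an 'index' column
df1 = pd.DataFrame(dict1).reset_index()
df2 = pd.DataFrame(dict2).reset_index()
# both frames have an 'index' column, so the merge names them index_x and index_y
df = df1.merge(df2, on = 'A')
# keep only the pairs where a row was matched with the row at the same position
df = df.loc[df.index_x == df.index_y, ['A']].reset_index(drop=True)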
print(df)
output:
A
0 2
1 2
2 3
3 4
4 5
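The index comparison above relies on matching rows sitting at the same positions in both frames. As a hedged alternative that is not part of the original answer, duplicates can be numbered with `groupby(...).cumcount()` and the merge performed on the key plus that counter, so the n-th copy of a value pairs only with the n-th copy on the other side. The helper column name `occurrence` and the sample data with differing values are assumptions made for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [2, 2, 3, 4, 5]})
df2 = pd.DataFrame({'A': [2, 2, 3, 5, 6]})  # hypothetical: differs from df1

# Number each repeat of a value: the first 2 gets 0, the second 2 gets 1, ...
df1['occurrence'] = df1.groupby('A').cumcount()
df2['occurrence'] = df2.groupby('A').cumcount()

# Merging on (A, occurrence) pairs the n-th copy with the n-th copy only.
merged = (
    df1.merge(df2, on=['A', 'occurrence'], how='outer')
       .sort_values(['A', 'occurrence'])
       .drop(columns='occurrence')
       .reset_index(drop=True)
)
print(merged['A'].tolist())  # [2, 2, 3, 4, 5, 6]
```

With how='outer' the values unique to either frame (4 and 6 here) are kept, while the two 2's appear twice instead of four times.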