Problem description
I have two data frames that look like this:
df1
name ID abb
0 foo 251803 I
1 bar 376811 R
2 baz 174254 Q
3 foofoo 337144 IRQ
4 barbar 306521 IQ
df2
abb comment
0 I fine
1 R repeat
2 Q other
I am trying to use pandas merge to join the two data frames and simply assign the comment column in the second data frame to the first, based on the abb column, in the following way:
df1.merge(df2, how='inner', on='abb')
which results in:
name ID abb comment
0 foo 251803 I fine
1 bar 376811 R repeat
2 baz 174254 Q other
This works well for the unique one-letter identifiers in abb. However, it obviously fails for values with more than one character: the inner merge drops the rows for foofoo (IRQ) and barbar (IQ).
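One way to see exactly which rows fail to match is a left merge with the indicator option. The following is a minimal sketch that rebuilds the two example frames from the tables above; the variable names check and unmatched are illustrative only.

```python
import pandas as pd

# Example frames rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "name": ["foo", "bar", "baz", "foofoo", "barbar"],
    "ID": [251803, 376811, 174254, 337144, 306521],
    "abb": ["I", "R", "Q", "IRQ", "IQ"],
})
df2 = pd.DataFrame({
    "abb": ["I", "R", "Q"],
    "comment": ["fine", "repeat", "other"],
})

# how="left" keeps every row of df1; indicator=True adds a _merge column
# that marks rows without a partner in df2 as "left_only".
check = df1.merge(df2, how="left", on="abb", indicator=True)
unmatched = check.loc[check["_merge"] == "left_only", ["name", "abb"]]
print(unmatched)
```

The rows listed in unmatched are the ones whose abb value has to be split into single letters before the merge can succeed.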
I tried to use list on the abb column in the first data frame, but this results in a KeyError.
What I would like to do is the following:
1) Separate the rows containing more than one character in this column into several rows
2) Merge the data frames
3) Optionally: combine the rows again
Recommended answer
Use join:
print (df1)
name ID abb
0 foo 251803 I
1 bar 376811 R
2 baz 174254 Q
3 foofoo 337144 IRQ
4 barbar 306521 IQ
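# split each abb string into single characters (one column per character),
# then stack the columns into one Series that keeps the original row index;
# the parentheses let the method chain span several lines
s = (df1.abb.apply(lambda x: pd.Series(list(x)))
        .stack()
        .reset_index(drop=True, level=1)
        .rename('abb'))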
print (s)
0 I
1 R
2 Q
3 I
3 R
3 Q
4 I
4 Q
Name: abb, dtype: object
df1 = df1.drop('abb', axis=1).join(s)
print (df1)
name ID abb
0 foo 251803 I
1 bar 376811 R
2 baz 174254 Q
3 foofoo 337144 I
3 foofoo 337144 R
3 foofoo 337144 Q
4 barbar 306521 I
4 barbar 306521 Q
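The code above covers step 1 of the question only. Steps 2 and 3 could be sketched as follows. Note that this sketch swaps in DataFrame.explode (available since pandas 0.25) for the stack-based split, and that the names long_df, merged and combined, as well as the ", " separator for the comments, are assumptions made for illustration and not part of the original answer.

```python
import pandas as pd

# Example frames rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "name": ["foo", "bar", "baz", "foofoo", "barbar"],
    "ID": [251803, 376811, 174254, 337144, 306521],
    "abb": ["I", "R", "Q", "IRQ", "IQ"],
})
df2 = pd.DataFrame({
    "abb": ["I", "R", "Q"],
    "comment": ["fine", "repeat", "other"],
})

# Step 1: turn each abb string into a list of characters and give
# every character its own row (the original index is repeated).
long_df = df1.assign(abb=df1["abb"].apply(list)).explode("abb")

# Step 2: merge the comments in. how="left" keeps letters that have
# no entry in df2 (their comment becomes NaN).
merged = long_df.merge(df2, how="left", on="abb")

# Step 3 (optional): collapse back to one row per original record,
# re-joining the letters and listing the comments, skipping NaN.
combined = (merged.groupby(["name", "ID"], sort=False)
                  .agg({"abb": lambda s: "".join(s),
                        "comment": lambda c: ", ".join(c.dropna())})
                  .reset_index())
print(combined)
```

If the per-letter rows are what is needed, merged is already the final result and step 3 can be skipped.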