I have two pandas DataFrames, df1 and df2.
df1:
Name No
A 1
A 2
B 5
df2:
Player Gender
A F
B M
C F
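I want to create a new column sex in df1 using the corresponding values from the Gender column of df2. The columns to use for the lookup are Name in df1 and Player in df2. Any help is much appreciated.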
Best answer
Use map on the Name column of df1 with a Series created from df2 by set_index, so that Player is the index and Gender holds the values:
df1['sex'] = df1.Name.map(df2.set_index('Player')['Gender'])
print (df1)
Name No sex
0 A 1 F
1 A 2 F
2 B 5 M
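For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame construction is my assumption, rebuilt from the tables in the question, and the variable name lookup is only for illustration. It also assumes each Player appears only once in df2, since the Series used for the lookup needs a unique index.

```python
import pandas as pd

# Frames rebuilt from the tables shown in the question (assumed construction)
df1 = pd.DataFrame({"Name": ["A", "A", "B"], "No": [1, 2, 5]})
df2 = pd.DataFrame({"Player": ["A", "B", "C"], "Gender": ["F", "M", "F"]})

# Series indexed by Player whose values are Gender; "lookup" is an illustrative name
lookup = df2.set_index("Player")["Gender"]

# For every Name in df1, fetch the Gender stored under the same Player label
df1["sex"] = df1["Name"].map(lookup)
print(df1)
```

Names in df1 that have no matching Player in df2 would get NaN in the new column rather than raising an error.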
This is the same as mapping with a dict:
d = df2.set_index('Player')['Gender'].to_dict()
print (d)
{'A': 'F', 'B': 'M', 'C': 'F'}
df1['sex'] = df1.Name.map(d)
print (df1)
Name No sex
0 A 1 F
1 A 2 F
2 B 5 M
Alternatively, use merge:
print (pd.merge(df1,df2, left_on='Name', right_on='Player')
.rename(columns={'Gender':'sex'})
.drop('Player', axis=1))
Name No sex
0 A 1 F
1 A 2 F
2 B 5 M
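One difference worth keeping in mind: pd.merge defaults to an inner join, so rows of df1 whose Name has no match in df2 are dropped, while map keeps them with NaN. The sketch below illustrates this with a hypothetical extra row ("D", which is not in the original data) and shows that how='left' gives the same row count as map.

```python
import pandas as pd

# Hypothetical data: the "D" row is added for illustration and has no match in df2
df1 = pd.DataFrame({"Name": ["A", "A", "B", "D"], "No": [1, 2, 5, 7]})
df2 = pd.DataFrame({"Player": ["A", "B", "C"], "Gender": ["F", "M", "F"]})

# map keeps every row of df1; unmatched names become NaN
mapped = df1.assign(sex=df1["Name"].map(df2.set_index("Player")["Gender"]))

# Default merge is an inner join: the unmatched "D" row disappears
inner = (pd.merge(df1, df2, left_on="Name", right_on="Player")
           .rename(columns={"Gender": "sex"})
           .drop("Player", axis=1))

# A left join keeps all rows of df1, like map does
left = (pd.merge(df1, df2, left_on="Name", right_on="Player", how="left")
          .rename(columns={"Gender": "sex"})
          .drop("Player", axis=1))

print(len(mapped), len(inner), len(left))
```

With the original three-row df1 every Name has a match, so both join types give the same result there.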
The first approach (map) is faster:
In [46]: %timeit (pd.merge(df1,df2, left_on='Name', right_on='Player').rename(columns={'Gender':'sex'}).drop('Player', axis=1))
The slowest run took 4.53 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 2.53 ms per loop
In [47]: %timeit df1.Name.map(df2.set_index('Player')['Gender'])
The slowest run took 4.78 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 882 µs per loop
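The timings above were taken on the tiny three-row frames. If you want to check the comparison on your own data outside IPython, something like the following sketch could be used. The frame sizes, the generated player names, and the helper names with_map and with_merge are all assumptions for illustration, and the numbers will differ between machines and pandas versions.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (assumed sizes): 1,000 unique players, 100,000 lookups
rng = np.random.default_rng(0)
players = [f"p{i}" for i in range(1000)]
df2 = pd.DataFrame({"Player": players,
                    "Gender": rng.choice(["F", "M"], size=len(players))})
df1 = pd.DataFrame({"Name": rng.choice(players, size=100_000),
                    "No": np.arange(100_000)})


def with_map():
    # Lookup through a Series indexed by Player
    return df1["Name"].map(df2.set_index("Player")["Gender"])


def with_merge():
    # Same lookup expressed as a join on Name == Player
    return (pd.merge(df1, df2, left_on="Name", right_on="Player")
              .rename(columns={"Gender": "sex"})
              .drop("Player", axis=1))


# Best of 3 repeats, 5 calls each; reported in seconds per 5 calls
print("map:  ", min(timeit.repeat(with_map, number=5, repeat=3)))
print("merge:", min(timeit.repeat(with_merge, number=5, repeat=3)))
```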
About python - lookup values with different column names in pandas: a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37894807/