ABC has 10 rows and XYZ has 22550 rows.
Values of DataFrame ABC:
0 1 2
0 sun is rising | UNKNOWN | 1465465
1 micheal has arrived | UNKNOWN | 324654
2 goal has been scored | UNKNOWN | 547854
and the values of the other DataFrame, XYZ:
0 1
0 sun | password1
1 goal | password2
....
....
.....
....
22550
22551 micheal | password3
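How can I map XYZ onto ABC through the shared words (sun, goal and micheal) so that UNKNOWN in column 1 of ABC is replaced with the matching password, e.g. password1 for the sun row?
Output I need: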
0 1 2
0 sun is rising | password1 | 1465465
1 micheal has arrived | password3 | 324654
2 goal has been scored | password2 | 547854
I tried the following and got the corresponding errors:
d = dict(zip(XYZ[0],XYZ[1]))
pat = (r'({})'.format('|'.join(d.keys())))
ABC[1]=ABC[0].str.extract(pat,expand=False).map(d)
print(ABC)
Error: TypeError: sequence item 16069: expected str instance, float found
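The TypeError comes from `'|'.join(d.keys())`: `str.join` accepts only strings, and item 16069 of the keys is a float, most likely a NaN from an empty cell in XYZ[0] (an assumption, since the full data is not shown). Below is a minimal sketch of this first attempt with that fixed, run on small hypothetical frames in the question's layout: rows with a missing key are dropped, every key goes through `re.escape`, and `\b` word boundaries keep `sun` from matching inside a longer word.

```python
import re
import pandas as pd

# Reproduce the error: join() refuses any non-str item, and NaN is a float.
try:
    '|'.join(['sun', float('nan')])
except TypeError as exc:
    print(exc)  # sequence item 1: expected str instance, float found

# Hypothetical sample frames; None in XYZ mimics an empty cell (stored as missing).
ABC = pd.DataFrame({0: ['sun is rising', 'micheal has arrived', 'goal has been scored'],
                    1: ['UNKNOWN'] * 3,
                    2: [1465465, 324654, 547854]})
XYZ = pd.DataFrame({0: ['sun', 'goal', None, 'micheal'],
                    1: ['password1', 'password2', 'password4', 'password3']})

# Drop rows whose key is missing so every key is a string before joining.
clean = XYZ.dropna(subset=[0])
d = dict(zip(clean[0].astype(str), clean[1]))

# Escape each key, require whole-word matches, wrap the alternation in one group.
pat = r'\b({})\b'.format('|'.join(map(re.escape, d)))

# extract returns the first matching word per row (NaN if none); map looks it up.
ABC[1] = ABC[0].str.extract(pat, expand=False).map(d)
print(ABC)
```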
from itertools import chain
abc.loc[:,1] = list(chain(*[xyz.loc[abc[0].str.contains(i),1] for i in xyz[0]]))
Error: IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match)
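The IndexingError arises because `abc[0].str.contains(i)` returns a boolean Series indexed like abc (10 rows), which is then used to select rows of xyz (22550 rows). pandas aligns a boolean Series indexer by label and refuses when its labels do not cover the frame being indexed. A sketch on hypothetical miniature frames that keeps each mask on the frame it was built from:

```python
import pandas as pd

# Hypothetical miniature frames with different lengths, like abc and xyz.
abc = pd.DataFrame({0: ['sun is rising', 'micheal has arrived', 'goal has been scored'],
                    1: ['UNKNOWN'] * 3})
xyz = pd.DataFrame({0: ['sun', 'goal', 'moon', 'micheal'],
                    1: ['password1', 'password2', 'password4', 'password3']})

mask = abc[0].str.contains('sun', regex=False)  # indexed 0..2, exactly like abc
# xyz.loc[mask, 1] would raise IndexingError: xyz has label 3, which mask lacks.

# Fix: build the mask from abc and use it on abc, writing the key's password.
for key, password in zip(xyz[0], xyz[1]):
    abc.loc[abc[0].str.contains(key, regex=False), 1] = password

print(abc)
```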
d = dict(zip(XYZ[0], XYZ[1]))
ABC[1] = [next(d.get(y) for y in x.split() if y in d) for x in ABC[0]]
print (ABC)
Error: StopIteration
Best answer
If a value has no match, you can pass a default such as 'no match' as the second argument to next:
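StopIteration is what `next()` raises when the generator yields nothing and no default is given, i.e. for any ABC sentence in which no word is an exact key of d (a case difference such as `Sun` vs `sun` is enough). A stdlib-only illustration; the lookup and the helpers `first_password` / `first_password_or` are hypothetical:

```python
# Hypothetical lookup built from a few XYZ rows.
d = {'sun': 'password1', 'goal': 'password2', 'micheal': 'password3'}

def first_password(sentence):
    # Without a default, next() raises StopIteration if nothing is yielded.
    return next(d[w] for w in sentence.split() if w in d)

def first_password_or(sentence, default='no match'):
    # With a default, next() returns it instead of raising.
    return next((d[w] for w in sentence.split() if w in d), default)

try:
    first_password('Sun is rising')  # 'Sun' is not the key 'sun'
except StopIteration:
    print('StopIteration')

print(first_password_or('Sun is rising'))  # no match
print(first_password_or('sun is rising'))  # password1
```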
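# lowercase the keys so matching is case-insensitive
d = dict(zip(XYZ[0].str.lower(), XYZ[1]))
# for each sentence, take the password of the first word found in d, else 'no match'
ABC[1] = [next(iter(d.get(y) for y in x.lower().split() if y in d),'no match') for x in ABC[0]]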
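A runnable sketch of this answer on hypothetical sample frames. One tweak is ours, not the answer's: words are extracted with `re.findall(r'\w+', ...)` instead of `str.split()`, so a key followed by punctuation (`Micheal,`) still matches. The last row, also hypothetical, contains no known word and falls back to `'no match'`.

```python
import re
import pandas as pd

# Hypothetical sample: question rows plus a punctuation/case variant and an unmatched row.
ABC = pd.DataFrame({0: ['sun is rising', 'Micheal, has arrived', 'goal has been scored', 'rain is falling'],
                    1: ['UNKNOWN'] * 4,
                    2: [1465465, 324654, 547854, 0]})
XYZ = pd.DataFrame({0: ['sun', 'goal', 'micheal'],
                    1: ['password1', 'password2', 'password3']})

d = dict(zip(XYZ[0].str.lower(), XYZ[1]))
# \w+ tokenizing (our tweak) strips the comma from 'micheal,'.
ABC[1] = [next((d[w] for w in re.findall(r'\w+', s.lower()) if w in d), 'no match')
          for s in ABC[0]]
print(ABC)
```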
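A more general solution:
import re

# drop rows with a missing key or password (missing cells are float NaN)
XYZ = XYZ.dropna()
d = dict(zip(XYZ[0].str.lower(), XYZ[1]))
for k, v in d.items():
    # literal, case-insensitive substring match; NaN cells in ABC[0] count as no match
    ABC.loc[ABC[0].str.contains(re.escape(k), case=False, na=False), 1] = v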
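With 22550 keys, the loop above scans ABC once per key. An alternative technique, not part of the answer, is to split each sentence into lowercase words, `explode` them into one row per word (pandas 0.25 or later), and look each word up in the dict once. This sketch assumes single-word keys and hypothetical sample frames; unlike the `str.contains` loop, it will not match multi-word keys or keys embedded in longer words, and it keeps the first matching word per row rather than the last key assigned.

```python
import pandas as pd

# Hypothetical sample frames in the question's layout.
ABC = pd.DataFrame({0: ['sun is rising', 'micheal has arrived', 'goal has been scored'],
                    1: ['UNKNOWN'] * 3,
                    2: [1465465, 324654, 547854]})
XYZ = pd.DataFrame({0: ['sun', 'goal', None, 'micheal'],
                    1: ['password1', 'password2', 'password4', 'password3']})

XYZ = XYZ.dropna()
d = dict(zip(XYZ[0].str.lower(), XYZ[1]))

# One row per word; the index repeats the ABC row label each word came from.
words = ABC[0].str.lower().str.split().explode()
# Look every word up once; words that are not keys become NaN and are dropped.
matched = words.map(d).dropna()
# First matching word per original ABC row.
first = matched.groupby(level=0).first()
# Rows without any match keep their current value ('UNKNOWN').
ABC[1] = first.reindex(ABC.index).fillna(ABC[1])
print(ABC)
```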
Regarding "python - How to match and merge two dataframes with completely different values except for single words?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54531411/