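I have 3 different columns spread across different data frames, as shown below.
Column 1 has sentence templates, e.g. "He wants to [action] this week".
Column 2 has pairs of words, e.g. "exercise, swim".
Column 3 has the type of each word pair, e.g. [action].
I assume there should be something similar to "melt" in R, but I'm not sure how to do the replacement.
I would like to create a new column/data frame that lists every possible option for each sentence template (one sentence per row):
He wants to exercise this week.
He wants to swim this week.
The number of templates is much smaller than the number of words I have. There are several types of word pairs (action, description, object, etc.).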
#a simple example of what I would like to achieve
import pandas as pd
#input1
templates = pd.DataFrame(columns=list('AB'))
templates.loc[0] = [1,'He wants to [action] this week']
templates.loc[1] = [2,'She noticed a(n) [object] in the distance']
templates
#input 2
words = pd.DataFrame(columns=list('AB'))
words.loc[0] = ['exercise, swim', 'action']
words.loc[1] = ['bus, shop', 'object']
words
#output
result = pd.DataFrame(columns=list('AB'))
result.loc[0] = [1, 'He wants to exercise this week']
result.loc[1] = [2, 'He wants to swim this week']
result.loc[2] = [3, 'She noticed a(n) bus in the distance']
result.loc[3] = [4, 'She noticed a(n) shop in the distance']
result
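Best answer
First use Series.str.extract with the words from words['B'] to create a new column holding the matched placeholder, then use Series.map to look up the replacement values: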
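import re

# build an alternation of escaped placeholders, e.g. \[action\]|\[object\]
pat = '|'.join(r"\[{}\]".format(re.escape(x)) for x in words['B'])
# first placeholder found in each template ('' if none matches)
templates['matched'] = templates['B'].str.extract('('+ pat + ')', expand=False).fillna('')
# map each placeholder to its comma-separated words
templates['repl'] =(templates['matched'].map(words.set_index('B')['A']
                                                  .rename(lambda x: '[' + x + ']'))).fillna('')
print (templates)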
   A                                          B   matched            repl
0  1             He wants to [action] this week  [action]  exercise, swim
1  2  She noticed a(n) [object] in the distance  [object]       bus, shop
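To make the intermediate values of this step easier to inspect, here is a minimal, self-contained sketch. The third template (one without any known placeholder) and the variable name `lookup` are hypothetical additions for illustration only; they are not part of the original question or answer.

```python
import re
import pandas as pd

words = pd.DataFrame({'A': ['exercise, swim', 'bus, shop'],
                      'B': ['action', 'object']})

# The third template is a hypothetical addition with no known placeholder
templates = pd.Series(['He wants to [action] this week',
                       'She noticed a(n) [object] in the distance',
                       'Nothing to fill in here'])

# Alternation of escaped placeholders, one branch per word type
pat = '|'.join(r"\[{}\]".format(re.escape(x)) for x in words['B'])

# Lookup Series whose index is the bracketed type, e.g. '[action]'
lookup = words.set_index('B')['A'].rename(lambda x: '[' + x + ']')

# First placeholder per template; '' when nothing matches
matched = templates.str.extract('(' + pat + ')', expand=False).fillna('')

# Comma-separated words per template; '' when there was no placeholder
repl = matched.map(lookup).fillna('')

print(pat)
print(matched.tolist())
print(repl.tolist())
```

A template without a placeholder ends up with an empty `matched` and an empty `repl`. Since `''.split(', ')` yields `['']` and `a.replace('', '')` returns `a` unchanged, such a template should pass through a later replace step once, unmodified.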
Then do the replacement with a list comprehension, producing one row per word:
z = zip(templates['B'],templates['repl'], templates['matched'])
result = pd.DataFrame({'B':[a.replace(c, y) for a,b,c in z for y in b.split(', ')]})
result.insert(0, 'A', result.index + 1)
print (result)
   A                                      B
0  1         He wants to exercise this week
1  2             He wants to swim this week
2  3   She noticed a(n) bus in the distance
3  4  She noticed a(n) shop in the distance
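As a possible alternative (not the method from the answer above), the same expansion can be sketched with DataFrame.explode and DataFrame.merge, which is closer to the "melt"-style reshaping the question mentions. This assumes pandas 0.25 or newer (for explode), exactly one placeholder per template, and words separated by ", ". The helper column names `word`, `type` and `template` are made up for this sketch.

```python
import pandas as pd

templates = pd.DataFrame({
    'A': [1, 2],
    'B': ['He wants to [action] this week',
          'She noticed a(n) [object] in the distance'],
})
words = pd.DataFrame({
    'A': ['exercise, swim', 'bus, shop'],
    'B': ['action', 'object'],
})

# Reshape to one row per (word, type) pair
long_words = (words.rename(columns={'A': 'word', 'B': 'type'})
                   .assign(word=lambda d: d['word'].str.split(', '))
                   .explode('word'))

# Pull the placeholder type out of each template (assumes a single [type])
tpl = templates[['B']].rename(columns={'B': 'template'})
tpl['type'] = tpl['template'].str.extract(r'\[(\w+)\]', expand=False)

# Pair every template with every word of its type
merged = tpl.merge(long_words, on='type', how='inner')

# Substitute the word for the bracketed placeholder, row by row
merged['B'] = [t.replace('[' + k + ']', w)
               for t, k, w in zip(merged['template'], merged['type'], merged['word'])]

result = merged[['B']].reset_index(drop=True)
result.insert(0, 'A', result.index + 1)
print(result)
```

Note one difference in behavior: because of the inner merge, templates without a recognized placeholder are dropped here, whereas the list-comprehension approach keeps them unchanged.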
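Regarding "python - How to create a new column in a pandas DataFrame and replace part of the string in each row with different replacements?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56578870/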