python - How do I create a new column in a pandas DataFrame and replace part of the string in each row with different substitutions?

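I have 3 different columns in different DataFrames, as shown below.

Column 1 holds sentence templates, e.g. "He wants to [action] this week".

Column 2 holds pairs of words, e.g. "exercise, swim".

Column 3 holds the type of each word pair, e.g. [action].

I think there should be something similar to "melt" in R, but I am not sure how to do the replacement.

I would like to create a new column/DataFrame that gives all possible options for each sentence template (one sentence per row):

He wants to exercise this week.

He wants to swim this week.

The number of templates is significantly smaller than the number of words I have. There are several types of word pairs (action, description, object, etc.).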

#a simple example of what I would like to achieve

import pandas as pd

#input1
templates = pd.DataFrame(columns=list('AB'))
templates.loc[0] = [1,'He wants to [action] this week']
templates.loc[1] = [2,'She noticed a(n) [object] in the distance']
templates

#input 2
words = pd.DataFrame(columns=list('AB'))
words.loc[0] = ['exercise, swim', 'action']
words.loc[1] = ['bus, shop', 'object']
words

#output
result = pd.DataFrame(columns=list('AB'))
result.loc[0] = [1, 'He wants to exercise this week']
result.loc[1] = [2, 'He wants to swim this week']
result.loc[2] = [3, 'She noticed a(n) bus in the distance']
result.loc[3] = [4, 'She noticed a(n) shop in the distance']
result

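Best answer

First use Series.str.extract with the words from words['B'] to create a new column holding the matched placeholder, then use Series.map to look up its replacement values: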

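import re

# Build an alternation of the escaped placeholders, e.g. \[action\]|\[object\]
pat = '|'.join(r"\[{}\]".format(re.escape(x)) for x in words['B'])
# Extract the placeholder found in each template ('' when there is none)
templates['matched'] = templates['B'].str.extract('(' + pat + ')', expand=False).fillna('')
# Look up the comma-separated replacement words for that placeholder
templates['repl'] = (templates['matched'].map(words.set_index('B')['A']
                                                   .rename(lambda x: '[' + x + ']'))).fillna('')
print(templates)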
   A                                          B   matched            repl
0  1             He wants to [action] this week  [action]  exercise, swim
1  2  She noticed a(n) [object] in the distance  [object]       bus, shop
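
To make the extraction step easier to follow in isolation, here is a minimal sketch of what the pattern looks like and what str.extract returns. The names types and sentences and the third sentence without a placeholder are assumptions added for illustration; they are not part of the original question.

```python
import re

import pandas as pd

# Placeholder types, standing in for words['B'] from the question.
types = pd.Series(['action', 'object'])

# One alternative per type, with the literal square brackets escaped.
pat = '|'.join(r"\[{}\]".format(re.escape(t)) for t in types)

sentences = pd.Series([
    'He wants to [action] this week',
    'She noticed a(n) [object] in the distance',
    'A sentence with no placeholder',  # hypothetical extra row
])

# The capture group makes str.extract return the matched placeholder.
# Rows without a match come back as NaN, which is why fillna('') is applied.
matched = sentences.str.extract('(' + pat + ')', expand=False).fillna('')
print(pat)
print(matched.tolist())
```

Filling the misses with an empty string matters for the next step: an empty 'matched' maps to no entry in words, so 'repl' also ends up empty instead of NaN.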

Then do the replacement with a list comprehension:

z = zip(templates['B'],templates['repl'], templates['matched'])
result = pd.DataFrame({'B':[a.replace(c, y) for a,b,c in z for y in b.split(', ')]})
result.insert(0, 'A', result.index + 1)
print (result)
   A                                      B
0  1         He wants to exercise this week
1  2             He wants to swim this week
2  3   She noticed a(n) bus in the distance
3  4  She noticed a(n) shop in the distance
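
As an alternative that is closer to the "melt" idea mentioned in the question, the word lists can be reshaped to one row per word and then joined to the templates. This is only a sketch, not the method from the answer above: it assumes pandas 0.25 or later (for DataFrame.explode), exactly one placeholder per template, and words separated by ', '. The column names template, type and word are introduced here for illustration.

```python
import pandas as pd

templates = pd.DataFrame({
    'A': [1, 2],
    'B': ['He wants to [action] this week',
          'She noticed a(n) [object] in the distance'],
})
words = pd.DataFrame({
    'A': ['exercise, swim', 'bus, shop'],
    'B': ['action', 'object'],
})

# One row per (type, word): split the comma-separated list, then explode it.
long_words = (words.assign(word=words['A'].str.split(', '))
                   .explode('word')
                   .rename(columns={'B': 'type'})[['type', 'word']])

# Pull the placeholder type out of each template and join on it.
tmpl = templates.rename(columns={'B': 'template'})[['template']].copy()
tmpl['type'] = tmpl['template'].str.extract(r'\[([^\]]+)\]', expand=False)
merged = tmpl.merge(long_words, on='type', how='inner')

# Fill in the placeholder row by row.
merged['B'] = [t.replace('[' + k + ']', w)
               for t, k, w in zip(merged['template'], merged['type'], merged['word'])]

result = merged[['B']].reset_index(drop=True)
result.insert(0, 'A', result.index + 1)
print(result)
```

The inner join drops templates whose placeholder type has no entry in words; whether that or keeping the untouched template is preferable depends on the data.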

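Regarding "python - How do I create a new column in a pandas DataFrame and replace part of the string in each row with different substitutions?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56578870/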