This article covers how to respect word boundaries when modifying strings, so that only whole words are replaced.
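Problem description
Background
The following is adapted from an earlier question on skipping empty lists and continuing with a function.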
import pandas as pd
Names = [list(['ann']),
list([]),
list(['elisabeth', 'lis']),
list(['his','he']),
list([])]
df = pd.DataFrame({'Text' : ['ann had an anniversery today',
'nothing here',
'I like elisabeth and lis 5 lists ',
'one day he and his cheated',
'same here'
],
'P_ID': [1,2,3, 4,5],
'P_Name' : Names
})
#rearrange columns
df = df[['Text', 'P_ID', 'P_Name']]
df
Text P_ID P_Name
0 ann had an anniversery today 1 [ann]
1 nothing here 2 []
2 I like elisabeth and lis 5 lists 3 [elisabeth, lis]
3 one day he and his cheated 4 [his, he]
4 same here 5 []
The following code works
m = df['P_Name'].str.len().ne(0)
df.loc[m, 'New'] = df.loc[m, 'Text'].replace(df.loc[m].P_Name,'**BLOCK**',regex=True)
and does the following:
1) uses the names in P_Name to block the corresponding text in the Text column by placing **BLOCK**
2) produces a new column New
as shown below
Text P_ID P_Name New
0 **BLOCK** had an **BLOCK**iversery today
1 NaN
2 I like **BLOCK** and **BLOCK** 5 **BLOCK**ts
3 one day **BLOCK** and **BLOCK** c**BLOCK**ated
4 NaN
Problem
However, this code works a little "too well."
Using ['his','he'] from P_Name to block Text:
Example: one day he and his cheated becomes one day **BLOCK** and **BLOCK** c**BLOCK**ated
Desired: one day he and his cheated becomes one day **BLOCK** and **BLOCK** cheated
In this example, I would like cheated to stay as cheated and not become c**BLOCK**ated. The same thing happens with ann inside anniversery and lis inside lists.
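The difference between the current and the desired behavior can be reproduced without pandas. The snippet below is a minimal sketch using the standard `re` module; the variable names `loose` and `strict` are illustrative only. It shows that an unanchored pattern matches inside longer words, while wrapping the alternatives in `\b` restricts matches to whole words.

```python
import re

text = "one day he and his cheated"

# Unanchored pattern: "he" also matches inside "cheated".
loose = re.sub("his|he", "**BLOCK**", text)

# \b asserts a word boundary on each side, so only whole words match.
strict = re.sub(r"\b(?:his|he)\b", "**BLOCK**", text)

print(loose)   # one day **BLOCK** and **BLOCK** c**BLOCK**ated
print(strict)  # one day **BLOCK** and **BLOCK** cheated
```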
Desired output
Text P_ID P_Name New
0 **BLOCK** had an anniversery today
1 NaN
2 I like **BLOCK** and **BLOCK** 5 lists
3 one day **BLOCK** and **BLOCK** cheated
4 NaN
Question
How do I achieve my desired output?
Recommended answer
Sometimes a for loop is good practice
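# For each row, split the text into words and replace the ones that exactly match a name
df['New'] = [pd.Series(x).replace(dict.fromkeys(y, '**BLOCK**')).str.cat(sep=' ') for x, y in zip(df.Text.str.split(), df.P_Name)]
# Keep the result only where P_Name is non-empty; other rows become NaN
# (assigning back avoids relying on inplace=True through attribute access)
df['New'] = df['New'].where(df.P_Name.astype(bool))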
df
Text ... New
0 ann had an anniversery today ... **BLOCK** had an anniversery today
1 nothing here ... NaN
2 I like elisabeth and lis 5 lists ... I like **BLOCK** and **BLOCK** 5 lists
3 one day he and his cheated ... one day **BLOCK** and **BLOCK** cheated
4 same here ... NaN
[5 rows x 4 columns]
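Since the question is about word boundaries, a regex-based variant may also be worth considering. The following is a sketch, not part of the original answer: the helper `block_names` is a hypothetical name, and it assumes the names in P_Name start and end with word characters so that `\b` can anchor them. It builds one pattern per row, escapes each name with `re.escape`, and leaves rows with an empty list as NaN.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Text": ["ann had an anniversery today",
             "nothing here",
             "I like elisabeth and lis 5 lists ",
             "one day he and his cheated",
             "same here"],
    "P_ID": [1, 2, 3, 4, 5],
    "P_Name": [["ann"], [], ["elisabeth", "lis"], ["his", "he"], []],
})


def block_names(text, names):
    """Replace whole-word occurrences of each name with **BLOCK**.

    Hypothetical helper for illustration; returns NaN when there are no names.
    """
    if not names:
        return np.nan
    # Escape each name so regex metacharacters are treated literally,
    # then require a word boundary on both sides of the match.
    pattern = r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b"
    return re.sub(pattern, "**BLOCK**", text)


df["New"] = [block_names(t, n) for t, n in zip(df["Text"], df["P_Name"])]
print(df["New"].tolist())
```

Unlike the split-based approach, this keeps the original whitespace of Text untouched, and it also matches a name that is directly followed by punctuation (for example "he,"), because `\b` sits between a word character and a non-word character.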