Background
I have the following sample df
import pandas as pd
df = pd.DataFrame({'Text' : ['Jon J Mmith is Here from **BLOCK** until **BLOCK**',
'No P_Name Found here',
'Jane Ann Doe is Also here until **BLOCK** ',
'**BLOCK** was **BLOCK** Tom Tcker is Not here but **BLOCK** '],
'P_ID': [1,2,3,4],
'P_Name' : ['Mmith, Jon J', 'Hder, Mary', 'Doe, Jane Ann', 'Tcker, Tom'],
'N_ID' : ['A1', 'A2', 'A3', 'A4']
})
#rearrange columns
df = df[['Text','N_ID', 'P_ID', 'P_Name']]
df
Text N_ID P_ID P_Name
0 Jon J Mmith is Here from **BLOCK** until **BLOCK** A1 1 Mmith, Jon J
1 No P_Name Found here A2 2 Hder, Mary
2 Jane Ann Doe is Also here until **BLOCK** A3 3 Doe, Jane Ann
3 **BLOCK** was **BLOCK** Tom Tcker is Not here but **BLOCK** A4 4 Tcker, Tom
Goal
1) In the Text column, replace the name (e.g. Jon J Mmith) that corresponds to the value in P_Name (e.g. Mmith, Jon J) with **BLOCK**
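The goal relies on one correspondence: P_Name stores a name as "Last, First Middle", while Text writes the same name as "First Middle Last". A minimal sketch of that conversion in plain Python, assuming every P_Name contains exactly one comma (the function name p_name_to_text_name is hypothetical):

```python
import re


def p_name_to_text_name(p_name: str) -> str:
    """Hypothetical helper: turn 'Last, First Middle' into 'First Middle Last'."""
    # Split on the comma plus any spaces after it: 'Mmith, Jon J' -> ['Mmith', 'Jon J']
    parts = re.split(r",\s*", p_name)
    # Reverse the pieces and rejoin them with a single space: 'Jon J Mmith'
    return " ".join(reversed(parts))


for name in ["Mmith, Jon J", "Hder, Mary", "Doe, Jane Ann", "Tcker, Tom"]:
    print(name, "->", p_name_to_text_name(name))
```

Once every P_Name is in this form, blocking the name is an ordinary substring replacement in Text.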
Desired Output
Text N_ID P_ID P_Name
0 **BLOCK** is Here from **BLOCK** until **BLOCK** A1 1 Mmith, Jon J
1 No P_Name Found here A2 2 Hder, Mary
2 **BLOCK** is Also here until **BLOCK** A3 3 Doe, Jane Ann
3 **BLOCK** was **BLOCK** **BLOCK** is Not here but **BLOCK** A4 4 Tcker, Tom
The desired output can be written to the same Text column, or a new column (new_col) can be created.
Question
How do I achieve my desired output?
Answer
One way:
>>> names = df['P_Name'].str.split(', *').apply(lambda parts: ' '.join(parts[::-1])).tolist()
>>> df['Text'].replace(names, '**BLOCK**', regex=True)
0 **BLOCK** is Here from **BLOCK** until **BLOCK**
1 No P_Name Found here
2 **BLOCK** is Also here until **BLOCK**
3 **BLOCK** was **BLOCK** **BLOCK** is Not here but **...
Name: Text, dtype: object
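You can use inplace=True to do this in place (in recent pandas versions, calling an inplace method on df['Text'] is chained assignment and may not update df, so assigning back with df['Text'] = the above is safer), or create a new column with df['new_col'] = the above. What this does is split the P_Name column on the comma, join the pieces in reverse order with a space, and replace each resulting name in your Text column with **BLOCK**.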
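The one-liner above replaces every converted name in every row, so a name that happens to appear in another row's Text would be blocked as well. If only the name from the same row should be replaced, a row-wise sketch could look like the following. It assumes the same "Last, First Middle" format, uses re.escape so names containing regex characters are matched literally, and the helper name block_own_name is hypothetical:

```python
import re

import pandas as pd

# Same sample data as in the question, with the rearranged column order.
df = pd.DataFrame({
    "Text": [
        "Jon J Mmith is Here from **BLOCK** until **BLOCK**",
        "No P_Name Found here",
        "Jane Ann Doe is Also here until **BLOCK** ",
        "**BLOCK** was **BLOCK** Tom Tcker is Not here but **BLOCK** ",
    ],
    "N_ID": ["A1", "A2", "A3", "A4"],
    "P_ID": [1, 2, 3, 4],
    "P_Name": ["Mmith, Jon J", "Hder, Mary", "Doe, Jane Ann", "Tcker, Tom"],
})


def block_own_name(text: str, p_name: str) -> str:
    """Hypothetical helper: replace this row's own name in its text with **BLOCK**."""
    # 'Mmith, Jon J' -> ['Mmith', 'Jon J'] -> 'Jon J Mmith'
    full_name = " ".join(reversed(re.split(r",\s*", p_name)))
    # re.escape makes the name match literally, even if it contains '.', '(' and so on
    return re.sub(re.escape(full_name), "**BLOCK**", text)


# Pair each Text value only with the P_Name from the same row.
df["new_col"] = [
    block_own_name(text, p_name) for text, p_name in zip(df["Text"], df["P_Name"])
]
print(df[["Text", "new_col"]])
```

Writing to new_col keeps the original Text for comparison; assign to df["Text"] instead to overwrite it. The list comprehension over zip is just one choice here; df.apply(..., axis=1) would express the same row-wise logic.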