This article describes how to change the text in a pandas column based on a name.

Problem description

Background

I have the following sample df

import pandas as pd

df = pd.DataFrame({'Text': ['Jon J Mmith is Here from **BLOCK** until **BLOCK**',
                            'No P_Name Found here',
                            'Jane Ann Doe is Also here until **BLOCK** ',
                            '**BLOCK** was **BLOCK** Tom Tcker is Not here but **BLOCK** '],
                   'P_ID': [1, 2, 3, 4],
                   'P_Name': ['Mmith, Jon J', 'Hder, Mary', 'Doe, Jane Ann', 'Tcker, Tom'],
                   'N_ID': ['A1', 'A2', 'A3', 'A4']})

# rearrange columns
df = df[['Text', 'N_ID', 'P_ID', 'P_Name']]
df


    Text                                                          N_ID  P_ID  P_Name
0   Jon J Mmith is Here from **BLOCK** until **BLOCK**            A1    1     Mmith, Jon J
1   No P_Name Found here                                          A2    2     Hder, Mary
2   Jane Ann Doe is Also here until **BLOCK**                     A3    3     Doe, Jane Ann
3   **BLOCK** was **BLOCK** Tom Tcker is Not here but **BLOCK**   A4    4     Tcker, Tom

Goal

1) In the Text column, replace any name (e.g. Jon J Mmith) that corresponds to a value in P_Name with **BLOCK**. Note that P_Name stores names as "Last, First", while Text contains them as "First Last".

Desired Output

    Text                                                          N_ID  P_ID  P_Name
0   **BLOCK** is Here from **BLOCK** until **BLOCK**              A1    1     Mmith, Jon J
1   No P_Name Found here                                          A2    2     Hder, Mary
2   **BLOCK** is Also here until **BLOCK**                        A3    3     Doe, Jane Ann
3   **BLOCK** was **BLOCK** **BLOCK** is Not here but **BLOCK**   A4    4     Tcker, Tom

The desired output can either overwrite the Text column or go into a new column (new_col).

Question

How do I achieve my desired output?

Solution

One way:

>>> df['Text'].replace(df['P_Name'].str.split(', *').apply(lambda l: ' '.join(l[::-1])).tolist(), '**BLOCK**', regex=True)
0           **BLOCK** is Here from **BLOCK** until **BLOCK**
1                                 No P_Name Found here
2                  **BLOCK** is Also here until **BLOCK**
3    **BLOCK** was **BLOCK** **BLOCK** is Not here but **...

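You can pass inplace=True to do this in place, or create a new column by assigning the expression above to df['new_col']. The expression splits each P_Name value on the comma, reverses the parts and joins them with a space (so 'Mmith, Jon J' becomes 'Jon J Mmith'), and then replaces every occurrence of those names in the Text column with **BLOCK**.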
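For reference, here is a minimal self-contained sketch of the same idea that can be run from top to bottom. It is a variation rather than the exact code above: it uses `Series.str.replace` with `regex=False` in a loop, so each name is matched literally and no regex escaping is needed. The helper `to_first_last` and the variable names `names` and `blocked` are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Text': ['Jon J Mmith is Here from **BLOCK** until **BLOCK**',
             'No P_Name Found here',
             'Jane Ann Doe is Also here until **BLOCK** ',
             '**BLOCK** was **BLOCK** Tom Tcker is Not here but **BLOCK** '],
    'P_ID': [1, 2, 3, 4],
    'P_Name': ['Mmith, Jon J', 'Hder, Mary', 'Doe, Jane Ann', 'Tcker, Tom'],
    'N_ID': ['A1', 'A2', 'A3', 'A4'],
})


def to_first_last(p_name):
    """Turn 'Last, First' into 'First Last' (hypothetical helper)."""
    parts = [part.strip() for part in p_name.split(',')]
    return ' '.join(reversed(parts))


# One 'First Last' string per row of P_Name.
names = [to_first_last(p_name) for p_name in df['P_Name']]

# Replace each name literally, in every row of Text.
blocked = df['Text']
for name in names:
    blocked = blocked.str.replace(name, '**BLOCK**', regex=False)

# Write to a new column; assign to df['Text'] instead to overwrite.
df['new_col'] = blocked

print(df['new_col'].tolist())
```

As with the one-liner, every name from P_Name is replaced wherever it appears in Text, not only in its own row. If a name should only be masked in the row it belongs to, a row-wise approach would be needed instead.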
