This article covers how to replace number strings in a pandas column.

Problem description

Background

I have a sample df with a Text column whose rows contain zero, one, or more than one ABC number (e.g. ABC: 1111111):

import pandas as pd

df = pd.DataFrame({'Text': ['Jon J Smith  ABC: 1111111 is this here',
                            'ABC: 1234567 Mary Lisa Rider found here',
                            'Jane A Doe is also here',
                            'ABC: 2222222 Tom T Tucker is here ABC: 2222222 too'],
                   'P_ID': [1, 2, 3, 4],
                   'N_ID': ['A1', 'A2', 'A3', 'A4']})

# rearrange columns so N_ID comes before P_ID
df = df[['Text', 'N_ID', 'P_ID']]
df

    Text                                                N_ID  P_ID
0   Jon J Smith ABC: 1111111 is this here               A1    1
1   ABC: 1234567 Mary Lisa Rider found here             A2    2
2   Jane A Doe is also here                             A3    3
3   ABC: 2222222 Tom T Tucker is here ABC: 2222222...   A4    4

Goals

1) Change the ABC numbers in the Text column (e.g. ABC: 1111111) to ABC: **BLOCK**

2) Create a new column Text_ABC containing this output

Desired output

    Text                                                N_ID  P_ID  Text_ABC
0   Jon J Smith ABC: 1111111 is this here               A1    1     Jon J Smith ABC: **BLOCK** is this here
1   ABC: 1234567 Mary Lisa Rider found here             A2    2     ABC: **BLOCK** Mary Lisa Rider found here
2   Jane A Doe is also here                             A3    3     Jane A Doe is also here
3   ABC: 2222222 Tom T Tucker is here ABC: 2222222 too  A4    4     ABC: **BLOCK** Tom T Tucker is here ABC: **BLOCK** too

Question

How can I achieve the desired output?

Recommended answer

If every number in the text is to be replaced, you can do:

df['Text_ABC'] = df['Text'].replace(r'\d+', '***BLOCK***', regex=True)
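To see why the broad pattern can be too aggressive, here is a minimal sketch on a made-up row (hypothetical, not from the question's data) that also contains a number outside an ABC: tag. It uses **BLOCK**, as in the question's desired output, rather than the answer's ***BLOCK***.

```python
import pandas as pd

# Hypothetical row: "Room 42" is not an ABC number and should survive
s = pd.Series(['ABC: 1111111 is in Room 42'])

# Broad pattern: every run of digits is replaced, including "42"
broad = s.replace(r'\d+', '**BLOCK**', regex=True)

# Anchored pattern: only digits directly after "ABC: " are replaced
anchored = s.replace(r'ABC: \d+', 'ABC: **BLOCK**', regex=True)

print(broad[0])     # ABC: **BLOCK** is in Room **BLOCK**
print(anchored[0])  # ABC: **BLOCK** is in Room 42
```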

But if you want to be more specific and replace only the numbers that follow "ABC: ", you can use this:

df['Text_ABC'] = df['Text'].replace(r'ABC: \d+', 'ABC: ***BLOCK***', regex=True)

This gives you:

df
                                                Text  P_ID N_ID                                           Text_ABC
0             Jon J Smith  ABC: 1111111 is this here     1   A1           Jon J Smith  ABC: ***BLOCK*** is this here
1            ABC: 1234567 Mary Lisa Rider found here     2   A2          ABC: ***BLOCK*** Mary Lisa Rider found here
2                            Jane A Doe is also here     3   A3                            Jane A Doe is also here
3  ABC: 2222222 Tom T Tucker is here ABC: 2222222...     4   A4  ABC: ***BLOCK*** Tom T Tucker is here ABC: ***BLOCK...
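Both replace calls rely on regex=True. A minimal sketch, using a string from the question, of what happens without it: Series.replace then compares whole cell values, so a pattern that only matches part of a string changes nothing.

```python
import pandas as pd

s = pd.Series(['ABC: 1111111 is this here'])

# regex=False (the default): the pattern text must equal the entire cell value,
# which it does not, so the cell is left unchanged.
unchanged = s.replace(r'ABC: \d+', 'ABC: **BLOCK**')

# regex=True: the pattern is searched for inside each string and substituted.
changed = s.replace(r'ABC: \d+', 'ABC: **BLOCK**', regex=True)

print(unchanged[0])  # ABC: 1111111 is this here
print(changed[0])    # ABC: **BLOCK** is this here
```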

As a regex, \d+ means "match one or more consecutive digits", so passing it to replace with regex=True says "replace each run of one or more consecutive digits with ***BLOCK***".
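A possible variant, not part of the original answer: a lookbehind, (?<=ABC: ), requires "ABC: " before the digits without consuming it, so only the digit run is replaced and the replacement need not repeat "ABC: ". This sketch rebuilds the question's df and uses **BLOCK** to match the desired output; note that Python's re module only supports fixed-width lookbehinds, which this one is.

```python
import pandas as pd

df = pd.DataFrame({
    'Text': ['Jon J Smith  ABC: 1111111 is this here',
             'ABC: 1234567 Mary Lisa Rider found here',
             'Jane A Doe is also here',
             'ABC: 2222222 Tom T Tucker is here ABC: 2222222 too'],
    'N_ID': ['A1', 'A2', 'A3', 'A4'],
    'P_ID': [1, 2, 3, 4],
})

# The lookbehind checks for "ABC: " but leaves it in place;
# every match in a row is replaced, so row 3 gets two **BLOCK** markers.
df['Text_ABC'] = df['Text'].str.replace(r'(?<=ABC: )\d+', '**BLOCK**', regex=True)

for text in df['Text_ABC']:
    print(text)
```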


10-28 23:04