Problem description
Background
I have a sample df with a Text column containing 0, 1, or more than 1 ABC's:
import pandas as pd
df = pd.DataFrame({'Text' : ['Jon J Smith ABC: 1111111 is this here',
                             'ABC: 1234567 Mary Lisa Rider found here',
                             'Jane A Doe is also here',
                             'ABC: 2222222 Tom T Tucker is here ABC: 2222222 too'],
                   'P_ID': [1,2,3,4],
                   'N_ID' : ['A1', 'A2', 'A3', 'A4']
                  })
#rearrange columns
df = df[['Text','N_ID', 'P_ID']]
df
                                                Text N_ID  P_ID
0              Jon J Smith ABC: 1111111 is this here   A1     1
1            ABC: 1234567 Mary Lisa Rider found here   A2     2
2                            Jane A Doe is also here   A3     3
3  ABC: 2222222 Tom T Tucker is here ABC: 2222222...   A4     4
Goal
1) Change the ABC numbers in the Text column (e.g. ABC: 1111111) to ABC: **BLOCK**
2) Create a new column Text_ABC containing this output
Desired output
   Text                                            N_ID  P_ID  Text_ABC
0  Jon J Smith ABC: 1111111 is this here           A1    1     Jon J Smith ABC: **BLOCK** is this here
1  ABC: 1234567 Mary Lisa Rider found here         A2    2     ABC: **BLOCK** Mary Lisa Rider found here
2  Jane A Doe is also here                         A3    3     Jane A Doe is also here
3  ABC: 2222222 Tom T Tucker is here ABC: 2222222  A4    4     ABC: **BLOCK** Tom T Tucker is here ABC: **BLOCK**
Question
How do I achieve my desired output?
Recommended answer
If all your numerics are to be replaced, you can do:
df['Text_ABC'] = df['Text'].replace(r'\d+', '***BLOCK***', regex=True)
But if you want to be more specific and only replace the numerics that follow ABC: , then you can use this:
df['Text_ABC'] = df['Text'].replace(r'ABC: \d+', 'ABC: ***BLOCK***', regex=True)
which gives you:
df
Text P_ID N_ID Text_ABC
0 Jon J Smith ABC: 1111111 is this here 1 A1 Jon J Smith ABC: ***BLOCK*** is this here
1 ABC: 1234567 Mary Lisa Rider found here 2 A2 ABC: ***BLOCK*** Mary Lisa Rider found here
2 Jane A Doe is also here 3 A3 Jane A Doe is also here
3 ABC: 2222222 Tom T Tucker is here ABC: 2222222... 4 A4 ABC: ***BLOCK*** Tom T Tucker is here ABC: ***BLOCK...
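To see where the two patterns differ, here is a minimal sketch on a hypothetical row that also contains a number unrelated to ABC. The sample string and the names s, all_digits and abc_only are assumptions made for illustration; they are not part of the question's data.

```python
import pandas as pd

# Hypothetical sample (not from the question): one ABC number plus an
# unrelated number ("Room 12").
s = pd.Series(['Room 12: Jon J Smith ABC: 1111111 is here'])

# Blanket pattern: every run of digits is masked, including the "12".
all_digits = s.replace(r'\d+', '***BLOCK***', regex=True)

# Specific pattern: only digits that directly follow "ABC: " are masked.
abc_only = s.replace(r'ABC: \d+', 'ABC: ***BLOCK***', regex=True)

print(all_digits[0])
print(abc_only[0])
```

With the blanket pattern the room number is masked as well, while the specific pattern leaves it untouched. On the question's sample data the two give the same result, because the only digits there are ABC numbers.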
As a regex, \d+ means "match one or more consecutive digits", so using that within replace says to "replace one or more consecutive digits with ***BLOCK***".
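The meaning of \d+ can also be checked outside pandas. The following is a small sketch using Python's standard re module on one of the sample strings; the variable names text, runs and masked are illustrative only.

```python
import re

text = 'ABC: 2222222 Tom T Tucker is here ABC: 2222222 too'

# \d+ matches each maximal run of consecutive digits in the string.
runs = re.findall(r'\d+', text)

# re.sub replaces every such run with the placeholder.
masked = re.sub(r'\d+', '***BLOCK***', text)

print(runs)
print(masked)
```

Each seven-digit ABC number is matched as a single run, so it is replaced by one placeholder rather than one placeholder per digit.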