Problem description
Hello, I have a dataframe where I want to remove a specific set of prefixes ('Fwd', 'Re', 'RE') from every row that starts with or contains them. The issue I am facing is that I do not know how to apply a regex for each case.
My dataframe looks like this:
summary
0 Fwd: Please look at the attached documents and take action
1 NSN for the ones who care
2 News for all team members
3 Fwd:RE:Re: Please take action on the action needed items
4 Fix all the mistakes please
5 Fwd:Re: Take action on the attachments in this email
6 Fwd:RE: Action is required
I want a result dataframe like this:
summary
0 Please look at the attached documents and take action
1 NSN for the ones who care
2 News for all team members
3 Please take action on the action needed items
4 Fix all the mistakes please
5 Take action on the attachments in this email
6 Action is required
To get rid of 'Fwd' I used df['msg'].str.replace(r'^Fwd: ','')
Recommended answer
If these prefixes can appear in any order and any number of times at the start of the string, you could use a repeating pattern:
^(?:(?:Fwd|R[eE]):)+\s*
- ^ Start of string
- (?: Non-capturing group
- (?:Fwd|R[eE]): Match either Fwd, Re or RE, followed by a colon
- )+ Close the group and repeat it one or more times
- \s* Match optional whitespace after the last prefix
In the replacement use an empty string.
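As a quick check of the pattern outside pandas, here is a minimal sketch using Python's standard re module. The helper name strip_prefixes and the constant PREFIX_PATTERN are made up for illustration; the sample strings are taken from the question.

```python
import re

# Pattern from the answer: one or more "Fwd:", "Re:" or "RE:" prefixes
# at the start of the string, followed by optional whitespace.
PREFIX_PATTERN = re.compile(r'^(?:(?:Fwd|R[eE]):)+\s*')


def strip_prefixes(subject):
    """Return the subject with leading Fwd:/Re:/RE: prefixes removed.

    Hypothetical helper, not part of the original answer.
    """
    return PREFIX_PATTERN.sub('', subject)


subjects = [
    'Fwd: Please look at the attached documents and take action',
    'NSN for the ones who care',
    'Fwd:RE:Re: Please take action on the action needed items',
    'Fwd:RE: Action is required',
]

cleaned = [strip_prefixes(s) for s in subjects]
for line in cleaned:
    print(line)
```

Because the pattern is anchored with ^, rows without a leading prefix (such as 'NSN for the ones who care') are left unchanged.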
You could also make the pattern case insensitive using re.IGNORECASE and use (?:fwd|re) if you want to match all possible variations.
For example (pass regex=True, because since pandas 2.0 str.replace treats the pattern as a literal string by default):
df['msg'].str.replace(r'^(?:(?:Fwd|R[eE]):)+\s*', '', regex=True)
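Putting the pieces together, the following is a sketch of the case-insensitive variant applied to the sample data. It assumes the column is named 'summary', as in the printed frame (the question's own snippet uses 'msg'), and the last row is a hypothetical lower-case example added only to show the effect of re.IGNORECASE.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'summary': [
        'Fwd: Please look at the attached documents and take action',
        'NSN for the ones who care',
        'News for all team members',
        'Fwd:RE:Re: Please take action on the action needed items',
        'Fix all the mistakes please',
        'Fwd:Re: Take action on the attachments in this email',
        'Fwd:RE: Action is required',
        'fwd:re: Status update',  # hypothetical row, not in the question
    ]
})

# Case-insensitive version of the pattern: any mix of fwd:/re: prefixes
# at the start of the string, then optional whitespace.
pattern = r'^(?:(?:fwd|re):)+\s*'

# regex=True is required on pandas >= 2.0, where the default is a
# literal (non-regex) replacement.
df['summary'] = df['summary'].str.replace(
    pattern, '', flags=re.IGNORECASE, regex=True
)

print(df)
```

Rows 1, 2 and 4 are untouched because the anchor ^ requires the prefix, including its colon, to sit at the very start of the string.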