This article covers how to replace mixed letter/number words in the strings of a pandas column.
Problem description
import pandas as pd

dataframe = pd.DataFrame({
    'Date': ['This 1A1619 person BL171111 the A-1-24',
             'dont Z112 but NOT 1-22-2001',
             'mix: 1A25629Q88 or A13B ok'],
    'IDs': ['A11', 'B22', 'C33'],
})
Date IDs
0 This 1A1619 person BL171111 the A-1-24 A11
1 dont Z112 but NOT 1-22-2001 B22
2 mix: 1A25629Q88 or A13B ok C33
I have the dataframe above. My goal is to replace every mixed letter/number combination that does NOT contain a hyphen (e.g. 1A1619, BL171111, or A13B, but NOT 1-22-2001 or A-1-24) with the letter M. I tried the code below, an approach based on identifying letter/number combinations with a regex and storing them in a dictionary:
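# regex=True is passed explicitly: since pandas 2.0, str.replace treats the pattern as a literal string by default
dataframe['MixedNum'] = dataframe['Date'].str.replace(r'(?=.*[a-zA-Z])(\S+\S+\S+)', 'M', regex=True)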
But I get this output:
Date IDs MixedNum
0 This 1A1619 person BL171111 the A-1-24 A11 M M M M M M M
1 dont Z112 but NOT 1-22-2001 B22 M M M M 1-22-2001
2 mix: 1A25629Q88 or A13B ok C33 M M or M ok
when what I actually want is this output:
Date IDs MixedNum
0 This 1A1619 person BL171111 the A-1-24 A11 This M person M the A-1-24
1 dont Z112 but NOT 1-22-2001 B22 dont M but NOT 1-22-2001
2 mix: 1A25629Q88 or A13B ok C33 mix: M or M ok
I also tried the regex suggested in "Regex replace mixed number+strings", but it didn't work for me either.
Can anyone help me alter my regex? r'(?=.*[a-zA-Z])(\S+\S+\S+)'
Recommended answer
You can use
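pat = r'(?<!\S)(?:[a-zA-Z]+\d|\d+[a-zA-Z])[a-zA-Z0-9]*(?!\S)'
# regex=True is required on pandas 2.0+, where str.replace defaults to literal matching
dataframe['MixedNum'] = dataframe['Date'].str.replace(pat, 'M', regex=True)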
Output:
>>> dataframe
Date IDs MixedNum
0 This 1A1619 person BL171111 the A-1-24 A11 This M person M the A-1-24
1 dont Z112 but NOT 1-22-2001 B22 dont M but NOT 1-22-2001
2 mix: 1A25629Q88 or A13B ok C33 mix: M or M ok
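To see why the pattern from the question over-matches, it can help to run both patterns on a single row with the standard `re` module. The sketch below is illustrative only; the variable names `question_pat`, `answer_pat`, `over_matched`, and `fixed` are not from the original post.

```python
import re

# Pattern from the question: the lookahead only checks that a letter occurs
# somewhere later in the string, and \S+\S+\S+ matches any run of three or
# more non-whitespace characters, so ordinary words are replaced too.
question_pat = r'(?=.*[a-zA-Z])(\S+\S+\S+)'

# Pattern from the answer: a whole whitespace-delimited token made only of
# letters and digits that contains a letter/digit transition.
answer_pat = r'(?<!\S)(?:[a-zA-Z]+\d|\d+[a-zA-Z])[a-zA-Z0-9]*(?!\S)'

text = 'dont Z112 but NOT 1-22-2001'

over_matched = re.sub(question_pat, 'M', text)
fixed = re.sub(answer_pat, 'M', text)

print(over_matched)  # M M M M 1-22-2001
print(fixed)         # dont M but NOT 1-22-2001
```

The date 1-22-2001 survives the question's pattern only because no letter follows it in the string, not because the pattern recognizes hyphens.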
Pattern details
- (?<!\S) - a whitespace character or the start of the string must immediately precede the current position
- (?:[a-zA-Z]+\d|\d+[a-zA-Z]) - either of the following:
  - [a-zA-Z]+\d - one or more letters followed by a digit
  - | - or
  - \d+[a-zA-Z] - one or more digits followed by a letter
- [a-zA-Z0-9]* - zero or more further letters or digits (a hyphen cannot match here)
- (?!\S) - a whitespace character or the end of the string must immediately follow the current position
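The same pattern can be written with `re.VERBOSE` so that each part carries its own comment. This is a minimal sketch using the standard `re` module rather than pandas; the names `mixed_token`, `samples`, and `replaced` are illustrative and not part of the original answer.

```python
import re

mixed_token = re.compile(r'''
    (?<!\S)              # preceded by whitespace or the start of the string
    (?:
        [a-zA-Z]+\d      # one or more letters, then a digit
      |
        \d+[a-zA-Z]      # one or more digits, then a letter
    )
    [a-zA-Z0-9]*         # any further letters or digits, but no hyphen
    (?!\S)               # followed by whitespace or the end of the string
''', re.VERBOSE)

samples = [
    'This 1A1619 person BL171111 the A-1-24',
    'dont Z112 but NOT 1-22-2001',
    'mix: 1A25629Q88 or A13B ok',
]

replaced = [mixed_token.sub('M', s) for s in samples]
for line in replaced:
    print(line)
```

Because both boundaries are defined in terms of whitespace, a token that touches punctuation (for example `Z112,`) would not be replaced; whether that is desirable depends on the data.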