Problem description
I have data like the sample data below, and I'm trying to pattern match and parse it to create something like the output data. The idea is: if a string value contains "Aggr(", parse the "stuff" inside the parentheses, then parse the "something" that follows the comma, up to the next parenthesis. Is there a slick way to do this with something like regex, or is it going to require a couple of loops?
Sample Data:
import pandas as pd
SampleDf=pd.DataFrame([['tom',"words Aggr(stuff),something1"],['bob',"Morewords Aggr(Diffstuff),something2"]],columns=['ReportField','OtherField'])
Sample Output:
OutputDf=pd.DataFrame([['tom',"words Aggr(stuff),something1",'stuff', 'something1'],['bob',"Morewords Aggr(Diffstuff),something2",'Diffstuff','something2']],columns=['ReportField','OtherField','Part1','Part2'])
Recommended answer
You can use str.extract to capture each pattern in the string and turn each named group into its own column:
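import pandas as pd

# Extract the named groups Part1/Part2 as columns and attach them to the original frame
pd.concat([
    SampleDf,
    SampleDf.OtherField.str.extract(r"Aggr\((?P<Part1>.*?)\),(?P<Part2>[^\(]*)", expand=True)
], axis=1)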
# ReportField OtherField Part1 Part2
#0 tom words Aggr(stuff),something1 stuff something1
#1 bob Morewords Aggr(Diffstuff),something2 Diffstuff something2
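For a copy-paste check, here is a self-contained sketch that rebuilds the sample frame and attaches the extracted columns with DataFrame.join instead of pd.concat(axis=1). Swapping in join is my own choice, not part of the original answer; it works here because str.extract keeps the original index, so the rows line up the same way.

```python
import pandas as pd

# Sample data from the question (stray ")" after something1 removed)
SampleDf = pd.DataFrame(
    [['tom', "words Aggr(stuff),something1"],
     ['bob', "Morewords Aggr(Diffstuff),something2"]],
    columns=['ReportField', 'OtherField'])

# Same pattern as the answer; each named group becomes a column
pattern = r"Aggr\((?P<Part1>.*?)\),(?P<Part2>[^\(]*)"

# join() aligns on the index, like pd.concat([...], axis=1) above
result = SampleDf.join(SampleDf['OtherField'].str.extract(pattern, expand=True))
print(result)
```

The resulting columns are ReportField, OtherField, Part1 and Part2, matching the OutputDf from the question.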
The regex Aggr\((?P<Part1>.*?)\),(?P<Part2>[^\(]*) captures the two patterns you need:
Part1, from Aggr\((?P<Part1>.*?)\) : the content of the first pair of parentheses after Aggr. The lazy .*? stops at the first ")" that is followed by a comma.
Part2, from ,(?P<Part2>[^\(]*) : the text after that comma, up to (but not including) the next "(" or the end of the string.
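To see why the lazy .*? matters, and what happens to rows that don't contain "Aggr(", here is a small sketch using Python's re module and str.extract. The greedy variant and the test strings are hypothetical, made up for illustration rather than taken from the question.

```python
import re
import pandas as pd

# Pattern from the answer: lazy ".*?" stops at the FIRST "),"
lazy = r"Aggr\((?P<Part1>.*?)\),(?P<Part2>[^\(]*)"
# Hypothetical greedy variant for comparison: ".*" runs to the LAST "),"
greedy = r"Aggr\((?P<Part1>.*)\),(?P<Part2>[^\(]*)"

# Hypothetical input with a second "(...)," after the Aggr(...) part
text = "x Aggr(a),b (c),d"

lazy_match = re.search(lazy, text)
greedy_match = re.search(greedy, text)
print(lazy_match.groupdict())    # {'Part1': 'a', 'Part2': 'b '}
print(greedy_match.groupdict())  # {'Part1': 'a),b (c', 'Part2': 'd'}

# Rows without "Aggr(" do not match, so str.extract fills them with NaN
s = pd.Series(["no aggregate here", "Aggr(q),r"])
extracted = s.str.extract(lazy, expand=True)
print(extracted)
```

Note that Part2 keeps any whitespace before the next "(" (the 'b ' above), so you may want to chain .str.strip() on that column.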