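python - Extract words from a sentence between the symbols "<>" and the nested case "<< >>"

This question already has answers here: How can I match nested brackets using regex? (4 answers). Closed 5 months ago.

I am working on named entity recognition with a news dataset (text). Here is an example:

    <LOC Qatar> and <LOC Japan>, who met in the <EVENT <S Asian> <E Cup>> final in <DATE February>, are in third place in their groups.

I am trying to extract the entities between the tags, including nested tags. The problem is the output:

    ['<LOC Qatar>', '<LOC Japan>', '<EVENT <S Asian>', '<E Cup>', '<DATE February>']

This is wrong because "EVENT S Asian" and "E Cup" should be a single string rather than two. I tried a regex, but it does not handle the nested case:

    import re

    s = """<LOC Qatar> and <LOC Japan>,who met in the <EVENT <S Asian> <E Cup>> final in <DATE February>, are in third place in their groups."""

    # Non-greedy match: stops at the first '>' and so splits the nested tag
    re.findall(r'<.*?>', s)

Actual result:

    ['<LOC Qatar>', '<LOC Japan>', '<EVENT <S Asian>', '<E Cup>', '<DATE February>']

Expected result:

    ['<LOC Qatar>', '<LOC Japan>', '<EVENT <S Asian> <E Cup>>', '<DATE February>']

Best answer

You want to apply a recursive pattern, as mentioned in the comments. The third-party regex module supports this; the built-in re module does not. Here is the code:

    # Import module
    import regex as reg

    # Your string
    s = """<LOC Qatar> and <LOC Japan>,who met in the < EVENT < S Asian > < E Cup >> final in < DATE February > , are in third place in their groups. """

    # Match pattern: (?R) recurses into the whole pattern, so nested <...> pairs stay balanced
    my_list = reg.findall("<((?:[^<>]|(?R))*)>", s)
    print(my_list)
    # ['LOC Qatar', 'LOC Japan', ' EVENT < S Asian > < E Cup >', ' DATE February ']

findall returns only the contents of the capture group, so the outer brackets are dropped. If you really want the words surrounded by <>, you can add them back:

    my_list = ['<' + elt + '>' for elt in my_list]
    print(my_list)
    # ['<LOC Qatar>', '<LOC Japan>', '< EVENT < S Asian > < E Cup >>', '< DATE February >']

Regarding "python - Extract words from a sentence between the symbols "<>" and the nested case "<< >>"", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56737028/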
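
If installing the third-party regex module is not an option, the same outermost spans can probably be recovered with a plain depth counter instead of a recursive pattern. The following is a minimal sketch using only the standard library, not the method from the answer above; it assumes the tags are well balanced, and the function name extract_top_level is hypothetical.

```python
def extract_top_level(text):
    """Return every outermost <...> span in text, nested tags included."""
    spans = []
    depth = 0      # how many '<' are currently open
    start = None   # index of the '<' that opened the current outermost span
    for i, ch in enumerate(text):
        if ch == "<":
            if depth == 0:
                start = i
            depth += 1
        elif ch == ">" and depth > 0:  # ignore a stray '>' outside any tag
            depth -= 1
            if depth == 0:
                spans.append(text[start:i + 1])
    return spans


s = ("<LOC Qatar> and <LOC Japan>, who met in the <EVENT <S Asian> <E Cup>> "
     "final in <DATE February>, are in third place in their groups.")
print(extract_top_level(s))
# ['<LOC Qatar>', '<LOC Japan>', '<EVENT <S Asian> <E Cup>>', '<DATE February>']
```

Unlike the findall call with a capture group, this keeps the outer brackets in each result, so no post-processing step is needed. An unclosed '<' at the end of the text is silently dropped rather than reported.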