问题描述
我想知道两个已知的正则表达式之间是否可能存在冲突,以允许用户构建一个互斥正则表达式的列表.
例如,我们知道下面的正则表达式非常不同,但它们都匹配xy50
:
'^xy1\d''[^\d]\d2$'
是否可以使用计算机算法确定两个正则表达式是否会发生这样的冲突?怎么样?
这里不涉及停机问题.您只需要计算 ^xy1\d
和 [^\d]\d2$
的交集是否为非空.
我不能在这里给你一个算法,但这里有两个关于不用构建 DFA 来生成交集的方法的讨论:
而空的路口:
('xy1' 数字任意*) &('q' any* ^digit 数字 '2')
看起来像这样:
因此,如果所有其他方法都失败,那么您仍然可以通过比较生成的点"文件,让 Ragel 计算交集并检查它是否输出空状态机.
I want to find out if there could ever be conflicts between two known regular expressions, in order to allow the user to construct a list of mutually exclusive regular expressions.
For example, we know that the regular expressions below are quite different but they both match
xy50
:'^xy1\d' '[^\d]\d2$'
Is it possible to determine, using a computer algorithm, if two regular expressions can have such a conflict? How?
解决方案There's no halting problem involved here. All you need is to compute if the intersection of
^xy1\d
and[^\d]\d2$
in non-empty.I can't give you an algorithm here, but here are two discussions of a method to generate the intersection without resorting the construction of a DFA:
And then there's RAGEL
which can compute the intersection of regular expressions too.
UPDATE: I just tried out Ragel with OP's regexp. Ragel can generate a "dot" file for graphviz from the resulting state machine, which is terrific. The intersection of the OP's regexp looks like this in Ragel syntax:
('xy1' digit any*) & (any* ^digit digit '2')
and has the following state machine:
While the empty intersection:
('xy1' digit any*) & ('q' any* ^digit digit '2')
looks like this:
So if all else fails, then you can still have Ragel compute the intersection and check if it outputs the empty state machine, by comparing the generated "dot" file.
这篇关于正则表达式:确定两个正则表达式是否可以匹配同一个输入?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!