Question
Reading about the Chomsky hierarchy ... I know that regexps can't parse type-2 grammars (context-free grammars), nor type-1 or type-0 grammars. Can regular expressions parse/catch ALL type-3 grammars (regular grammars)?
Recommended answer
Yes, provided they support alternation, concatenation, and the Kleene star. This is the case for regexes of the PCRE type (Perl/Java/JavaScript/PHP/...): alternation is written ((...)|(...)), concatenation (...)(...), and the Kleene star (...)*. (There are a few other details: in most of these languages you need something like \A and \z to indicate "start-of-string" and "end-of-string", which a regular grammar takes for granted. But that's the idea.)
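As a minimal sketch of the correspondence described above, the following Python snippet writes a regex for a small right-linear grammar. The grammar, the name PATTERN, and the helper accepts are made up for illustration and are not part of the original answer. Python's re module is used here only as a convenient PCRE-style engine; re.fullmatch stands in for the explicit start-of-string and end-of-string anchors.

```python
import re

# Hypothetical right-linear (type-3) grammar, chosen only for illustration:
#   S -> "ab" S | "c" S | "d" A
#   A -> "e" A | (empty)
# The language it generates is (ab|c)* d e*.
# Alternation: ((ab)|(c)), Kleene star: (...)*, concatenation: juxtaposition.
PATTERN = re.compile(r"((ab)|(c))*(d)(e)*")


def accepts(s: str) -> bool:
    """Return True if the whole string s belongs to the grammar's language."""
    # fullmatch requires the match to span the entire string, which is what
    # a grammar means by "derives s" (no unmatched prefix or suffix allowed).
    return PATTERN.fullmatch(s) is not None


if __name__ == "__main__":
    for candidate in ["d", "abcabdee", "ad", ""]:
        print(repr(candidate), accepts(candidate))
```

Without anchoring, PATTERN.search would also report a hit inside a string such as "xxdyy", which is why the whole-string requirement matters when a regex is meant to recognize a language rather than find a substring.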
But not everything called a "regular expression" in a programming context necessarily has all of the above. For example, POSIX Basic Regular Expressions support only a very limited form of alternation, in which every "branch" consists of a single character: whereas PCREs have both (a|b|c) and the special-case equivalent [abc], POSIX BREs have only [abc], so they can't express something like (ab|c).
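To make the (a|b|c) versus [abc] point concrete, here is a small sketch that enumerates the short strings each pattern accepts over the alphabet {a, b, c}. It uses Python's re module, which is not a POSIX BRE engine, so it only illustrates which languages the patterns denote; the helper name language and the constant ALPHABET are assumptions introduced for this example.

```python
import re
from itertools import product

ALPHABET = "abc"  # illustrative alphabet, not from the original answer


def language(pattern: str, max_len: int) -> set:
    """Return every string over ALPHABET, up to max_len long, that the
    pattern matches in full."""
    compiled = re.compile(pattern)
    words = set()
    for n in range(max_len + 1):
        for chars in product(ALPHABET, repeat=n):
            word = "".join(chars)
            if compiled.fullmatch(word):
                words.add(word)
    return words


single_char_alternation = language(r"(a|b|c)", 3)
bracket_expression = language(r"[abc]", 3)
multi_char_alternation = language(r"(ab|c)", 3)

if __name__ == "__main__":
    print(sorted(single_char_alternation))
    print(sorted(bracket_expression))
    print(sorted(multi_char_alternation))
```

The first two patterns accept exactly the same strings, while (ab|c) accepts a two-character string alongside a one-character one, which a single bracket expression cannot describe.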