Problem description
grep can't be fed "raw" strings when used from the command line, since some characters need to be escaped to not be treated as literals. For example:
$ grep '(hello|bye)' # WON'T MATCH 'hello'
$ grep '\(hello\|bye\)' # GOOD, BUT QUICKLY BECOMES UNREADABLE
I was using printf to auto-escape strings:
$ printf '%q' '(some|group)\n'
\(some\|group\)\\n
This produces a bash-escaped version of the string, and using backticks, this can easily be passed to a grep call:
$ grep `printf '%q' '(a|b|c)'`
However, printf is clearly not meant for this: some characters in the output are not escaped, and some are escaped unnecessarily. For example:
$ printf '%q' '(^#)'
\(\^#\)
The ^ character should not be escaped when passed to grep.
Is there a CLI tool that takes a raw string and returns a bash-escaped version of the string that can be used directly as a pattern with grep? If not, how can I achieve this in pure bash?
Recommended answer
If you are attempting to get grep to use Extended Regular Expression syntax, the way to do that is to use grep -E (aka egrep). You should also know about grep -F (aka fgrep) and, in newer versions of GNU grep, grep -P.
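As a small illustration of the first two flags, here is a sketch that runs the question's pattern against a made-up input (the sample lines and variable names are assumptions for this example only):

```bash
# Hypothetical sample input, one candidate per line.
sample=$(printf 'hello\nbye\n(hello|bye)\nother')

# -E: extended syntax, so ( ) | are operators without any backslashes.
ere_hits=$(printf '%s\n' "$sample" | grep -E '(hello|bye)')

# -F: fixed string, so every character of the pattern is taken literally.
fixed_hits=$(printf '%s\n' "$sample" | grep -F '(hello|bye)')

printf 'grep -E matched:\n%s\n' "$ere_hits"
printf 'grep -F matched:\n%s\n' "$fixed_hits"
```

With -E the pattern selects the lines containing hello or bye; with -F only the line that contains the literal text (hello|bye) is selected.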
Background: The original grep had a fairly small set of regex operators; it was Ken Thompson's original regular expression implementation. A new version with an extended repertoire was developed later and, for compatibility reasons, got a different name. With GNU grep, there is only one binary, which understands the traditional, basic RE syntax if invoked as grep, and ERE if invoked as egrep. Some constructs from egrep are available in grep by using a backslash escape to introduce special meaning.
Subsequently, the Perl programming language extended the formalism even further; this regex dialect seems to be what most newcomers erroneously expect grep to support, too. With grep -P, it does; but -P is not available on every platform.
So, in grep, the following characters have a special meaning: ^$[]*.\
In egrep, the following characters also have a special meaning: ()|+?{} (The braces for repetition were not in the original egrep.) The grouping parentheses also enable backreferences with \1, \2, etc.
In many versions of grep, you can get the egrep behavior by putting a backslash before the egrep specials. There are also special sequences such as \< and \>.
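The backslash rule can be sketched as follows. The sample input is made up, and the \| alternation in basic syntax is a GNU grep extension rather than something POSIX requires, so this assumes GNU grep:

```bash
# Hypothetical sample input.
sample=$(printf 'hello\nbye\n(hello|bye)\nother')

# Basic syntax, no backslashes: ( | ) are ordinary literal characters.
bre_literal=$(printf '%s\n' "$sample" | grep '(hello|bye)')

# Basic syntax with backslashes: \( \| \) become grouping and alternation.
bre_escaped=$(printf '%s\n' "$sample" | grep '\(hello\|bye\)')

# Extended syntax swaps the roles: here the backslash makes them literal.
ere_literal=$(printf '%s\n' "$sample" | grep -E '\(hello\|bye\)')

printf 'basic, unescaped:\n%s\n' "$bre_literal"
printf 'basic, escaped:\n%s\n' "$bre_escaped"
printf 'extended, escaped:\n%s\n' "$ere_literal"
```

The first and third commands select only the line containing the literal text (hello|bye), while the second behaves like grep -E '(hello|bye)'.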
In Perl, a huge number of additional escapes like \w, \s and \d were introduced. In Perl 5, the regex facility was substantially extended, with non-greedy matching (*?, +?, etc.), non-grouping parentheses (?:...), lookaheads, lookbehinds, etc.
... Having said that, if you really do want to convert egrep regular expressions to grep regular expressions without invoking any external process, try ${regex/pattern/substitution} for each of the egrep special characters (the ${regex//pattern/substitution} form replaces every occurrence rather than only the first); but recognize that this does not handle character classes, negated character classes, or backslash escapes correctly.
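A minimal sketch of that substitution approach in pure bash might look like the following. The function name ere_to_bre is hypothetical, and the sketch shares the limitations just described: it blindly prefixes every egrep special character with a backslash, so a pattern such as [(] or one that already contains backslash escapes will be converted incorrectly.

```bash
# ere_to_bre is a hypothetical helper name used only for this sketch.
# It prefixes each egrep-only special character with a backslash.
ere_to_bre() {
    local regex=$1 ch bs='\'
    for ch in '(' ')' '|' '+' '?' '{' '}'; do
        # "$ch" is quoted so it is matched literally, not as a glob;
        # the // form replaces every occurrence, not just the first.
        regex=${regex//"$ch"/$bs$ch}
    done
    printf '%s\n' "$regex"
}

ere_to_bre '(hello|bye)+'
```

The result can then be handed to plain grep, for example grep "$(ere_to_bre '(hello|bye)')", although using grep -E directly remains the simpler option.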