Question
I use various regexes to parse a C source file, line by line. First I read the entire content of the file into a string:
ifstream file_stream("commented.cpp",ifstream::binary);
std::string txt((std::istreambuf_iterator<char>(file_stream)),
std::istreambuf_iterator<char>());
Then I use a set of regexes, which should be applied one after another until a match is found. Here I will give only one as an example:
vector<regex> rules = { regex("^//[^\n]*$") };
char * search =(char*)txt.c_str();
int position = 0, length = 0;
for (int i = 0; i < rules.size(); i++) {
    cmatch match;
    if (regex_search(search + position, match, rules[i],regex_constants::match_not_bol | regex_constants::match_not_eol))
    {
        position += ( match.position() + match.length() );
    }
}
But it doesn't work. Instead of matching only within the current line, it searches the whole string for the first match. I expected regex_constants::match_not_bol and regex_constants::match_not_eol to make regex_search recognize ^ and $ as the start/end of a line only, not the start/end of the whole block. Here is my file:
commented.cpp:
#include <stdio.h>
//comment
The match should fail. My reasoning is that, with those options passed to regex_search, it should search for the pattern in the first line only:
#include <stdio.h>
But instead it searches the whole string and immediately finds //comment. I need help making regex_search match only in the current line. The options match_not_bol and match_not_eol do not help me. Of course I can read the file line by line into a vector and then match all the rules against each string in the vector, but that is very slow. I have done that, and it takes too long to parse a big file that way. That's why I want to let the regex deal with newlines and use a position counter.
Recommended answer
What you are doing is not a correct way of using a regex library.
So here is my suggestion for anyone who wants to use the std::regex library.
- Its richest grammar is ECMAScript (the alternatives are POSIX-style: basic, extended, awk, grep, egrep), which is somewhat weaker than what most modern regex libraries offer.
- It has plenty of bugs (counting only the ones I have found).
- In some cases (I tested specifically with std::match_results) it is 200 times slower than std.regex in the D language.
Conclusion: do not use it at all.
But if anyone still insists on using C++ anyway, then you can:
- Use boost::regex, because:
  - It supports Perl-compatible (PCRE-style) syntax
  - It has fewer bugs (I have not seen any)
  - It produces a smaller binary (I mean the executable file after compiling)
  - It is faster than std::regex
- Use gcc version 7.1.0 or later, and NOT anything below. The last bug I found was in version 6.3.0.
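If you do stay with std::regex, note that match_not_bol and match_not_eol tell the engine that ^ and $ must not match at the ends of the searched range, which is the opposite of what the question expects. A minimal sketch of one way to keep a match inside the current line, without copying lines into a vector, is to hand regex_search the iterator range of that line only. The helper name match_current_line and its return convention are assumptions made for illustration, not part of the original code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post).
// Tries every rule against the single line of `text` that starts at `pos`.
// Returns the index of the first rule that matches, or -1 if none does.
// On return, `pos` points at the start of the next line (or at text.size()).
int match_current_line(const std::string& text, std::size_t& pos,
                       const std::vector<std::regex>& rules) {
    assert(pos <= text.size());
    auto line_begin = text.cbegin() + static_cast<std::ptrdiff_t>(pos);
    auto line_end = std::find(line_begin, text.cend(), '\n');

    int matched = -1;
    for (std::size_t i = 0; i < rules.size(); ++i) {
        // Default flags: ^ and $ may match at the ends of the given range,
        // and the range never extends past the current line.
        if (std::regex_search(line_begin, line_end, rules[i])) {
            matched = static_cast<int>(i);
            break;
        }
    }

    pos = static_cast<std::size_t>(line_end - text.cbegin());
    if (line_end != text.cend()) {
        ++pos;  // step over the '\n' itself
    }
    return matched;
}
```

With the two-line commented.cpp from the question and the rule regex("^//[^\n]*$"), the first call reports no match for #include <stdio.h> and the second call matches //comment. No per-line strings are allocated, since the regex only ever sees iterators into the original buffer.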
If you can be persuaded NOT to use C++, then you can:
- Use the D regular expression library, std.regex, for large tasks. Why:
  - Fast
  - Easy
  - Flexible
- Use native pcre or pcre2, which is written in C:
  - Extremely fast, but a little complicated