std regex_search: match only the current line


Question

I use various regexes to parse a C source file, line by line. First I read the whole content of the file into a string:

ifstream file_stream("commented.cpp",ifstream::binary);

std::string txt((std::istreambuf_iterator<char>(file_stream)),
std::istreambuf_iterator<char>());

Then I use a set of regexes, which should be applied one after another until a match is found. Here I give only one as an example:

vector<regex> rules = { regex("^//[^\n]*$") };

char * search =(char*)txt.c_str();

int position = 0, length = 0;

for (int i = 0; i < rules.size(); i++) {
  cmatch match;

  if (regex_search(search + position, match, rules[i],regex_constants::match_not_bol | regex_constants::match_not_eol))
  {
     position += ( match.position() + match.length() );
  }

}

But it doesn't work. Instead of matching only a comment on the current line, it searches the whole string for the first match. I expected regex_constants::match_not_bol and regex_constants::match_not_eol to make regex_search treat ^ and $ as the start/end of a line only, not the start/end of the whole block. Here is my file:

commented.cpp:

#include <stdio.h>
//comment

The code should fail. My logic is that, with those options passed to regex_search, the match should fail, because it should search for the pattern only in the first line:

#include <stdio.h>

But instead it searches the whole string and immediately finds //comment. I need help making regex_search match only in the current line. The options match_not_bol and match_not_eol do not help me. Of course I can read the file line by line into a vector and then match all rules against each string in the vector, but that is very slow. I have done that, and it takes too long to parse a big file that way, which is why I want to let the regex deal with newlines and use a position counter.
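A note on the line-by-line alternative mentioned above: it does not have to copy every line into a vector. The following is only a minimal sketch, assuming the whole file is already in one std::string and lines end with '\n'. The helper name first_rule_on_line and its parameters are hypothetical, not part of any library. Because the search range is limited to one line, ^ and $ can only anchor at that line's ends, regardless of how an implementation treats newlines.

```cpp
#include <algorithm>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not a library function): tries every rule against the
// single line that starts at offset `line_begin` inside `text`.
// Returns the index of the first matching rule, or -1 if no rule matches.
// `next_line` receives the offset of the following line (or text.size()).
// Assumption: lines end with '\n'. With "\r\n" endings the '\r' stays inside
// the searched range, so a pattern ending in $ may need to allow for it.
int first_rule_on_line(const std::string& text, std::size_t line_begin,
                       const std::vector<std::regex>& rules,
                       std::size_t& next_line) {
    const auto first = text.cbegin() + static_cast<std::ptrdiff_t>(line_begin);
    const auto last = std::find(first, text.cend(), '\n');

    // Skip over the newline itself, if there is one.
    next_line = static_cast<std::size_t>(last - text.cbegin()) +
                (last != text.cend() ? 1u : 0u);

    for (std::size_t i = 0; i < rules.size(); ++i) {
        // The iterator overload searches only [first, last): no copy is made.
        if (std::regex_search(first, last, rules[i])) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

Called in a loop that starts at offset 0 and continues from next_line, this visits each line once. With the file shown above and the rule "^//[^\n]*$", the first line is expected to report no match and the second line to match.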

Recommended answer

What you are doing is not the correct way to use a regex library.
So here is my advice for anyone who wants to use the std::regex library.

Its default grammar is ECMAScript (a few POSIX-style grammars can also be selected), which is somewhat poorer than what modern regex libraries offer.
It has as many bugs as you like (these are just the ones I found):

The same regex gives different results on Linux and Windows, in C++ only
std::regex and ignored flags
std::regex_match and a lazy quantifier with strange behavior

In some cases (I tested specifically with std::match_results) it is 200 times slower than std.regex in the D language.

Conclusion: do not use it at all.

But if anyone still needs to use C++ anyway, then you can:

Use boost::regex, because:

It supports PCRE-style (Perl-compatible) syntax
It has fewer bugs (I have not seen any)
It produces a smaller binary (I mean the executable file after compiling)
It is faster than std::regex

Use gcc version 7.1.0 or later, NOT an earlier one. The last bug I found was in version 6.3.0.

If you can be persuaded NOT to use C++, then you can:

Use the D regular expression library for large tasks: std.regex. Here is why:

Fast
Easy
Flexible

Use native pcre or pcre2, which are written in C:
Extremely fast but a little complicated
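
If std::regex has to stay, one option for the original question is the standard flag std::regex_constants::match_continuous, which only accepts a match that begins exactly at the first position searched. The sketch below is just an illustration under assumptions: the helper name match_at is hypothetical, and the rule is written without ^ and $ because the flag already pins the match start to the current position.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not a library function): tries each rule at exactly
// offset `pos` in `text` and returns the length of the first match, or 0 if
// no rule matches there. A rule that can match the empty string would also
// yield 0, so this sketch assumes rules always consume at least one character.
std::size_t match_at(const std::string& text, std::size_t pos,
                     const std::vector<std::regex>& rules) {
    const auto first = text.cbegin() + static_cast<std::ptrdiff_t>(pos);
    std::smatch m;
    for (const std::regex& rule : rules) {
        // match_continuous: the match must start at `first`, so the search
        // cannot skip ahead to a later line.
        if (std::regex_search(first, text.cend(), m, rule,
                              std::regex_constants::match_continuous)) {
            return static_cast<std::size_t>(m.length(0));
        }
    }
    return 0;
}
```

With the rule "//[^\n]*" and the file from the question, a call at offset 0 is expected to find nothing, since the text there starts with #include, while a call at the start of the second line should match the whole comment. The caller can then advance its position counter by the returned length.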