Question
Consider a preprocessor that reads raw text (no significant whitespace or tokens).
There are three rules:
- resolve_para_entry should resolve the argument inside a call. The top-level text is returned as a string.
- resolve_para should resolve the whole parameter list and put all top-level parameters in a string list.
- resolve is the entry point.
Along the way I track the iterator and extract the text portion.
Samples:
- sometext(para) → expect para in the string list
- sometext(para1,para2) → expect para1 and para2 in the string list
- sometext(call(a)) → expect call(a) in the string list
- sometext(call(a,b)) ← here it fails; it seems that the "!lit(',')" won't make the parser step outside
Rules:
resolve_para_entry = +(
(iter_pos >> lit('(') >> (resolve_para_entry | eps) >> lit(')') >> iter_pos) [_val= phoenix::bind(&appendString, _val, _1,_3)]
| (!lit(',') >> !lit(')') >> !lit('(') >> (wide::char_ | wide::space)) [_val = phoenix::bind(&appendChar, _val, _1)]
);
resolve_para = (lit('(') >> lit(')'))[_val = std::vector<std::wstring>()] // empty para -> old style
| (lit('(') >> resolve_para_entry >> *(lit(',') >> resolve_para_entry) > lit(')'))[_val = phoenix::bind(&appendStringList, _val, _1, _2)]
| eps;
;
resolve = (iter_pos >> name_valid >> iter_pos >> resolve_para >> iter_pos);
In the end this doesn't seem very elegant. Maybe there is a better way to parse such input without a skipper.
Recommended answer
Indeed, this should be a lot simpler.
First off, I fail to see why the absence of a skipper is at all relevant.
Second, exposing the raw input is best done using qi::raw[] instead of dancing with iter_pos and clumsy semantic actions¹.
Among the other observations:
- negating a charset is done with ~, so e.g. ~char_(",()")
- (p|eps) would be better spelled -p
- (lit('(') >> lit(')')) could be just "()" (after all, there's no skipper, right?)
- p >> *(',' >> p) is equivalent to p % ','
With the above, resolve_para simplifies to this:
resolve_para = '(' >> -(resolve_para_entry % ',') >> ')';
To me, resolve_para_entry seems weird. It appears that any nested parentheses are simply swallowed. Why not actually parse a recursive grammar so you detect syntax errors?
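To make that question concrete outside of Spirit, here is a minimal hand-written sketch of a recursive parser for one argument list. It is not the grammar from this answer: the names parse_arglist and skip_group are made up for illustration, and the sketch assumes that commas inside nested parentheses belong to the enclosing argument. Unbalanced parentheses are reported as an error instead of being swallowed.

```cpp
#include <cassert>   // only used for the precondition check in skip_group
#include <cstddef>
#include <string>
#include <vector>

// Skips one balanced "( ... )" group starting at s[pos]; commas inside the
// group are allowed. Returns false when the closing ')' is missing.
bool skip_group(const std::string& s, std::size_t& pos) {
    assert(pos < s.size() && s[pos] == '(');
    ++pos;                                          // consume '('
    while (pos < s.size() && s[pos] != ')') {
        if (s[pos] == '(') {
            if (!skip_group(s, pos)) return false;  // recurse into nested group
        } else {
            ++pos;
        }
    }
    if (pos == s.size()) return false;              // unbalanced: input ended
    ++pos;                                          // consume ')'
    return true;
}

// Parses "(a,b(c,d),e)" into its top-level arguments. Returns false on a
// syntax error (unbalanced parentheses, empty argument, trailing input);
// the contents of `out` are unspecified in that case.
bool parse_arglist(const std::string& s, std::vector<std::string>& out) {
    out.clear();
    if (s.empty() || s[0] != '(') return false;
    std::size_t pos = 1;
    if (pos < s.size() && s[pos] == ')') return pos + 1 == s.size();  // "()"
    for (;;) {
        const std::size_t start = pos;
        while (pos < s.size() && s[pos] != ',' && s[pos] != ')') {
            if (s[pos] == '(') {
                if (!skip_group(s, pos)) return false;
            } else {
                ++pos;
            }
        }
        if (pos == start || pos == s.size()) return false;  // empty argument or missing ')'
        out.push_back(s.substr(start, pos - start));
        if (s[pos++] == ')') break;                 // ')' ends the list, ',' continues
    }
    return pos == s.size();                         // reject trailing input
}
```

With this sketch, "(call(a,b))" yields the single argument call(a,b), while "(call(a)" is rejected because the closing parenthesis is missing.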
Here's my take:
I prefer to define the AST as the first step because it helps me think about the parser productions:
namespace Ast {
    using ArgList = std::list<std::string>;

    struct Resolve {
        std::string name;
        ArgList arglist;
    };

    using Resolves = std::vector<Resolve>;
}
Creating the grammar rules
qi::rule<It, Ast::Resolves()> start;
qi::rule<It, Ast::Resolve()> resolve;
qi::rule<It, Ast::ArgList()> arglist;
qi::rule<It, std::string()> arg, identifier;
And their definitions:
identifier = char_("a-zA-Z_") >> *char_("a-zA-Z0-9_");
arg = raw [ +('(' >> -arg >> ')' | +~char_(",)(")) ];
arglist = '(' >> -(arg % ',') >> ')';
resolve = identifier >> arglist;
start = *qr::seek[hold[resolve]];
Notes:
- No more semantic actions
- No more eps
- No more iter_pos
- I've opted to make arglist non-optional. If you really want it to be optional, change it back:
resolve = identifier >> -arglist;
But in our sample that will generate a lot of noisy output.
Of course your entry point (start) will be different. I just did the simplest thing that could possibly work, using another handy parser directive from the Spirit Repository (like iter_pos, which you were already using): seek[]
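Conceptually, seek[p] retries p at successive input positions until it matches, and the surrounding *(...) repeats that until nothing more can be found. The following plain C++ sketch mimics that loop by hand. It is not how Spirit implements the directive, and match_call_name and seek_all are hypothetical helpers whose "subject" only recognizes an identifier directly followed by '('.

```cpp
#include <cassert>   // only needed for the usage checks mentioned below
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical subject parser: matches an identifier directly followed by '('.
// Returns the identifier length, or 0 when there is no match at pos.
std::size_t match_call_name(const std::string& s, std::size_t pos) {
    auto is_start = [](unsigned char c) { return std::isalpha(c) || c == '_'; };
    auto is_rest  = [](unsigned char c) { return std::isalnum(c) || c == '_'; };
    std::size_t i = pos;
    if (i >= s.size() || !is_start(s[i])) return 0;
    while (i < s.size() && is_rest(s[i])) ++i;
    return (i < s.size() && s[i] == '(') ? i - pos : 0;
}

// Hand-written analogue of *seek[subject]: try the subject at the current
// position; on failure advance by one character, on success continue right
// after the match.
std::vector<std::string> seek_all(const std::string& s) {
    std::vector<std::string> found;
    std::size_t pos = 0;
    while (pos < s.size()) {
        const std::size_t len = match_call_name(s, pos);
        if (len == 0) {
            ++pos;              // no match here: skip one character and retry
            continue;
        }
        found.push_back(s.substr(pos, len));
        pos += len;             // resume after the matched identifier
    }
    return found;
}
```

For the input "x sometext(para) y" the loop skips the leading "x " and reports sometext, so a check such as assert(seek_all("x sometext(para) y").size() == 1) holds. Unlike the grammar in this answer, the sketch consumes only the name, so a nested call name is reported as well.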
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <list>
#include <string>
#include <vector>
namespace Ast {
    using ArgList = std::list<std::string>;

    struct Resolve {
        std::string name;
        ArgList arglist;
    };

    using Resolves = std::vector<Resolve>;
}

BOOST_FUSION_ADAPT_STRUCT(Ast::Resolve, name, arglist)

namespace qi = boost::spirit::qi;
namespace qr = boost::spirit::repository::qi;

template <typename It>
struct Parser : qi::grammar<It, Ast::Resolves()>
{
    Parser() : Parser::base_type(start) {
        using namespace qi;

        identifier = char_("a-zA-Z_") >> *char_("a-zA-Z0-9_");
        arg        = raw [ +('(' >> -arg >> ')' | +~char_(",)(")) ];
        arglist    = '(' >> -(arg % ',') >> ')';
        resolve    = identifier >> arglist;
        start      = *qr::seek[hold[resolve]];
    }

  private:
    qi::rule<It, Ast::Resolves()> start;
    qi::rule<It, Ast::Resolve()> resolve;
    qi::rule<It, Ast::ArgList()> arglist;
    qi::rule<It, std::string()> arg, identifier;
};
#include <iostream>

int main() {
    using It = std::string::const_iterator;
    std::string const samples = R"--(
Samples:
sometext(para) → expect para in the string list
sometext(para1,para2) → expect para1 and para2 in string list
sometext(call(a)) → expect call(a) in the string list
sometext(call(a,b)) ← here it fails; it seams that the "!lit(',')" wont make the parser step outside
)--";
    It f = samples.begin(), l = samples.end();

    Ast::Resolves data;
    if (parse(f, l, Parser<It>{}, data)) {
        std::cout << "Parsed " << data.size() << " resolves\n";
    } else {
        std::cout << "Parsing failed\n";
    }

    for (auto& resolve: data) {
        std::cout << " - " << resolve.name << "\n (\n";
        for (auto& arg : resolve.arglist) {
            std::cout << " " << arg << "\n";
        }
        std::cout << " )\n";
    }
}
Prints
Parsed 6 resolves
- sometext
(
para
)
- sometext
(
para1
para2
)
- sometext
(
call(a)
)
- call
(
a
)
- call
(
a
b
)
- lit
(
'
'
)
More ideas
That last output shows you a problem with your current grammar: lit(',') should obviously not be seen as a call with two parameters.
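One way to keep a quoted comma such as the one in lit(',') from being treated as a separator is to treat quoted literals as opaque while splitting. The sketch below shows the idea in plain C++ rather than as a Spirit rule. split_top_level is a hypothetical helper, and the quoting rules (single or double quotes, backslash escapes) are assumptions, not something the question specifies. It does not check that parentheses are balanced.

```cpp
#include <cassert>   // only needed for the usage checks mentioned below
#include <cstddef>
#include <string>
#include <vector>

// Splits the text between the outer parentheses of a call on top-level commas.
// Assumptions: literals are delimited by ' or ", and a backslash escapes the
// next character inside a literal.
std::vector<std::string> split_top_level(const std::string& inner) {
    std::vector<std::string> parts;
    std::string current;
    int depth = 0;        // nesting level of parentheses
    char quote = '\0';    // active quote character, '\0' when outside a literal

    for (std::size_t i = 0; i < inner.size(); ++i) {
        const char c = inner[i];
        if (quote != '\0') {
            current += c;
            if (c == '\\' && i + 1 < inner.size()) {
                current += inner[++i];          // keep the escaped character verbatim
            } else if (c == quote) {
                quote = '\0';                   // literal ends here
            }
        } else if (c == '\'' || c == '"') {
            quote = c;                          // literal starts here
            current += c;
        } else if (c == '(') {
            ++depth;
            current += c;
        } else if (c == ')') {
            --depth;
            current += c;
        } else if (c == ',' && depth == 0) {
            parts.push_back(current);           // top-level separator
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty() || !parts.empty()) parts.push_back(current);
    return parts;
}
```

Given the three characters between the parentheses of lit(','), the function returns a single part, so assert(split_top_level("','").size() == 1) holds; a comma inside nested parentheses stays with its argument in the same way.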
I recently wrote an answer on extracting (nested) function calls with parameters, which does things more neatly:
- Boost spirit parse rule is not applied
- or this one boost spirit reporting semantic error
Bonus version that uses string_view and also shows exact line/column information for all extracted words.
Note that it still doesn't require any phoenix or semantic actions. Instead, it simply defines the necessary trait to assign to boost::string_view from an iterator range.
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <boost/utility/string_view.hpp>
#include <algorithm>
#include <iterator>
#include <list>
#include <string>
#include <vector>
namespace Ast {
    using Source = boost::string_view;
    using ArgList = std::list<Source>;

    struct Resolve {
        Source name;
        ArgList arglist;
    };

    using Resolves = std::vector<Resolve>;
}

BOOST_FUSION_ADAPT_STRUCT(Ast::Resolve, name, arglist)

namespace boost { namespace spirit { namespace traits {
    template <typename It>
    struct assign_to_attribute_from_iterators<boost::string_view, It, void> {
        static void call(It f, It l, boost::string_view& attr) {
            // NOTE: .base() is not a standard iterator member; this relies on the
            // std::string iterator exposing its underlying pointer (as libstdc++ does).
            attr = boost::string_view { f.base(), size_t(std::distance(f.base(),l.base())) };
        }
    };
} } }
namespace qi = boost::spirit::qi;
namespace qr = boost::spirit::repository::qi;

template <typename It>
struct Parser : qi::grammar<It, Ast::Resolves()>
{
    Parser() : Parser::base_type(start) {
        using namespace qi;

        identifier = raw [ char_("a-zA-Z_") >> *char_("a-zA-Z0-9_") ];
        arg        = raw [ +('(' >> -arg >> ')' | +~char_(",)(")) ];
        arglist    = '(' >> -(arg % ',') >> ')';
        resolve    = identifier >> arglist;
        start      = *qr::seek[hold[resolve]];
    }

  private:
    qi::rule<It, Ast::Resolves()> start;
    qi::rule<It, Ast::Resolve()> resolve;
    qi::rule<It, Ast::ArgList()> arglist;
    qi::rule<It, Ast::Source()> arg, identifier;
};
#include <iostream>

struct Annotator {
    using Ref = boost::string_view;

    struct Manip {
        Ref fragment, context;

        friend std::ostream& operator<<(std::ostream& os, Manip const& m) {
            return os << "[" << m.fragment << " at line:" << m.line() << " col:" << m.column() << "]";
        }

        size_t line() const {
            return 1 + std::count(context.begin(), fragment.begin(), '\n');
        }
        size_t column() const {
            return 1 + (fragment.begin() - start_of_line().begin());
        }
        Ref start_of_line() const {
            return context.substr(context.substr(0, fragment.begin()-context.begin()).find_last_of('\n') + 1);
        }
    };

    Ref context;
    Manip operator()(Ref what) const { return {what, context}; }
};
int main() {
    using It = std::string::const_iterator;
    std::string const samples = R"--(Samples:

sometext(para)        → expect para in the string list
sometext(para1,para2) → expect para1 and para2 in string list
sometext(call(a))     → expect call(a) in the string list
sometext(call(a,b))   ← here it fails; it seams that the "!lit(',')" wont make the parser step outside
)--";
    It f = samples.begin(), l = samples.end();

    Ast::Resolves data;
    if (parse(f, l, Parser<It>{}, data)) {
        std::cout << "Parsed " << data.size() << " resolves\n";
    } else {
        std::cout << "Parsing failed\n";
    }

    Annotator annotate{samples};

    for (auto& resolve: data) {
        std::cout << " - " << annotate(resolve.name) << "\n (\n";
        for (auto& arg : resolve.arglist) {
            std::cout << " " << annotate(arg) << "\n";
        }
        std::cout << " )\n";
    }
}
Prints
Parsed 6 resolves
- [sometext at line:3 col:1]
(
[para at line:3 col:10]
)
- [sometext at line:4 col:1]
(
[para1 at line:4 col:10]
[para2 at line:4 col:16]
)
- [sometext at line:5 col:1]
(
[call(a) at line:5 col:10]
)
- [call at line:5 col:34]
(
[a at line:5 col:39]
)
- [call at line:6 col:10]
(
[a at line:6 col:15]
[b at line:6 col:17]
)
- [lit at line:6 col:62]
(
[' at line:6 col:66]
[' at line:6 col:68]
)