ParseKit: What built-in Productions should I use in my Grammars?

Question

I just started using ParseKit to explore language creation and perhaps build a small toy DSL. However, the current SVN trunk from Google is throwing a -[PKToken intValue]: unrecognized selector sent to instance ... when parsing this grammar:

@start = identifier ;
identifier = (Letter | '_') | (letterOrDigit | '_') ;
letterOrDigit = Letter | Digit ;

For this input:

foo

Clearly, I am missing something or have incorrectly configured my project. What can I do to fix this issue?

Recommended answer

Developer of ParseKit here.

First, see the ParseKit tokenization documentation.

Basically, ParseKit can work in one of two modes: Let's call them Tokens Mode and Chars Mode. (There are no formal names for these two modes, but perhaps there should be.)

Tokens Mode is more popular by far. Virtually every example you will find of using ParseKit shows how to use Tokens Mode. I believe all of the documentation on http://parsekit.com uses Tokens Mode. ParseKit's grammar feature (which you are using in your example) only works in Tokens Mode.

Chars Mode is a very little-known feature of ParseKit. I've never had anyone ask about it before.

So the differences between the modes are:

  • In Tokens Mode, the ParseKit Tokenizer emits multi-char tokens (like Words, Symbols, Numbers, QuotedStrings, etc.) which are then parsed by the ParseKit parsers you create (programmatically or via grammars).
  • In Chars Mode, the ParseKit Tokenizer always emits single-char tokens which are then parsed by the ParseKit parsers you create programmatically. (Grammars don't currently work with this mode, as this mode is not popular.)
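
The difference between the two modes can be illustrated with a toy tokenizer. The Python sketch below is not ParseKit code and does not use its API: the function names `tokens_mode` and `chars_mode` and the simplified token rules are assumptions made purely for illustration.

```python
import re

# Hypothetical toy tokenizer; NOT ParseKit's implementation or API.
# Simplified rules (assumptions): a Word starts with a letter or underscore,
# a Number is a run of digits with an optional fraction, and any other
# non-whitespace character is a Symbol.
TOKEN_RE = re.compile(r"""
    (?P<Word>[A-Za-z_][-'0-9A-Za-z_]*)
  | (?P<Number>\d+(?:\.\d+)?)
  | (?P<Symbol>\S)
""", re.VERBOSE)


def tokens_mode(text):
    """Emit multi-char tokens as (type, text) pairs, skipping whitespace."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


def chars_mode(text):
    """Emit one single-char token per input character."""
    return [("Char", ch) for ch in text]


print(tokens_mode("foo"))  # [('Word', 'foo')]
print(chars_mode("foo"))   # [('Char', 'f'), ('Char', 'o'), ('Char', 'o')]
```

In the first case a parser sees a single Word token; in the second it sees three separate characters and would have to assemble them itself.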

You could use Chars Mode to implement Regular Expressions, which parse on a char-by-char basis.


For your example, you should ignore Chars Mode and just use Tokens Mode. The following Built-in Productions are for Chars Mode only. Do not use them in your grammars:

(PK)Letter
(PK)Digit
(PK)Char
(PK)SpecificChar 

Notice how all of those Productions sound like they match individual chars. That's because they do.

Your example above should probably look like:

@start = identifier;
identifier = Word; // by default Words start with a-zA-Z_ and contain -0-9a-zA-Z_'

Keep in mind that the Productions in your grammars (parsers like identifier) work on Tokens already emitted from ParseKit's Tokenizer, not on individual chars.

In other words: by the time your grammar goes to work parsing input, the input has already been tokenized into Tokens of type Word, Number, Symbol, QuotedString, etc.
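
To make this concrete, here is a small Python sketch of why a char-level production has nothing to match in Tokens Mode. It is only an illustration: `match_word` and `match_letter` are hypothetical helpers, not ParseKit API, and the `(type, text)` token shape is an assumption.

```python
# Hypothetical sketch; the names below are illustrative, not ParseKit API.

def match_word(tokens):
    """A rule like `identifier = Word;` succeeds on exactly one Word token."""
    return len(tokens) == 1 and tokens[0][0] == "Word"


def match_letter(tokens):
    """A char-level rule like `Letter` expects one single-character token."""
    return (len(tokens) == 1
            and tokens[0][0] == "Char"
            and tokens[0][1].isalpha())


# In Tokens Mode the tokenizer has already turned "foo" into one Word token.
tokenized = [("Word", "foo")]

print(match_word(tokenized))    # True
print(match_letter(tokenized))  # False
```

The token stream never contains single letters in this mode, so a rule written in terms of letters and digits cannot describe it.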

Here are all of the Built-in Productions available for use in your Grammar:

Word
Number 
Symbol
QuotedString
Comment
Any
S // Whitespace. Only available when @preservesWhitespaceTokens=YES (NO by default).

Also:

DelimitedString('start', 'end', 'allowedCharset')
/xxx/i // RegEx match

There are also operators for composite parsers:

  // Sequence
| // Alternation
? // Optional
+ // Multiple
* // Repetition
~ // Negation
& // Intersection
- // Difference
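
As a rough picture of how such operators compose parsers over a token stream, the Python sketch below implements four of them (sequence, alternation, optional, repetition) as tiny combinators. All names here (`tok`, `seq`, `alt`, `opt`, `star`, `matches`) are hypothetical, and the first-match alternation is a simplification that may differ from ParseKit's own matching strategy.

```python
# Hypothetical mini combinators over (type, text) tokens; not ParseKit's API.
# Each parser takes (tokens, pos) and returns the new position, or None.

def tok(kind, text=None):
    """Match one token of the given type (and, optionally, exact text)."""
    def parse(tokens, pos):
        if pos < len(tokens):
            k, t = tokens[pos]
            if k == kind and (text is None or t == text):
                return pos + 1
        return None
    return parse


def seq(*parsers):
    """a b  (Sequence): every parser must match, one after another."""
    def parse(tokens, pos):
        for p in parsers:
            pos = p(tokens, pos)
            if pos is None:
                return None
        return pos
    return parse


def alt(*parsers):
    """a | b  (Alternation): the first parser that matches wins."""
    def parse(tokens, pos):
        for p in parsers:
            end = p(tokens, pos)
            if end is not None:
                return end
        return None
    return parse


def opt(parser):
    """a?  (Optional): match if possible, otherwise consume nothing."""
    def parse(tokens, pos):
        end = parser(tokens, pos)
        return pos if end is None else end
    return parse


def star(parser):
    """a*  (Repetition): match zero or more times."""
    def parse(tokens, pos):
        while True:
            end = parser(tokens, pos)
            if end is None or end == pos:
                return pos
            pos = end
    return parse


def matches(parser, tokens):
    """True when the parser consumes the whole token list."""
    return parser(tokens, 0) == len(tokens)


# Roughly: assignment = Word '=' (Word | Number) ';'? ;
assignment = seq(tok("Word"), tok("Symbol", "="),
                 alt(tok("Word"), tok("Number")), opt(tok("Symbol", ";")))

sample = [("Word", "x"), ("Symbol", "="), ("Number", "42"), ("Symbol", ";")]
print(matches(assignment, sample))  # True
```

Note that every leaf parser here consumes a whole token, which mirrors the Tokens Mode behaviour described in the answer.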
