Question
I'm working on a fairly standard compiler project for which I picked ANTLR as the parser generator. While updating an existing grammar from v2 to v3, I noticed that ANTLRWorks, the official IDE for ANTLR, wasn't displaying any of the extended-ASCII characters in the file properly. Even after I used Notepad++ to convert the file from ASCII to UTF-8, ANTLRWorks still displayed those characters as squares. In Notepad++ they display fine.
Since this glitch means that ANTLRWorks mauls the file when I save it, I cannot use it as an editor any more, which is rather annoying. Has anyone else here encountered this issue and maybe solved it? Much obliged.
[edit]: the specific issue occurs with the latest version of ANTLRWorks (downloaded yesterday) and with the vams.g grammar file I got from http://www.antlr.org/grammar/1086696923011/vhdlams/index.html
Recommended answer
I cannot reproduce this with ANTLRWorks 1.4.3.
If I create a dummy grammar:
grammar T;
parse : . ;
Any : . ;
and paste the complete extended ASCII set in a multi-line comment:
grammar T;
/*
€
‚
ƒ
...
ÿ
*/
parse : . ;
Any : . ;
there's no problem. It doesn't matter whether I copy the chars with ANTLRWorks, or with a normal editor and then edit the existing grammar with ANTLRWorks: the characters all stay the same after saving inside ANTLRWorks.
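For anyone who wants to repeat this check, here is a minimal Python sketch that builds the same dummy grammar and compares the file after it has been saved from ANTLRWorks. It assumes "extended ASCII" means Windows-1252 (which matches the €, ‚, ƒ characters above); the helper names and the file name T.g are my own, not part of ANTLR.

```python
# Five byte values in 0x80-0xFF are unassigned in Windows-1252 and cannot be decoded.
UNASSIGNED = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def extended_chars():
    """Return the Windows-1252 characters for bytes 0x80-0xFF, skipping unassigned ones."""
    return [bytes([b]).decode("cp1252") for b in range(0x80, 0x100) if b not in UNASSIGNED]


def build_grammar():
    """Build the dummy grammar with one extended character per comment line."""
    comment = "\n".join(extended_chars())
    return "grammar T;\n/*\n" + comment + "\n*/\nparse : . ;\nAny : . ;\n"


def write_grammar(path):
    """Write the dummy grammar as UTF-8 with LF line endings."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(build_grammar())


def is_unchanged(path):
    """Check that the file still matches the generated grammar, ignoring CRLF vs LF."""
    with open(path, "rb") as f:
        data = f.read().replace(b"\r\n", b"\n")
    return data == build_grammar().encode("utf-8")
```

Call write_grammar("T.g"), open and save the file in ANTLRWorks, then call is_unchanged("T.g"). A False result only tells you the bytes changed, not why; an editor that adds a byte order mark would also trigger it.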
On a related note: ANTLR versions 3.0 to 3.3 still depend on some ANTLR 2.7 classes, which might cause org.antlr.Tool to trip over certain characters outside the ASCII set. In that case, use ANTLR 3.4, which no longer has these old dependencies.
I suspect there's an odd byte somewhere in the original grammar that is causing all the mayhem. I quickly copied only the rules from the original grammar, changed all v2.7 syntax to v3 syntax (double-quoted literals became single-quoted ones, protected became fragment, and I commented out some custom code) and saved the result in a new file. That file could be opened (and saved) by ANTLRWorks or a plain text editor without the extended ASCII chars being mangled.
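If you want to hunt for such an odd byte yourself, a rough Python sketch like the following can help. It lists every byte above 0x7F, reports the first position where the file stops being valid UTF-8, and checks for a UTF-8 byte order mark. The function names are hypothetical, and this is only a diagnostic aid: it is not how ANTLRWorks reads files.

```python
import sys


def find_non_ascii(data):
    """Return (offset, byte value) for every byte above 0x7F."""
    return [(i, b) for i, b in enumerate(data) if b > 0x7F]


def first_invalid_utf8(data):
    """Return the offset where UTF-8 decoding first fails, or None if it succeeds."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def has_utf8_bom(data):
    """Report whether the data starts with the UTF-8 byte order mark."""
    return data.startswith(b"\xef\xbb\xbf")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan.py vams.g  (script name is illustrative)
    with open(sys.argv[1], "rb") as f:
        raw = f.read()
    print("UTF-8 BOM present:", has_utf8_bom(raw))
    print("first invalid UTF-8 offset:", first_invalid_utf8(raw))
    for offset, value in find_non_ascii(raw):
        print(f"offset {offset}: 0x{value:02X}")
```

A file that contains bytes above 0x7F but fails UTF-8 decoding is probably still in a single-byte encoding such as Windows-1252 or Latin-1, so it is worth checking what the "convert to UTF-8" step actually wrote.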
Here is the ANTLR v3 version of said grammar: http://pastebin.com/zU4xcvXt (the grammar is too big to post on SO...)
No, it's not. As you mentioned, it's only used to give a parser or lexer a name.
There are 4 types of grammars in ANTLR:
- combined grammar, which looks like grammar T; and generates TLexer.java and TParser.java source files;
- parser grammar, which looks like parser grammar TP; and generates a TP.java source file;
- lexer grammar, which looks like lexer grammar TL; and generates a TL.java source file;
- tree grammar, which looks like tree grammar TWalker; and generates a TWalker.java source file.
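To make the naming rule concrete, here is a small Python sketch that reads a grammar header and returns the Java file names listed above. It is only an illustration of that list: the function and regex are my own, it assumes the Java target, and it ignores comments before the header as well as any other files the tool may emit.

```python
import re

# Matches "grammar T;", "lexer grammar TL;", "parser grammar TP;", "tree grammar TWalker;".
HEADER = re.compile(
    r"^\s*(?:(lexer|parser|tree)\s+)?grammar\s+([A-Za-z_]\w*)\s*;",
    re.MULTILINE,
)


def generated_files(grammar_text):
    """Return the Java source file names implied by the grammar header."""
    match = HEADER.search(grammar_text)
    if match is None:
        raise ValueError("no grammar declaration found")
    kind, name = match.group(1), match.group(2)
    if kind is None:
        # A combined grammar yields both a lexer and a parser class.
        return [name + "Lexer.java", name + "Parser.java"]
    # Lexer, parser and tree grammars yield one class named after the grammar.
    return [name + ".java"]
```

For example, generated_files("grammar T;") gives TLexer.java and TParser.java, while generated_files("tree grammar TWalker;") gives only TWalker.java.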