Problem description
I'm working on a fairly standard compiler project for which I picked ANTLR as the parser generator. While updating an existing grammar from v2 to v3, I noticed that ANTLRWorks, the official IDE for ANTLR, wasn't displaying any of the extended-ASCII characters in the file properly. Even after I used Notepad++ to convert the file from ASCII to UTF-8, ANTLRWorks still displayed those characters as squares. In Notepad++ they display fine.
Since this glitch means that ANTLRWorks mangles the file when I save it, I can no longer use it as an editor, which is rather annoying. Has anyone else here encountered this issue and maybe solved it? Much obliged.
[edit]: The specific issue occurs with the latest version of ANTLRWorks (downloaded yesterday) and with the vams.g grammar file I got from http://www.antlr.org/grammar/1086696923011/vhdlams/index.html
Recommended answer
I cannot reproduce this with ANTLRWorks 1.4.3.
If I create a dummy grammar:
grammar T;
parse : . ;
Any : . ;
and paste the complete extended ASCII set in a multi-line comment:
grammar T;
/*
€
‚
ƒ
...
ÿ
*/
parse : . ;
Any : . ;
There's no problem. It doesn't matter whether I paste the characters directly in ANTLRWorks, or paste them with a normal editor and then edit the existing grammar in ANTLRWorks: the characters all stay the same after saving inside ANTLRWorks.
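The same kind of round-trip check can be scripted outside the IDE. The following is only a rough sketch in Python and says nothing about how ANTLRWorks itself reads files: it writes a few of the characters from the comment above into a temporary grammar file as UTF-8, reads them back, and then reads the same bytes as Windows-1252. The helper name round_trip and the choice of Windows-1252 are assumptions made for illustration; a mismatch between the encoding used to write and the one used to read is one common way such characters end up garbled.

```python
import os
import tempfile

# Dummy grammar with a few extended characters (euro sign, low quote,
# florin, y-diaeresis) written as escapes so this script stays pure ASCII.
SAMPLE = (
    "grammar T;\n/*\n\u20ac\n\u201a\n\u0192\n\u00ff\n*/\n"
    "parse : . ;\nAny : . ;\n"
)


def round_trip(text, write_enc, read_enc):
    """Write text to a temp file using write_enc, read it back using read_enc."""
    fd, path = tempfile.mkstemp(suffix=".g")
    os.close(fd)
    try:
        # newline="" disables newline translation in both directions.
        with open(path, "w", encoding=write_enc, newline="") as f:
            f.write(text)
        with open(path, "r", encoding=read_enc, errors="replace", newline="") as f:
            return f.read()
    finally:
        os.remove(path)


same = round_trip(SAMPLE, "utf-8", "utf-8")
mixed = round_trip(SAMPLE, "utf-8", "cp1252")

print("UTF-8 -> UTF-8 unchanged:", same == SAMPLE)
print("UTF-8 -> cp1252 unchanged:", mixed == SAMPLE)
```

If the first comparison holds but the second does not, the file itself is intact and only the reader's assumed encoding differs.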
On a related note: ANTLR versions 3.0 through 3.3 still depend on some ANTLR 2.7 classes, which might cause org.antlr.Tool to trip over certain characters outside the ASCII set. If that happens, use ANTLR 3.4, which no longer has these old dependencies.
I suspect there's an odd byte somewhere in the original grammar that is causing all the mayhem. I quickly copied only the rules from the original grammar, changed all v2.7 syntax to v3 syntax (double-quoted literals became single-quoted ones, protected became fragment, and I commented out some custom code), and saved the result in a new file. ANTLRWorks and a plain text editor could both open and save this file without mangling the extended ASCII characters.
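One way to look for such an odd byte is to scan the raw file rather than the decoded text. The sketch below is a hypothetical helper, not part of ANTLR or ANTLRWorks: it lists every byte outside printable ASCII (plus tab, CR and LF) with its offset, and separately checks whether the whole file decodes as UTF-8. The function names and the sample bytes are made up for illustration.

```python
def find_suspect_bytes(data):
    """Return (offset, byte) pairs for bytes outside printable ASCII, tab, CR, LF."""
    allowed = {0x09, 0x0A, 0x0D}
    return [
        (i, b)
        for i, b in enumerate(data)
        if not (0x20 <= b <= 0x7E or b in allowed)
    ]


def is_valid_utf8(data):
    """True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def report(path):
    """Print a short diagnosis for the file at path (path supplied by the caller)."""
    with open(path, "rb") as f:
        data = f.read()
    print("valid UTF-8:", is_valid_utf8(data))
    for offset, byte in find_suspect_bytes(data):
        print(f"offset {offset}: 0x{byte:02X}")


# Made-up sample: a rule followed by a lone 0x80 byte and a NUL byte.
sample = b"rule : 'a' ;\n\x80 \x00"
suspects = find_suspect_bytes(sample)
print(suspects)
print(is_valid_utf8(sample))
```

Calling report("vams.g") on the downloaded grammar would show where, if anywhere, such bytes sit. Legitimate UTF-8 encoded characters are also listed, since they are non-ASCII bytes too, so the offsets still need to be inspected by hand.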
Here is the ANTLR v3 version of said grammar: http://pastebin.com/zU4xcvXt (the grammar is too big to post on SO...)
Does the grammar name serve any purpose other than giving the grammar a label?
No, it doesn't. As you mentioned, it's only used to give a parser or lexer a name.
There are 4 types of grammars in ANTLR:
- combined grammar, declared as "grammar T;", which generates the TLexer.java and TParser.java source files;
- parser grammar, declared as "parser grammar TP;", which generates a TP.java source file;
- lexer grammar, declared as "lexer grammar TL;", which generates a TL.java source file;
- tree grammar, declared as "tree grammar TWalker;", which generates a TWalker.java source file.
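The naming rule in the list above can be written down as a small lookup. This is only an illustrative Python sketch, not ANTLR's own logic: the function generated_files and its regular expression are hypothetical, and it models only the Java source files named in the list, not any other artifacts ANTLR may write.

```python
import re

# Matches an ANTLR v3 grammar header such as "parser grammar TP;".
# The optional first group is the grammar kind, the second is the name.
HEADER = re.compile(
    r"^\s*(?:(lexer|parser|tree)\s+)?grammar\s+([A-Za-z_]\w*)\s*;"
)


def generated_files(header_line):
    """Return the Java source file names expected for a grammar header line."""
    match = HEADER.match(header_line)
    if match is None:
        raise ValueError("not a grammar header: " + repr(header_line))
    kind, name = match.group(1), match.group(2)
    if kind is None:
        # Combined grammar: one lexer and one parser, both derived from the name.
        return [name + "Lexer.java", name + "Parser.java"]
    # Parser, lexer and tree grammars each yield a single file named after the grammar.
    return [name + ".java"]


for header in ("grammar T;", "parser grammar TP;",
               "lexer grammar TL;", "tree grammar TWalker;"):
    print(header, "->", generated_files(header))
```

In every case the grammar name only determines the file names, which matches the point that the name is just a label.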