Question
I'm creating a GATE application that finds co-referring text. It works fine, and I have created a zipped file of the application using the export option provided in GATE.
Now I'm trying to use the same application from my Java code.
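// Initialise GATE in sandbox mode, using the unpacked application directory as GATE home.
Gate.runInSandbox(true);
Gate.setGateHome(new File(gateHome));
Gate.setPluginsHome(new File(gateHome, "plugins"));
Gate.init();
// Load the exported application from its .xgapp file.
URL applicationURL = new URL("file:" + new Path(gateHome, "application.xgapp").toString());
application = (CorpusController) PersistenceManager.loadObjectFromUrl(applicationURL);
// Run the application over a corpus containing a single document built from the input text.
corpus = Factory.newCorpus("Megaki Corpus");
application.setCorpus(corpus);
Document document = Factory.newDocument(text);
corpus.add(document);
application.execute();
// Empty the corpus; the local variable still refers to the processed document.
corpus.clear();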
How can I now read the processed document and get the co-reference text?
Recommended answer
I do not know about your case, but co-references created manually with the Co-reference Editor are stored in a document feature. The feature name seems to be "MatchesAnnots" and its type is Map<String, List<List<Integer>>>.
In my case, the following code prints "as name: null" (null being the name of the default annotation set), followed by all co-reference chains present in that set.
Object obj = document.getFeatures().get("MatchesAnnots");
@SuppressWarnings("unchecked")
Map<String, List<List<Integer>>> map = (Map<String, List<List<Integer>>>) obj;
for (Entry<String, List<List<Integer>>> e : map.entrySet()) {
    System.err.println("as name: " + e.getKey());
    for (List<Integer> chain : e.getValue()) {
        System.err.println("chain : " + chain);
    }
}
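The chains printed above appear to contain annotation IDs rather than text, so one more lookup is needed to get the co-referring strings. The sketch below shows only that lookup and runs without GATE. `Span` and `CorefChainText` are hypothetical stand-ins, not GATE classes. In GATE itself the equivalent step would presumably be `document.getAnnotations(setName).get(id)` (or `document.getAnnotations()` when the set name is null), followed by `gate.Utils.stringFor(document, annotation)`; treat those calls as an assumption to check against your GATE version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a GATE annotation: only its character offsets are modeled.
final class Span {
    final int start;
    final int end;

    Span(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

final class CorefChainText {
    // Converts chains of annotation IDs into chains of the strings they cover.
    // IDs that have no entry in spansById are skipped.
    static List<List<String>> resolve(String text,
                                      Map<Integer, Span> spansById,
                                      List<List<Integer>> chains) {
        List<List<String>> result = new ArrayList<>();
        for (List<Integer> chain : chains) {
            List<String> mentions = new ArrayList<>();
            for (Integer id : chain) {
                Span span = spansById.get(id);
                if (span != null) {
                    mentions.add(text.substring(span.start, span.end));
                }
            }
            result.add(mentions);
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "John Smith arrived. He was tired.";

        // Assumed annotation IDs and offsets, made up for this example.
        Map<Integer, Span> spansById = new HashMap<>();
        spansById.put(1, new Span(0, 10));  // "John Smith"
        spansById.put(2, new Span(20, 22)); // "He"

        // One chain linking both mentions, in the shape stored under "MatchesAnnots".
        List<List<Integer>> chains = Arrays.asList(Arrays.asList(1, 2));

        System.out.println(resolve(text, spansById, chains)); // prints [[John Smith, He]]
    }
}
```

Skipping unknown IDs instead of throwing is a design choice for this sketch; in a real pipeline a missing ID may indicate that the chain refers to a different annotation set than the one being queried.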