Question
I found that NLTK in Python does this via the *raw_parse* function, but I need to use Java. I found that ClearTK has a MaltParser wrapper, but there is no documentation for it. I'm looking for a function or a project that first converts raw English text into a CoNLL file that MaltParser can use, and then parses it with MaltParser. Any help is appreciated.
Recommended answer
The MaltParser 1.7.2 distribution ships with examples in the folder examples/apiexamples/srcex.
However, these examples only show how to run MaltParser programmatically after tokenization and POS tagging have already been performed, and after the output of those steps has been converted to a CoNLL-like format.
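To make the conversion step concrete, here is a minimal sketch in plain Java of turning already tokenized and POS-tagged text into CoNLL-style lines. It assumes the first six CoNLL-X columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS) with "_" for unknown values; which columns a given MaltParser model actually reads depends on how that model was trained. The class name `ConllFormatter`, the method `toConllLines`, and the hard-coded tokens and tags are hypothetical and only serve as illustration. In a real pipeline the tokens and tags would come from a tokenizer and a tagger.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of MaltParser or any other library.
final class ConllFormatter {

    /**
     * Builds one tab-separated line per token with the columns
     * ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS (CoNLL-X order).
     * Values that are not known (lemma, features) are written as "_".
     */
    static List<String> toConllLines(String[] tokens, String[] posTags) {
        if (tokens.length != posTags.length) {
            throw new IllegalArgumentException(
                    "tokens and posTags must have the same length");
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            lines.add(String.join("\t",
                    Integer.toString(i + 1), // ID: token index, starting at 1
                    tokens[i],               // FORM: the surface token
                    "_",                     // LEMMA: unknown in this sketch
                    posTags[i],              // CPOSTAG: assumption, reuse the fine-grained tag
                    posTags[i],              // POSTAG: tag from the POS tagger
                    "_"));                   // FEATS: none
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hard-coded sample input; a tokenizer and tagger would normally supply this.
        String[] tokens = {"The", "parser", "works", "."};
        String[] posTags = {"DT", "NN", "VBZ", "."};
        for (String line : toConllLines(tokens, posTags)) {
            System.out.println(line);
        }
    }
}
```

A list of lines like this, one sentence at a time, can be written to a file with a blank line between sentences or, presumably, converted to a string array and handed to the parsing call shown in the distribution's API examples.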
Since I currently cannot offer a better (simpler or shorter) alternative, I can at least share a link to a Groovy script that performs tokenization and part-of-speech tagging (using OpenNLP) followed by dependency parsing (using MaltParser). The tools are made interoperable using UIMA. If you are familiar with Maven, it should be quite straightforward to derive a Java version of that script.
Mind you, this is not the best answer, but at this point it is possibly better than nothing.
Note: I'm a developer on both Apache UIMA and DKPro Core (the project to which the link points).