Problem description
I have gold data in which I annotated all room numbers in several documents. I want to use OpenNLP to train a model on this data and classify room numbers, but I am stuck on where to start. I read the OpenNLP maxent documentation, looked at the examples in opennlp.tools, and am now looking at opennlp.tools.ml.maxent. It seems to be what I should be using, but I still have no idea how to use it. Can somebody give me a basic idea of how to use OpenNLP maxent and where to start? Any help will be appreciated.
Recommended answer
This is a minimal working example that demonstrates the use of the OpenNLP Maxent API.
It includes the following:
- Training a maxent model from data stored in a file.
- Storing the trained model into a file.
- Loading the trained model from a file.
- Using the model for classification.
- NOTE: the outcome is the first element in each training sample
- NOTE: the values can be arbitrary strings, e.g.
xyz=s0methIng
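The training-sample format noted in the list (outcome first, then feature strings) can be sketched without any OpenNLP dependency. The class and method names below are hypothetical helpers, not part of OpenNLP, and the sketch assumes that the elements of a sample are separated by whitespace, as in lines such as result=1 a=1 b=1.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of OpenNLP) that builds training lines. */
final class MaxentTrainingLines {

    /** Builds one sample line: the outcome first, then the features, separated by single spaces. */
    static String formatSample(String outcome, String... features) {
        StringBuilder line = new StringBuilder(requireToken(outcome));
        for (String feature : features) {
            line.append(' ').append(requireToken(feature));
        }
        return line.toString();
    }

    /** Rejects null, empty, or whitespace-containing strings (assumption: whitespace separates elements). */
    private static String requireToken(String s) {
        if (s == null || s.isEmpty() || s.chars().anyMatch(Character::isWhitespace)) {
            throw new IllegalArgumentException("Not a single whitespace-free token: " + s);
        }
        return s;
    }

    /** Writes the given sample lines to a file, one sample per line. */
    static void writeTrainingFile(Path file, List<String> lines) throws IOException {
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>();
        lines.add(formatSample("result=1", "a=1", "b=1"));
        lines.add(formatSample("result=0", "a=0", "b=1"));
        // A temporary file is used here so that no existing file is overwritten.
        Path file = Files.createTempFile("training-file", ".txt");
        writeTrainingFile(file, lines);
        System.out.println("Wrote " + lines.size() + " samples to " + file);
    }
}
```

Rejecting whitespace inside an element is a deliberate choice in this sketch: under the stated assumption, a feature such as "a 1" would otherwise be read back as two separate elements.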
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
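// Note: opennlp.maxent and opennlp.model are the package names of the older standalone
// opennlp-maxent library; later OpenNLP releases ship this code under
// opennlp.tools.ml.maxent and opennlp.tools.ml.model.
import opennlp.maxent.GIS;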
import opennlp.maxent.io.GISModelReader;
import opennlp.maxent.io.SuffixSensitiveGISModelWriter;
import opennlp.model.AbstractModel;
import opennlp.model.AbstractModelWriter;
import opennlp.model.DataIndexer;
import opennlp.model.DataReader;
import opennlp.model.FileEventStream;
import opennlp.model.MaxentModel;
import opennlp.model.OnePassDataIndexer;
import opennlp.model.PlainTextFileDataReader;
...
String trainingFileName = "training-file.txt";
String modelFileName = "trained-model.maxent.gz";
// Training a model from data stored in a file.
// The training file contains one training sample per line.
// Outcome (result) is the first element on each line.
// Example:
// result=1 a=1 b=1
// result=0 a=0 b=1
// ...
DataIndexer indexer = new OnePassDataIndexer(new FileEventStream(trainingFileName));
MaxentModel trainedMaxentModel = GIS.trainModel(100, indexer); // 100 iterations
// Storing the trained model into a file for later use (gzipped)
File outFile = new File(modelFileName);
AbstractModelWriter writer = new SuffixSensitiveGISModelWriter((AbstractModel) trainedMaxentModel, outFile);
writer.persist();
// Loading the gzipped model from a file
FileInputStream inputStream = new FileInputStream(modelFileName);
InputStream decodedInputStream = new GZIPInputStream(inputStream);
DataReader modelReader = new PlainTextFileDataReader(decodedInputStream);
MaxentModel loadedMaxentModel = new GISModelReader(modelReader).getModel();
// Now predicting the outcome using the loaded model
String[] context = {"a=1", "b=0"};
double[] outcomeProbs = loadedMaxentModel.eval(context);
String outcome = loadedMaxentModel.getBestOutcome(outcomeProbs);
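To make the last two lines easier to read: eval returns one probability per outcome, and getBestOutcome then picks the outcome name with the largest probability. The library-free sketch below illustrates only that selection step. The class name, outcome names, and probabilities are made up for illustration, and the tie-breaking rule (first maximum wins) is an assumption of this sketch.

```java
/** Hypothetical illustration of picking the most probable outcome; not OpenNLP code. */
final class BestOutcomeSketch {

    /** Returns the outcome name with the largest probability; the first one wins ties. */
    static String bestOutcome(String[] outcomeNames, double[] probs) {
        if (probs.length == 0 || outcomeNames.length != probs.length) {
            throw new IllegalArgumentException("Need one probability per outcome name");
        }
        int best = 0;
        for (int i = 1; i < probs.length; i++) {
            if (probs[i] > probs[best]) {
                best = i;
            }
        }
        return outcomeNames[best];
    }

    public static void main(String[] args) {
        String[] names = {"result=0", "result=1"}; // made-up outcome names
        double[] probs = {0.3, 0.7};               // made-up probabilities
        System.out.println(bestOutcome(names, probs)); // prints "result=1"
    }
}
```

Keeping the whole probability array around, rather than only the best outcome, is useful when a confidence threshold is wanted before accepting a prediction.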