Named Entity Recognition using SyntaxNet

Question

I am trying to understand and learn SyntaxNet. Is there any way to use SyntaxNet for Named Entity Recognition on a corpus? Any sample code or helpful links would be appreciated.

Recommended answer

While SyntaxNet does not explicitly offer any Named Entity Recognition functionality, Parsey McParseface does part-of-speech tagging and produces its output as a CoNLL table.

Every proper noun is tagged as NNP, and I have found that a simple regex-style pattern such as <NNP>+, i.e. one or more consecutive proper nouns, yields a fairly good set of named entities within a document. It is of course rudimentary and rule-based, but effective nonetheless.
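As a rough illustration of the <NNP>+ idea, here is a minimal Python sketch. It assumes the tagger output has already been reduced to (word, tag) pairs; the function name proper_noun_runs and the sample sentence are made up for this example and are not part of SyntaxNet.

```python
from itertools import groupby


def proper_noun_runs(tagged_tokens, tags=("NNP",)):
    """Return each maximal run of consecutive proper-noun tokens as one string.

    tagged_tokens is a list of (word, pos_tag) pairs (an assumed input shape).
    """
    entities = []
    # groupby splits the sequence into alternating runs of
    # "tag is a proper noun" / "tag is something else".
    for is_proper, group in groupby(tagged_tokens, key=lambda pair: pair[1] in tags):
        if is_proper:
            entities.append(" ".join(word for word, _ in group))
    return entities


# Hand-written sample, not real parser output.
sample = [
    ("Barack", "NNP"), ("Obama", "NNP"), ("visited", "VBD"),
    ("New", "NNP"), ("York", "NNP"), ("in", "IN"), ("May", "NNP"), (".", "."),
]
print(proper_noun_runs(sample))
```

Passing tags=("NNP", "NNPS") would also pick up plural proper nouns, if that suits your corpus.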

To send the CoNLL data to an output file from the demo.sh script (located in "/opt/tensorflow/models/syntaxnet/syntaxnet"), comment out the final stage of the pipeline, which pipes the data to conll2tree to draw the ASCII tree, so that the script looks like this:

PARSER_EVAL=bazel-bin/syntaxnet/parser_eval
MODEL_DIR=syntaxnet/models/parsey_mcparseface
[[ "$1" == "--conll" ]] && INPUT_FORMAT=stdin-conll || INPUT_FORMAT=stdin

$PARSER_EVAL \
  --input=$INPUT_FORMAT \
  --output=stdout-conll \
  --hidden_layer_sizes=64 \
  --arg_prefix=brain_tagger \
  --graph_builder=structured \
  --task_context=$MODEL_DIR/context.pbtxt \
  --model_path=$MODEL_DIR/tagger-params \
  --slim_model \
  --batch_size=1024 \
  --alsologtostderr \
   | \
  $PARSER_EVAL \
  --input=stdin-conll \
  --output=sample-param \
  --hidden_layer_sizes=512,512 \
  --arg_prefix=brain_parser \
  --graph_builder=structured \
  --task_context=$MODEL_DIR/context.pbtxt \
  --model_path=$MODEL_DIR/parser-params \
  --slim_model \
  --batch_size=1024 \
  --alsologtostderr

Notice that the output parameter of the second stage in the script above has been changed to sample-param. We now need to define it. Open the context.pbtxt file (located in "/opt/tensorflow/models/syntaxnet/syntaxnet/models/parsey_mcparseface") and add an input entry named sample-param that points to your output file. It should look something like this:

input {
  name: 'sample-param'
  record_format: 'conll-sentence'
  Part {
    file_pattern: "directory/prepoutput.txt"
  }
}

Save and close the file, return to "/opt/tensorflow/models/syntaxnet", and run syntaxnet/demo.sh as given in the SyntaxNet tutorial. On completion, go to the specified output folder, where you should find a table in CoNLL format. You can then run a simple program that iterates over each entry, reads its POS tag, and tries variations of my suggested pattern for entity recognition.
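The iteration over the output file might look roughly like the sketch below. It assumes tab-separated CoNLL rows in which the second column is the word and the fifth column is the fine-grained POS tag (NNP and so on), with a blank line between sentences. The helper names and the sample rows are hypothetical; in practice you would pass an open file handle for the path you set in context.pbtxt instead of the sample list.

```python
def read_conll_sentences(lines):
    """Yield one list of (word, fine_pos_tag) pairs per sentence."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        columns = line.split("\t")
        if len(columns) < 5:
            continue  # skip rows that do not look like CoNLL token rows
        # Assumed layout: columns[1] is the word, columns[4] the fine POS tag.
        sentence.append((columns[1], columns[4]))
    if sentence:
        yield sentence


def named_entity_candidates(sentence):
    """Join each run of consecutive NNP tokens into one candidate entity."""
    entities, current = [], []
    for word, tag in sentence:
        if tag == "NNP":
            current.append(word)
        elif current:
            entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities


# Hand-written rows in the assumed format, standing in for the output file.
sample_lines = [
    "1\tBob\t_\tNOUN\tNNP\t_\t2\tnsubj\t_\t_",
    "2\tbrought\t_\tVERB\tVBD\t_\t0\tROOT\t_\t_",
    "3\tthe\t_\tDET\tDT\t_\t4\tdet\t_\t_",
    "4\tpizza\t_\tNOUN\tNN\t_\t2\tdobj\t_\t_",
    "5\tto\t_\tADP\tIN\t_\t2\tprep\t_\t_",
    "6\tAlice\t_\tNOUN\tNNP\t_\t5\tpobj\t_\t_",
    "7\t.\t_\t.\t.\t_\t2\tpunct\t_\t_",
    "",
]

for sentence in read_conll_sentences(sample_lines):
    print(named_entity_candidates(sentence))
```

Keeping the file parsing separate from the tag pattern makes it easy to swap in other rules later, for example allowing NNPS or a determiner inside a name.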

Hope this helps!
