NLP: finding relationships between entities

Problem description

My current understanding is that it is possible to extract entities from a text document using toolkits such as OpenNLP or Stanford NLP.

However, is there a way to find relationships between these entities?

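For example, consider the following text:

"As some of you may know, I spent last week at CERN, the European high-energy physics laboratory where the famous Higgs boson was discovered last July. Every time I go to CERN, I get a profound feeling. Apart from quick visits over the years, I spent three months here in the late 1990s as a visiting scientist, working on early-universe physics and trying to figure out how to connect the universe we see today with what may have happened in its infancy."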

Entities: I (the author), CERN, Higgs boson

Relationships:
- I "visited" CERN
- CERN "discovered" Higgs boson

Thanks.

Recommended answer

You can extract verbs together with their dependents using, for example, the Stanford Parser. You might get "dependency chains" like

"I :: spent :: at :: CERN".
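As a rough illustration of how such a chain could be assembled, here is a minimal Python sketch. It does not call the Stanford Parser: the dependency edges are written by hand for the clause "I spent last week at CERN", using Universal Dependencies style labels (`nsubj`, `obj`, `obl`, `case`), which is an assumption about the parser output. The names `EDGES`, `ENTITIES`, and `chains_for_verb` are hypothetical.

```python
from collections import defaultdict

# Hand-written (head, relation, dependent) edges for
# "I spent last week at CERN". A real parser would produce these;
# the relation labels are an assumption (Universal Dependencies style).
EDGES = [
    ("spent", "nsubj", "I"),
    ("spent", "obj", "week"),
    ("week", "amod", "last"),
    ("spent", "obl", "CERN"),
    ("CERN", "case", "at"),
]

# Entities assumed to come from an earlier entity-extraction step.
ENTITIES = {"I", "CERN"}


def chains_for_verb(verb, edges, entities):
    """Build 'subject :: verb :: [preposition ::] entity' chains for one verb."""
    children = defaultdict(list)
    for head, rel, dep in edges:
        children[head].append((rel, dep))

    subjects = [d for r, d in children[verb] if r == "nsubj" and d in entities]
    chains = []
    for rel, dep in children[verb]:
        if rel in ("obj", "obl") and dep in entities:
            # A 'case' child of the dependent is its preposition, if any.
            preps = [d for r, d in children[dep] if r == "case"]
            for subj in subjects:
                chains.append(" :: ".join([subj, verb, *preps, dep]))
    return chains


print(chains_for_verb("spent", EDGES, ENTITIES))
# ['I :: spent :: at :: CERN']
```

Only dependents that are known entities are kept, so "week" is dropped even though it is the direct object of "spent".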

Recognising that "I spent at CERN", "I visited CERN", and "CERN hosted my visit" (etc.) denote the same kind of event is a much tougher task. Going into how this can be done is beyond the scope of an SO question, but you can read up on the literature on paraphrase recognition (here is one overview paper). There is also a related question on SO.

Once you can cluster similar chains, you need a way to label the clusters. You could simply choose the verb of the most common chain in each cluster.
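The labelling step can be sketched as follows, assuming the clustering has already been done and that chains use the `subject :: verb :: ...` layout shown earlier. The cluster contents and the function name `label_cluster` are invented for illustration.

```python
from collections import Counter


def label_cluster(chains):
    """Label a cluster with the verb of its most frequent chain."""
    most_common_chain, _count = Counter(chains).most_common(1)[0]
    # Chains look like "subject :: verb :: ... :: object";
    # the verb is assumed to be the second field.
    return most_common_chain.split(" :: ")[1]


# An invented cluster of chains assumed to describe the same kind of event.
cluster = [
    "I :: visited :: CERN",
    "I :: spent :: at :: CERN",
    "I :: visited :: CERN",
    "CERN :: hosted :: visit",
]

print(label_cluster(cluster))  # visited
```

If two chains are equally frequent, `Counter.most_common` returns the one that was seen first, so the label then depends on input order.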

If, however, you have a predefined set of relation types you want to extract and many texts manually annotated for these relations, then the approach could be very different, e.g., using machine learning to learn how to recognise a relation type from the annotated data.
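To make the supervised idea concrete, here is a small sketch of one possible setup: a multinomial Naive Bayes classifier with add-one smoothing, written in plain Python. The choice of classifier, the feature (the words between the two entity mentions), and the six training examples are all assumptions made for illustration; a real system would need far more annotated data and richer features.

```python
import math
from collections import Counter, defaultdict

# Invented toy annotations: (words between the two entities, relation type).
TRAIN = [
    ("spent last week at", "visited"),
    ("travelled to", "visited"),
    ("was a visiting scientist at", "visited"),
    ("discovered the", "discovered"),
    ("announced the discovery of the", "discovered"),
    ("first observed the", "discovered"),
]


def train(examples):
    """Count words per relation type and examples per relation type."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the relation type with the highest smoothed log-probability."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(predict(model, "spent three months at"))  # visited
print(predict(model, "first announced the"))    # discovered
```

Unlike the clustering approach, the relation labels here are fixed in advance by the annotation scheme, so no separate labelling step is needed.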

