NLP: finding relationships between entities

Question

My current understanding is that it's possible to extract entities from a text document using toolkits such as OpenNLP or Stanford NLP.

However, is there a way to find relationships between these entities?

For example, consider the following text:

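"Some of you may know that I spent last week at CERN, the European high-energy physics laboratory, where the famous Higgs boson was discovered last July. Every time I go to CERN, I am deeply moved. Apart from quick visits over the years, I spent three months there in the late 1990s as a visiting scientist, working on early-universe physics, trying to figure out how to connect the universe we see today with what may have happened in its infancy."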

Entities: I (the author), CERN, Higgs boson

Relationships:
- I "visited" CERN
- CERN "discovered" Higgs boson

Thanks.

Recommended answer

You can extract verbs with their dependents using, for example, the Stanford Parser. You might get "dependency chains" like

"I :: spent :: at :: CERN".

It is a much tougher task to recognise that "I spent at CERN", "I visited CERN", and "CERN hosted my visit" (etc.) denote the same kind of event. Going into how this can be done is beyond the scope of an SO question, but you can read up on the paraphrase recognition literature (the original answer links to one overview paper). There is also a related question on SO.
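To make the idea of a dependency chain concrete, here is a minimal sketch in Python. It does not call the Stanford Parser: the two parses are written by hand as a stand-in for real parser output, the `Token` type and the `chains` helper are hypothetical names, and the relation labels (`nsubj`, `obj`, `obl`, `case`) assume Universal Dependencies style output, which may differ from what your parser emits.

```python
from typing import NamedTuple

class Token(NamedTuple):
    idx: int     # 1-based position in the sentence
    word: str
    head: int    # idx of the governing token, 0 for the root
    deprel: str  # dependency label (Universal Dependencies style, assumed)
    pos: str     # coarse part-of-speech tag

# Hand-written parses: an assumption standing in for real parser output.
SPENT = [
    Token(1, "I", 2, "nsubj", "PRON"),
    Token(2, "spent", 0, "root", "VERB"),
    Token(3, "last", 4, "amod", "ADJ"),
    Token(4, "week", 2, "obj", "NOUN"),
    Token(5, "at", 6, "case", "ADP"),
    Token(6, "CERN", 2, "obl", "PROPN"),
]
VISITED = [
    Token(1, "I", 2, "nsubj", "PRON"),
    Token(2, "visited", 0, "root", "VERB"),
    Token(3, "CERN", 2, "obj", "PROPN"),
]

def chains(tokens, entities):
    """Build 'subject :: verb :: [preposition ::] entity' chains."""
    found = []
    for verb in (t for t in tokens if t.pos == "VERB"):
        deps = [t for t in tokens if t.head == verb.idx]
        subjects = [t for t in deps if t.deprel == "nsubj"]
        # Keep only objects/obliques that are known entities.
        targets = [t for t in deps
                   if t.deprel in ("obj", "obl") and t.word in entities]
        for subj in subjects:
            for tgt in targets:
                # A preposition attaches to its noun with the 'case' label.
                preps = [t.word for t in tokens
                         if t.head == tgt.idx and t.deprel == "case"]
                found.append(" :: ".join([subj.word, verb.word, *preps, tgt.word]))
    return found

print(chains(SPENT, {"I", "CERN"}))    # ['I :: spent :: at :: CERN']
print(chains(VISITED, {"I", "CERN"}))  # ['I :: visited :: CERN']
```

The two sentences describe the same kind of event but produce different chains, which is why grouping them requires the paraphrase recognition step discussed above.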

Once you can cluster similar chains, you'd need to find a way to label them. You could simply choose the verb of the most common chain in a cluster.
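The labelling rule can be sketched in a few lines of Python. This assumes the clustering has already been done by some other means and that every chain uses the "subject :: verb :: ..." layout shown earlier; the `label_cluster` helper and the example cluster are hypothetical.

```python
from collections import Counter

def label_cluster(cluster):
    """Return the verb of the most frequent chain in the cluster."""
    most_common_chain, _count = Counter(cluster).most_common(1)[0]
    # Chains are assumed to look like "subject :: verb :: ...",
    # so the verb is the second field.
    return most_common_chain.split(" :: ")[1]

# A hypothetical cluster of chains judged to describe the same event type.
cluster = [
    "I :: visited :: CERN",
    "I :: spent :: at :: CERN",
    "I :: visited :: CERN",
]

print(label_cluster(cluster))  # visited
```

Counting whole chains is only one option. Counting verbs across the cluster would be a close variant and may give a different label when many distinct chains share a verb.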

If, however, you have a predefined set of relation types you want to extract and lots of text manually annotated for these relations, then the approach could be very different, e.g., using machine learning to learn to recognise a relation type from the annotated data.
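As a rough illustration of the supervised route, the sketch below trains a tiny multinomial naive Bayes classifier on the words that appear between two entity mentions. The choice of naive Bayes, the feature representation, the relation labels `VISIT` and `DISCOVER`, and the six training examples are all assumptions made for this example. A real system would need far more annotated data and richer features.

```python
import math
from collections import Counter, defaultdict

# Hypothetical annotated data: (words between the two entities, relation type).
TRAINING = [
    (["visited"], "VISIT"),
    (["spent", "a", "week", "at"], "VISIT"),
    (["travelled", "to"], "VISIT"),
    (["discovered"], "DISCOVER"),
    (["announced", "the", "discovery", "of"], "DISCOVER"),
    (["detected"], "DISCOVER"),
]

def train(examples):
    """Collect per-label word counts, label counts, and the vocabulary."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for words, label in examples:
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return word_counts, label_counts, vocab

def predict(model, words):
    """Return the label with the highest naive Bayes log-score."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in words:
            if w in vocab:  # words never seen in training are ignored
                # Add-one smoothing avoids zero probabilities.
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAINING)
print(predict(model, ["spent", "three", "months", "at"]))      # VISIT
print(predict(model, ["announced", "the", "detection", "of"]))  # DISCOVER
```

Unlike the clustering route, the label set here is fixed in advance, so no separate labelling step is needed.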

