Extracting nouns from text


Question

Is it possible to extract noun+noun or (adj|noun)+noun sequences with the R package openNLP? That is, I would like to use linguistic filtering to extract candidate noun phrases. Could you point me in the right direction? Many thanks.

Thanks for the responses. Here is the code:

library("openNLP")

acq <- "Gulf Applied Technologies Inc said it sold its subsidiaries engaged in
        pipeline and terminal operations for 12.2 mln dlrs. The company said
        the sale is subject to certain post closing adjustments,
        which it did not explain. Reuter."

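# tagPOS() is provided by older openNLP releases; it returns one string of
# "word/TAG" tokens separated by spaces.
acqTag <- tagPOS(acq)
acqTagSplit <- strsplit(acqTag, " ")
acqTagSplit

# Pull the tag (the part after the "/") out of every "word/TAG" token.
tag <- character(length(acqTagSplit[[1]]))
for (i in seq_along(acqTagSplit[[1]])) {
    qq <- strsplit(acqTagSplit[[1]][i], "/")
    tag[i] <- qq[[1]][2]
}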

# Collect every position i where tokens i and i+1 form
# noun+noun or adj+noun (singular or plural nouns).
index <- integer(0)

for (i in seq_len(length(tag) - 1)) {
    if (tag[i] %in% c("NN", "NNS", "JJ") && tag[i + 1] %in% c("NN", "NNS")) {
        index <- c(index, i)
    }
}

index


Readers can use the positions stored in index to look up the matching tokens in acqTagSplit and extract the noun+noun or (adj|noun)+noun pairs. (The code is not optimal, but it works. If you have any ideas for improving it, please let me know.)
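As a rough sketch of that last step, the positions can be turned back into word pairs with vectorized base R. The tagged string below is a hand-written stand-in for tagger output (an assumption, not actual openNLP results), and the names tagged, words, tags and pairs are introduced only for this illustration.

```r
# Hypothetical tagger output: "word/TAG" tokens separated by single spaces.
# These tags were written by hand for illustration, not produced by openNLP.
tagged <- "the/DT sale/NN is/VBZ subject/JJ to/TO certain/JJ post/NN closing/NN adjustments/NNS"

tokens <- strsplit(tagged, " ", fixed = TRUE)[[1]]
words  <- sub("/[^/]*$", "", tokens)   # everything before the last "/"
tags   <- sub("^.*/", "", tokens)      # everything after the last "/"

# Compare each tag with its right-hand neighbour in one vectorized step.
n <- length(tags)
first_ok  <- tags[-n] %in% c("JJ", "NN", "NNS")   # token i is an adjective or noun
second_ok <- tags[-1] %in% c("NN", "NNS")         # token i + 1 is a noun
index <- which(first_ok & second_ok)

# Turn each matching position into the two-word candidate phrase.
pairs <- paste(words[index], words[index + 1])
print(pairs)
```

Working on shifted copies of the tag vector avoids the explicit loop and the growing index vector, and %in% treats a missing tag as a non-match instead of raising an error.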

In addition, I still have a question.

Justeson and Katz (1995) proposed another linguistic filter for extracting candidate noun phrases:

((Adj|Noun)+|((Adj|Noun)(Noun-Prep)?)(Adj|Noun))Noun

I do not fully understand what it means. Could you explain it, or show how to express this pattern in R? Many thanks.

Recommended answer

It is possible.

You got it. Use the POS tagger and split its output on spaces: ll <- strsplit(acqTag, ' '). From there, iterate over the tokens in ll[[1]], for example for (i in seq_along(ll[[1]])) { qq <- strsplit(ll[[1]][i], '/') }, and pick out the part-of-speech sequences you are looking for.

After splitting on spaces, it is just list processing in R.
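The Justeson and Katz pattern quoted in the question can be read as a regular expression over part-of-speech tags rather than over characters: a candidate phrase ends in a noun and is preceded either by one or more adjectives or nouns, or by an adjective or noun, an optional noun plus preposition, and another adjective or noun. One possible way to apply it in R is sketched below, assuming Penn Treebank tags (JJ*, NN*, IN) and a hand-tagged input string. The one-letter codes and every variable name are made up for this illustration, and the pattern is taken exactly as written in the question.

```r
# Hypothetical "word/TAG" input, hand-tagged for illustration (not openNLP output).
tagged <- paste("the/DT sale/NN is/VBZ subject/JJ to/TO certain/JJ post/NN",
                "closing/NN adjustments/NNS of/IN the/DT company/NN")
tokens <- strsplit(tagged, " ", fixed = TRUE)[[1]]
words  <- sub("/[^/]*$", "", tokens)
tags   <- sub("^.*/", "", tokens)

# Collapse each tag into one letter so that character position = token position:
# A = adjective (JJ*), N = noun (NN*), P = preposition (IN), O = anything else.
code <- rep("O", length(tags))
code[grepl("^JJ", tags)] <- "A"
code[grepl("^NN", tags)] <- "N"
code[tags == "IN"] <- "P"
tag_string <- paste(code, collapse = "")

# The pattern as written in the question, expressed over the one-letter codes.
pattern <- "(?:[AN]+|[AN](?:NP)?[AN])N"
m <- gregexpr(pattern, tag_string, perl = TRUE)[[1]]

# Map every match back to the words it covers.
phrases <- character(0)
if (m[1] != -1) {
    starts <- as.integer(m)
    ends   <- starts + attr(m, "match.length") - 1
    phrases <- vapply(seq_along(starts),
                      function(k) paste(words[starts[k]:ends[k]], collapse = " "),
                      character(1))
}
print(phrases)
```

Because each token is reduced to a single letter, the match offsets returned by gregexpr() index directly into words. If the intended pattern differs from the one quoted, for example with repetition operators on the inner (Adj|Noun) groups, only the pattern string should need to change.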

