Question
I have the following two strings with their POS tags:
Sent1: "something like how writer pro or phraseology works would be really cool."
[('something', 'NN'), ('like', 'IN'), ('how', 'WRB'), ('writer','NN'), ('pro', 'NN'), ('or', 'CC'), ('phraseology', 'NN'), ('works','NNS'), ('would', 'MD'), ('be', 'VB'), ('really', 'RB'), ('cool','JJ'), ('.', '.')]
Sent2: "more options like the syntax editor would be nice"
[('more', 'JJR'), ('options', 'NNS'), ('like', 'IN'), ('the', 'DT'),('syntax', 'NN'), ('editor', 'NN'), ('would', 'MD'), ('be', 'VB'),('nice', 'JJ')]
I am looking for a way to detect (return True) whether these strings contain the sequence "would" + "be" + adjective, regardless of the position of the adjective, as long as it comes after "would be". In the second string the adjective "nice" immediately follows "would be", but that is not the case in the first string.
The trivial case (no other word before the adjective, as in "would be nice") was solved in an earlier question of mine: "detecting POS tag pattern along with specified words".
I am now looking for a more general solution where optional words may occur before the adjective. I am new to NLTK and Python.
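If you would rather not install anything extra, the check can also be done directly on the tagged lists above. Here is a minimal plain-Python sketch; the function name has_would_be_adj is made up for illustration, and it assumes that "would" (MD) and "be" (VB) are adjacent and that any Penn Treebank adjective tag (JJ, JJR, JJS) appearing later in the sentence counts.

```python
from typing import List, Tuple

# A sentence as produced by nltk.pos_tag: a list of (word, tag) pairs.
TaggedSentence = List[Tuple[str, str]]


def has_would_be_adj(tagged: TaggedSentence) -> bool:
    """Return True if 'would' (MD) is directly followed by 'be' (VB) and an
    adjective (JJ, JJR or JJS) appears anywhere after that pair.

    Hypothetical helper: assumes 'would' and 'be' are adjacent, while any
    number of other words may sit between 'be' and the adjective.
    """
    for i in range(len(tagged) - 1):
        word, tag = tagged[i]
        next_word, next_tag = tagged[i + 1]
        if (word.lower() == "would" and tag == "MD"
                and next_word.lower() == "be" and next_tag == "VB"):
            # Look for an adjective tag anywhere after "would be".
            if any(t.startswith("JJ") for _, t in tagged[i + 2:]):
                return True
    return False


if __name__ == "__main__":
    sent2 = [("more", "JJR"), ("options", "NNS"), ("like", "IN"),
             ("the", "DT"), ("syntax", "NN"), ("editor", "NN"),
             ("would", "MD"), ("be", "VB"), ("nice", "JJ")]
    print(has_would_be_adj(sent2))  # True
```

Since it accepts any adjective after "would be", it also returns True for something like "would be a nice idea"; tighten the any(...) condition if that is too permissive.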
Recommended answer
First, install nltk_cli by following the instructions at https://github.com/alvations/nltk_cli
Then, here's a secret function in nltk_cli that you may find useful:
alvas@ubi:~/git/nltk_cli$ cat infile.txt
something like how writer pro or phraseology works would be really cool .
more options like the syntax editor would be nice
alvas@ubi:~/git/nltk_cli$ python senna.py --chunk2 VP+ADJP infile.txt
would be really cool
would be nice
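For comparison, output similar to the VP+ADJP chunks above can be approximated without SENNA by running a regular expression over "word/TAG" tokens. This is only a rough sketch, not how nltk_cli works internally (it relies on SENNA's chunker); the helper name extract_would_be_adjp is hypothetical, and it assumes the adjective phrase is zero or more adverbs (RB, RBR, RBS) followed by one adjective (JJ, JJR, JJS). NLTK's nltk.RegexpParser can express the same idea as a chunk grammar over tags.

```python
import re
from typing import List, Optional, Tuple

# 'would'/MD + 'be'/VB + any number of adverbs + one adjective.
# Assumption: this is what counts as the adjective phrase after "would be".
WOULD_BE_ADJP = re.compile(
    r"(?:^| )(would/MD be/VB(?: \S+/RB[RS]?)* \S+/JJ[RS]?)(?= |$)"
)


def extract_would_be_adjp(tagged: List[Tuple[str, str]]) -> Optional[str]:
    """Return the 'would be ... ADJ' phrase as plain words, or None."""
    # Encode the sentence as "word/TAG word/TAG ..." so a regex can see tags.
    encoded = " ".join(f"{word}/{tag}" for word, tag in tagged)
    match = WOULD_BE_ADJP.search(encoded)
    if match is None:
        return None
    # Drop the /TAG suffix from each token to recover the phrase.
    return " ".join(tok.rsplit("/", 1)[0] for tok in match.group(1).split(" "))


if __name__ == "__main__":
    sent1 = [("something", "NN"), ("like", "IN"), ("how", "WRB"),
             ("writer", "NN"), ("pro", "NN"), ("or", "CC"),
             ("phraseology", "NN"), ("works", "NNS"), ("would", "MD"),
             ("be", "VB"), ("really", "RB"), ("cool", "JJ"), (".", ".")]
    print(extract_would_be_adjp(sent1))  # would be really cool
```

Note that this returns None when a determiner or noun sits between "be" and the adjective (e.g. "would be a nice idea"), because that is no longer an adjective phrase directly after the verb.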
To illustrate other possible usage:
alvas@ubi:~/git/nltk_cli$ python senna.py --chunk2 VP+VP infile.txt
!!! NO CHUNK of VP+VP in this sentence !!!
!!! NO CHUNK of VP+VP in this sentence !!!
alvas@ubi:~/git/nltk_cli$ python senna.py --chunk2 NP+VP infile.txt
how writer pro or phraseology works would be
the syntax editor would be
alvas@ubi:~/git/nltk_cli$ python senna.py --chunk2 VP+NP infile.txt
!!! NO CHUNK of VP+NP in this sentence !!!
!!! NO CHUNK of VP+NP in this sentence !!!
Then, if you want to check whether the phrase occurs in a sentence and output True/False, simply read and iterate through the output of nltk_cli and check it with if-else conditions.
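A minimal sketch of that if-else step, assuming senna.py prints one line per input sentence and uses the "!!! NO CHUNK ..." message shown above when nothing matches. The function names are hypothetical, and the extra must_start_with check is there because a VP+ADJP chunk may not begin with "would be".

```python
import subprocess
from typing import List

# Prefix of the message nltk_cli prints when a sentence has no matching chunk.
NO_CHUNK_MARKER = "!!! NO CHUNK"


def chunks_to_flags(output_lines: List[str],
                    must_start_with: str = "would be") -> List[bool]:
    """Map each non-blank output line to True/False.

    False for a '!!! NO CHUNK ...' line, otherwise True only if the chunk
    starts with `must_start_with`. Blank lines are ignored.
    """
    flags = []
    for line in output_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(NO_CHUNK_MARKER):
            flags.append(False)
        else:
            flags.append(line.startswith(must_start_with))
    return flags


def run_chunk2(pattern: str, infile: str) -> List[str]:
    """Run `python senna.py --chunk2 PATTERN INFILE` and return its lines.

    Assumes the working directory is the nltk_cli checkout, as in the
    terminal session above.
    """
    result = subprocess.run(
        ["python", "senna.py", "--chunk2", pattern, infile],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```

Run from inside the nltk_cli checkout, chunks_to_flags(run_chunk2("VP+ADJP", "infile.txt")) should give [True, True] for the infile.txt above, provided the output matches what is shown in the session.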