I'm working with data from drug labels. The text is always structured around the verb phrase "indicated for".

For example:

sentence = "Meloxicam tablet is indicated for relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis"

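I've already used spaCy to filter down to only the sentences that contain the phrase "indicated for".

I now need a function that takes a sentence and returns the phrase that is the object of "indicated for". For this example, the function, which I'll call extract(), would behave as follows:
extract(sentence)
>> 'relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis'

Is there functionality in spaCy to do this?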

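Edit:
Simply splitting after "indicated for" is not enough for the complicated examples.

Here are some examples:

'''Buprenorphine and naloxone sublingual tablets are indicated for the maintenance treatment of opioid dependence and should be used as part of a complete treatment plan that includes counseling and psychosocial support buprenorphine and naloxone sublingual tablets contain buprenorphine, a partial opioid agonist, and naloxone, an opioid antagonist, and are indicated for the maintenance treatment of opioid dependence'''

'''Ofloxacin ophthalmic solution is indicated for conjunctivitis gram-positive bacteria gram-negative bacteria Staphylococcus aureus Staphylococcus aureus Staphylococcus epidermidis Streptococcus pneumoniae Enterobacter ulcers gram-positive bacteria gram-negative bacteria Staphylococcus aureus Staphylococcus epidermidis Streptococcus pneumoniae Pseudomonas aeruginosa Serratia marcescens'''

I only want the parts in bold.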
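
For reference, the plain split after "indicated for" mentioned in the edit could be sketched as below. This uses only the standard library; the name split_after_indicated is hypothetical and not from the original post. Note that it returns everything after the first occurrence of the phrase, including any trailing label text.

```python
import re

# Case-insensitive match for the phrase, tolerant of extra whitespace.
_INDICATED_FOR = re.compile(r"\bindicated\s+for\b", re.IGNORECASE)


def split_after_indicated(sentence):
    """Return the text following the first 'indicated for', or None if absent."""
    match = _INDICATED_FOR.search(sentence)
    if match is None:
        return None
    # Drop surrounding whitespace and a trailing period, if any.
    return sentence[match.end():].strip().rstrip(".")


sentence = ("Meloxicam tablet is indicated for relief of the signs and "
            "symptoms of osteoarthritis and rheumatoid arthritis")
print(split_after_indicated(sentence))
```

On long, unpunctuated label text this keeps the whole remainder of the string, so it cannot decide where the indication ends.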

Best answer

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import spacy

nlp = spacy.load('en_core_web_sm')
text = 'Meloxicam tablet is indicated for relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis.'
doc = nlp(text)
for word in doc:
    # Compare with ==; `in ('pobj')` would be a substring test on a plain string.
    if word.dep_ == 'pobj':
        # The span from the leftmost to the rightmost token of the subtree.
        subtree_span = doc[word.left_edge.i : word.right_edge.i + 1]
        print(subtree_span.text)

Output:
relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis
the signs and symptoms of osteoarthritis and rheumatoid arthritis
osteoarthritis and rheumatoid arthritis

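Several phrases are printed because the sentence contains several pobj tokens (objects of prepositions), and the subtree of each one is printed. The shorter phrases are nested inside the first.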
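
To return a single phrase, one option is to keep only the pobj whose preposition is the "for" attached to "indicated". The following is a minimal sketch, assuming the parser attaches "for" to "indicated" for the sentence at hand; extract_indication is a hypothetical helper name, not part of spaCy, and the result depends on the parse the model produces.

```python
def extract_indication(doc):
    """Return the phrase governed by 'indicated for' in a parsed doc, or None.

    `doc` is expected to be a spaCy Doc that has been run through a
    dependency parser.
    """
    for token in doc:
        # Keep only the object of the preposition "for" whose head is "indicated".
        if (token.dep_ == "pobj"
                and token.head.lower_ == "for"
                and token.head.head.lower_ == "indicated"):
            # Same subtree span as in the answer above.
            return doc[token.left_edge.i : token.right_edge.i + 1].text
    return None


# Usage (assumes spaCy and the en_core_web_sm model are installed):
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   print(extract_indication(nlp(text)))
```

Because nested pobj tokens hang off other prepositions such as "of", they are skipped and only the outermost phrase is returned.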

Edit 2:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals
import spacy

nlp = spacy.load('en_core_web_sm')
para = '''Meloxicam tablet is indicated for relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis.
Ofloxacin ophthalmic solution is indicated for the treatment of infections caused by susceptible strains of the following bacteria in the conditions listed below.'''
doc = nlp(para)

# To extract sentences based on a keyword (Span.text replaces the older Span.string)
indicated_for_sents = [sent for sent in doc.sents if 'indicated for' in sent.text]
print(indicated_for_sents)
print()
# To extract objects of prepositions (pobj) together with their subtrees
for word in doc:
    if word.dep_ == 'pobj':
        subtree_span = doc[word.left_edge.i : word.right_edge.i + 1]
        print(subtree_span.text)

Output:
[Meloxicam tablet is indicated for relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis.
, Ofloxacin ophthalmic solution is indicated for the treatment of infections caused by susceptible strains of the following bacteria in the conditions listed below.]

relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis
the signs and symptoms of osteoarthritis and rheumatoid arthritis
osteoarthritis and rheumatoid arthritis


the treatment of infections caused by susceptible strains of the following bacteria in the conditions listed below
infections caused by susceptible strains of the following bacteria in the conditions listed below
susceptible strains of the following bacteria in the conditions listed below
the following bacteria in the conditions listed below
the conditions listed below

See also this link:

https://github.com/NSchrading/intro-spacy-nlp/blob/master/subject_object_extraction.py

Regarding "python - How to get the noun clause that is the object of a certain verb?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49542787/
