Problem description
I am trying out spaCy's PhraseMatcher. I used an adaptation of the example given on the website, shown below.
import spacy
from spacy.matcher import PhraseMatcher
# Assumption: the original post does not show how nlp was created; a blank English pipeline is used here.
nlp = spacy.blank("en")
color_patterns = [nlp(text) for text in ('red', 'green', 'yellow')]
product_patterns = [nlp(text) for text in ('boots', 'coats', 'bag')]
material_patterns = [nlp(text) for text in ('bat', 'yellow ball')]
matcher = PhraseMatcher(nlp.vocab)
matcher.add('COLOR', None, *color_patterns)
matcher.add('PRODUCT', None, *product_patterns)
matcher.add('MATERIAL', None, *material_patterns)
doc = nlp("yellow ball yellow lines")
matches = matcher(doc)
for match_id, start, end in matches:
    rule_id = nlp.vocab.strings[match_id]  # look up the string label, e.g. 'COLOR'
    span = doc[start:end]  # get the matched slice of the doc
    print(rule_id, span.text)
The output is:
COLOR yellow
MATERIAL ball
My question is: how do I get a count for each phrase, so that the output indicates that yellow occurred twice and ball only once?
COLOR Yellow (2)
MATERIAL ball (1)
Recommended answer
Something like this?
from collections import Counter
import spacy
from spacy.matcher import PhraseMatcher
# Assumption: the original post does not show how nlp was created; a blank English pipeline is used here.
nlp = spacy.blank("en")
color_patterns = [nlp(text) for text in ('red', 'green', 'yellow')]
product_patterns = [nlp(text) for text in ('boots', 'coats', 'bag')]
material_patterns = [nlp(text) for text in ('bat', 'yellow ball')]
matcher = PhraseMatcher(nlp.vocab)
matcher.add('COLOR', None, *color_patterns)
matcher.add('PRODUCT', None, *product_patterns)
matcher.add('MATERIAL', None, *material_patterns)
d = []  # collects one (label, phrase) pair per match
doc = nlp("yellow ball yellow lines")
matches = matcher(doc)
for match_id, start, end in matches:
    rule_id = nlp.vocab.strings[match_id]  # look up the string label, e.g. 'COLOR'
    span = doc[start:end]  # get the matched slice of the doc
    d.append((rule_id, span.text))
# Counter tallies identical (label, phrase) pairs; print one line per distinct pair
print("\n".join(f'{i[0]} {i[1]} ({j})' for i, j in Counter(d).items()))
Output:
COLOR yellow (2)
MATERIAL yellow ball (1)
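To make the counting step easier to see on its own, here is a minimal sketch that leaves spaCy out entirely. The `pairs` list is a hypothetical stand-in for the (label, phrase) tuples the matcher loop would collect for "yellow ball yellow lines"; the order of the tuples is assumed, and the helper names `count_phrases` and `format_counts` are made up for illustration.

```python
from collections import Counter

# Hypothetical stand-in for the (rule_id, span.text) pairs gathered in the loop.
# The order of the pairs is an assumption made for this illustration.
pairs = [
    ("COLOR", "yellow"),
    ("MATERIAL", "yellow ball"),
    ("COLOR", "yellow"),
]


def count_phrases(pairs):
    """Tally how often each (label, phrase) pair occurs."""
    return Counter(pairs)


def format_counts(counts):
    """Render one 'LABEL phrase (n)' line per distinct pair, in first-seen order."""
    return [f"{label} {phrase} ({n})" for (label, phrase), n in counts.items()]


counts = count_phrases(pairs)
lines = format_counts(counts)
print("\n".join(lines))
```

Because the tuples are hashable, Counter can use them directly as keys, so the same phrase under two different labels is counted separately. If the most frequent phrases should come first, `counts.most_common()` could be iterated instead of `counts.items()`.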