Question
I am extending a spaCy model using rules. While looking through the documentation, I noticed the IN attribute, which is used to map patterns to a dictionary of properties. This is great, but it only works on single tokens.
For example, this pattern:
{"label":"EXAMPLE","pattern":[{"LOWER": {"IN": ["such as", "like", "for example"]}}]}
will only match the term like, but not the others.
What is the best way to achieve the same result for multi-term entries?
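To make the behavior concrete, here is a minimal sketch that reproduces it with the token-based Matcher on a blank English pipeline. The example sentence and variable names are illustrative assumptions, not part of the original question. Each dictionary in a token pattern describes exactly one token, and the tokenizer splits "such as" into two tokens, so no single token's lowercase form can ever equal "such as".

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")

# The tokenizer splits multi-word phrases into separate tokens.
tokens = [t.text for t in nlp.make_doc("such as")]

# One dict per token: IN is checked against each single token's LOWER value.
matcher = Matcher(nlp.vocab)
matcher.add("EXAMPLE", [[{"LOWER": {"IN": ["such as", "like", "for example"]}}]])

# Illustrative sentence (an assumption, not from the question).
doc = nlp("Fruits such as apples, or tools like hammers")
matched = [doc[start:end].text for _, start, end in matcher(doc)]
print(tokens, matched)
```

Only the single-token entry can match here, which is consistent with the behavior described above.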
Recommended answer
It depends on how complicated the intended patterns are, but the PhraseMatcher can handle cases like the one above using the attribute LOWER:
import spacy
from spacy.matcher import PhraseMatcher
nlp = spacy.blank("en")
pmatcher = PhraseMatcher(nlp.vocab, attr="LOWER")
phrases = ["such as", "like", "for example"]
pmatcher.add("EXAMPLE", [nlp(x) for x in phrases])
assert pmatcher(nlp("Things Such As Books")) == [(15373972490796046842, 1, 3)]
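The pattern format in the question ({"label": ..., "pattern": ...}) looks like the one used by the EntityRuler, so the same idea may carry over there. The following is a sketch under that assumption, written against spaCy v3: when "pattern" is a plain string, the EntityRuler treats it as a phrase pattern, and its phrase_matcher_attr setting plays the role of attr in the PhraseMatcher above. The blank pipeline and test sentence are illustrative.

```python
import spacy

nlp = spacy.blank("en")

# Assumes spaCy v3: phrase_matcher_attr="LOWER" makes string (phrase)
# patterns match on the lowercase form of each token.
ruler = nlp.add_pipe("entity_ruler", config={"phrase_matcher_attr": "LOWER"})

phrases = ["such as", "like", "for example"]
# A string "pattern" is a phrase pattern and may span several tokens.
ruler.add_patterns([{"label": "EXAMPLE", "pattern": p} for p in phrases])

doc = nlp("Things Such As Books")
ents = [(ent.text, ent.label_) for ent in doc.ents]
print(ents)
```

This keeps the rules in the same label/pattern form as the question while letting each entry cover more than one token. If individual tokens need richer constraints than a lowercase comparison, token-based patterns with one dictionary per token remain the alternative.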