Deleting a word within a span in SpaCy?
Question
I am parsing a sentence with SpaCy as follows:
import spacy
nlp = spacy.load("en")
span = nlp("This is some text.")
I am wondering whether there is a way to delete a word in the span while keeping the remaining words formatted like a sentence. For example:
del span[3]
This might produce the same sentence with that word removed.
If some other method without SpaCy could achieve the same effect, that would be great too.
Recommended answer
There is a workaround.
The idea is to export the doc to a numpy array, delete the row for the token you don't want, and then build a new doc from the remaining words and the trimmed array.
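import spacy
from spacy.attrs import LOWER, POS, ENT_TYPE, IS_ALPHA
from spacy.tokens import Doc
import numpy

def remove_span(doc, index):
    # Export the selected token attributes, one row per token
    np_array = doc.to_array([LOWER, POS, ENT_TYPE, IS_ALPHA])
    # Drop the row that belongs to the token being removed
    np_array_2 = numpy.delete(np_array, (index), axis=0)
    # Build a new doc from the texts of the remaining tokens
    doc2 = Doc(doc.vocab, words=[t.text for i, t in enumerate(doc) if i != index])
    # Copy the saved attributes onto the new doc
    doc2.from_array([LOWER, POS, ENT_TYPE, IS_ALPHA], np_array_2)
    return doc2

# Load the English model ('en' is a spaCy 2.x shortcut link;
# spaCy 3.x requires a full package name such as 'en_core_web_sm')
nlp = spacy.load('en')
doc = nlp("This is some text")
new_doc = remove_span(doc, 3)
print(new_doc)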
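One thing the code above does not carry over is the original spacing, because `Doc` is built from `words` alone. The following is a minimal sketch of one way to keep it, assuming only the text matters and not the copied attributes. The function name `remove_token_keep_spacing` is made up for illustration, and `spacy.blank("en")` is used so that no trained model has to be installed. The token before the deleted one inherits the deleted token's trailing-space flag, so punctuation stays attached.

```python
import spacy
from spacy.tokens import Doc


def remove_token_keep_spacing(doc, index):
    """Rebuild doc without the token at a non-negative index, keeping spacing.

    Hypothetical helper for illustration; it does not copy POS tags,
    entities, or other annotations.
    """
    words = [t.text for t in doc]
    # True where the token is followed by whitespace in the original text
    spaces = [bool(t.whitespace_) for t in doc]

    del words[index]
    removed_space = spaces.pop(index)
    if index > 0:
        # The preceding token takes over the removed token's trailing space
        spaces[index - 1] = removed_space

    return Doc(doc.vocab, words=words, spaces=spaces)


nlp = spacy.blank("en")  # tokenizer only, no statistical model
doc = nlp("This is some text.")
new_doc = remove_token_keep_spacing(doc, 3)
print(new_doc.text)
```

With the sentence from the question, removing token 3 ("text") leaves the period directly after "some". If the annotations are needed as well, this would have to be combined with the `to_array` / `from_array` step shown above.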
Hope this helps!
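Since the question also asks about options without SpaCy, here is a rough sketch using only the standard library. It is not SpaCy's tokenizer: a token is assumed to be either a run of word characters or a single punctuation mark, and the names `TOKEN_RE` and `remove_word` are invented for this example. As a design choice, the token before the deleted one inherits the deleted token's trailing whitespace, which keeps the result looking like a sentence.

```python
import re

# One match = one token plus the whitespace that follows it.
# Simplified rule (an assumption, not SpaCy's behaviour):
# a token is a run of word characters or one punctuation character.
TOKEN_RE = re.compile(r"(\w+|[^\w\s])(\s*)")


def remove_word(text, index):
    """Return text with the token at a non-negative index removed."""
    pieces = [list(pair) for pair in TOKEN_RE.findall(text)]
    if not 0 <= index < len(pieces):
        raise IndexError("token index out of range")

    _, trailing = pieces.pop(index)
    if index > 0:
        # The preceding token takes over the removed token's whitespace
        pieces[index - 1][1] = trailing

    return "".join(token + space for token, space in pieces)


print(remove_word("This is some text.", 3))  # This is some.
```

Contractions, URLs, and similar cases are split differently here than SpaCy would split them, so token indices from one approach should not be assumed to match the other.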