This article explains how to get the prediction probability for each entity from a spaCy NER model; the approach below may be a useful reference if you run into the same problem.

Problem description

I used this official example code to train a NER model from scratch using my own training samples. When I run this model on new text, I want to get the prediction probability for each entity.

# test the saved model
print("Loading from", output_dir)
nlp2 = spacy.load(output_dir)
for text, _ in TRAIN_DATA:
    doc = nlp2(text)
    print("Entities", [(ent.text, ent.label_) for ent in doc.ents])
    print("Tokens", [(t.text, t.ent_type_, t.ent_iob) for t in doc])

I am unable to find a method in spaCy to get the prediction probability for each entity. How do I get this probability from spaCy? I need it to apply a cutoff.

Solution

Getting the probability of prediction per entity from a spaCy NER model is not trivial. Here is the solution adapted from here:

import spacy
from collections import defaultdict

texts = ['John works at Microsoft.']

# Number of alternate analyses to consider. More is slower, and not necessarily
# better -- you need to experiment on your problem.
beam_width = 16
# This clips solutions at each step. We multiply the score of the top-ranked action
# by this value, and use the result as a threshold. This prevents the parser from
# exploring options that look very unlikely, saving a bit of efficiency. Accuracy
# may also improve, because we've trained on a greedy objective.
beam_density = 0.0001

nlp = spacy.load('en_core_web_md')
docs = list(nlp.pipe(texts, disable=['ner']))
beams = nlp.entity.beam_parse(docs, beam_width=beam_width, beam_density=beam_density)

for doc, beam in zip(docs, beams):
    entity_scores = defaultdict(float)
    for score, ents in nlp.entity.moves.get_beam_parses(beam):
        for start, end, label in ents:
            entity_scores[(start, end, label)] += score

l = []
for k, v in entity_scores.items():
    l.append({'start': k[0], 'end': k[1], 'label': k[2], 'prob': v})

for a in sorted(l, key=lambda x: x['start']):
    print(a)

### Output: ####
{'start': 0, 'end': 1, 'label': 'PERSON', 'prob': 0.4054479906820232}
{'start': 0, 'end': 1, 'label': 'ORG', 'prob': 0.01002015005487447}
{'start': 0, 'end': 1, 'label': 'PRODUCT', 'prob': 0.0008592912552754791}
{'start': 0, 'end': 1, 'label': 'WORK_OF_ART', 'prob': 0.0007666755792166002}
{'start': 0, 'end': 1, 'label': 'NORP', 'prob': 0.00034931990870877333}
{'start': 0, 'end': 1, 'label': 'TIME', 'prob': 0.0002786051849320804}
{'start': 3, 'end': 4, 'label': 'ORG', 'prob': 0.9990115861687987}
{'start': 3, 'end': 4, 'label': 'PRODUCT', 'prob': 0.0003378157477046507}
{'start': 3, 'end': 4, 'label': 'FAC', 'prob': 8.249734411749544e-05}
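Note that the snippet above relies on the spaCy v2-style nlp.entity shortcut. Since the original question asks for these scores in order to apply a cutoff, here is a minimal sketch of how the accumulated entity_scores from the code above could be thresholded; the helper name filter_entities and the 0.5 cutoff are illustrative assumptions, not part of the original answer:

# Minimal sketch: keep only candidate spans whose accumulated beam score
# clears a chosen cutoff. The helper name and the 0.5 threshold are
# illustrative assumptions and should be tuned for your own model.
def filter_entities(entity_scores, cutoff=0.5):
    kept = []
    for (start, end, label), score in entity_scores.items():
        if score >= cutoff:
            kept.append({'start': start, 'end': end, 'label': label, 'prob': score})
    # Return the highest-confidence spans first
    return sorted(kept, key=lambda x: x['prob'], reverse=True)

print(filter_entities(entity_scores, cutoff=0.5))

With the scores printed above, a 0.5 cutoff would keep the (3, 4, 'ORG') reading of "Microsoft" (about 0.999) but drop the (0, 1, 'PERSON') reading of "John" (about 0.405), so the threshold needs to be chosen with the model's score distribution and your tolerance for false negatives in mind.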
That concludes this article on how to get the prediction probability for each entity from a spaCy NER model; hopefully the answer above is helpful.