Scikit-learn: how to check whether a model (e.g. TfidfVectorizer) has already been fit
Question
For feature extraction from text, how can I check whether a vectorizer (e.g. TfidfVectorizer or CountVectorizer) has already been fit on training data?
In particular, I want the code to figure out automatically whether the vectorizer has already been fit.
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer()

def vectorize_data(texts):
    # if the vectorizer has not been fit yet
    vectorizer.fit_transform(texts)
    # else
    vectorizer.transform(texts)
Recommended answer
You can use check_is_fitted (from sklearn.utils.validation), which is made for exactly this purpose.
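A minimal sketch of how check_is_fitted behaves, assuming scikit-learn 0.22 or later, where the attribute list is optional and an estimator counts as fitted once it has an attribute ending in an underscore (such as vocabulary_). The helper name is_fitted and the sample documents are hypothetical, for illustration only:

```python
from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.utils.validation import check_is_fitted


def is_fitted(estimator):
    """Hypothetical helper: True if check_is_fitted does not raise."""
    try:
        check_is_fitted(estimator)
    except NotFittedError:
        return False
    return True


vectorizer = TfidfVectorizer()
print(is_fitted(vectorizer))  # False: nothing has been learned yet

vectorizer.fit(["the cat sat", "the dog barked"])  # hypothetical data
print(is_fitted(vectorizer))  # True: vocabulary_ now exists
```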
You can see how it is used in the source of TfidfVectorizer.transform() (excerpt from an older scikit-learn release):
def transform(self, raw_documents, copy=True):
    # This is what you need.
    check_is_fitted(self, '_tfidf', 'The tfidf vector is not fitted')
    X = super(TfidfVectorizer, self).transform(raw_documents)
    return self._tfidf.transform(X, copy=False)
So you can do this:
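from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.utils.validation import check_is_fitted

vectorizer = TfidfVectorizer()

def vectorize_data(texts):
    try:
        # Raises NotFittedError until fit() has been called
        # (scikit-learn >= 0.22 needs no attribute name here)
        check_is_fitted(vectorizer)
    except NotFittedError:
        vectorizer.fit(texts)
    # In all cases the vectorizer is fit here, so just call transform()
    return vectorizer.transform(texts)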
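A self-contained usage sketch of the function above, assuming scikit-learn 0.22 or later; the training and test documents are hypothetical. The first call fits the vectorizer, and the second call only transforms, reusing the vocabulary learned from the first call, so both matrices have the same number of columns:

```python
from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.utils.validation import check_is_fitted

vectorizer = TfidfVectorizer()


def vectorize_data(texts):
    try:
        check_is_fitted(vectorizer)  # raises NotFittedError before the first fit
    except NotFittedError:
        vectorizer.fit(texts)  # only reached on the first call
    return vectorizer.transform(texts)


train_docs = ["the cat sat on the mat", "the dog barked"]  # hypothetical data
test_docs = ["a cat and a bird"]                            # hypothetical data

X_train = vectorize_data(train_docs)  # fits, then transforms
X_test = vectorize_data(test_docs)    # transform only; unseen words are ignored

print(X_train.shape)  # (2, 7)
print(X_test.shape)   # (1, 7)
```

Because TfidfVectorizer.transform() performs the same fitted check internally, a similar approach is to call transform() inside the try block and fit in the except NotFittedError branch.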