Problem description
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()
vectorizer = vectorizer.fit(word_data)
freq_term_mat = vectorizer.transform(word_data)
from sklearn.feature_extraction.text import TfidfTransformer
tfidf = TfidfTransformer(norm="l2")
tfidf = tfidf.fit(freq_term_mat)
Ttf_idf_matrix = tfidf.transform(freq_term_mat)
voc_words = Ttf_idf_matrix.get_feature_names()
print "The num of words = ",len(voc_words)
When I run the program containing this code, I get the following error:
Traceback (most recent call last):
  File "vectorize_text.py", line 87, in <module>
    voc_words = Ttf_idf_matrix.get_feature_names()
  File "/home/farheen/anaconda/lib/python2.7/site-packages/scipy/sparse/base.py", line 499, in __getattr__
    raise AttributeError(attr + " not found")
AttributeError: get_feature_names not found
Please suggest a solution.
Recommended answer
I see two problems with your code. First, you are calling get_feature_names() on the matrix output rather than on the vectorizer. The transformed result is a SciPy sparse matrix, which has no such method; the vocabulary is stored on the vectorizer, so call it there. Second, you are splitting the work into more steps than necessary. TfidfVectorizer.fit_transform() does the counting and the TF-IDF weighting in a single call. Try this:
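from sklearn.feature_extraction.text import TfidfVectorizer
# TfidfVectorizer combines CountVectorizer and TfidfTransformer in one object.
vectorizer = TfidfVectorizer()
# Learn the vocabulary and build the TF-IDF matrix in a single step.
transformed = vectorizer.fit_transform(word_data)
# Feature names come from the vectorizer, not from the matrix.
# On scikit-learn 1.0 and later, use vectorizer.get_feature_names_out() (get_feature_names() was removed in 1.2).
print("Num words: %d" % len(vectorizer.get_feature_names()))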
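To make the first point concrete, here is a minimal toy sketch of why the names belong to the vectorizer and not to the matrix. It uses only the standard library. ToyCountVectorizer and the sample word_data are hypothetical stand-ins invented for illustration, not scikit-learn's actual implementation, and tokenizing on whitespace is a simplifying assumption.

```python
# Toy illustration only: NOT scikit-learn's implementation.
class ToyCountVectorizer:
    """Hypothetical minimal stand-in for a count vectorizer."""

    def fit(self, documents):
        # Learn the vocabulary: one column index per distinct token, sorted.
        tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
        self.vocabulary_ = {tok: idx for idx, tok in enumerate(tokens)}
        return self

    def transform(self, documents):
        # Build plain rows of counts; the result carries no token names.
        rows = []
        for doc in documents:
            row = [0] * len(self.vocabulary_)
            for tok in doc.lower().split():
                if tok in self.vocabulary_:
                    row[self.vocabulary_[tok]] += 1
            rows.append(row)
        return rows

    def get_feature_names(self):
        # Column names are read back from the vocabulary held by the vectorizer.
        return sorted(self.vocabulary_, key=self.vocabulary_.get)


word_data = ["the cat sat", "the dog sat down"]  # made-up sample input
vectorizer = ToyCountVectorizer().fit(word_data)
matrix = vectorizer.transform(word_data)

print(hasattr(matrix, "get_feature_names"))  # the matrix has no such method
print(vectorizer.get_feature_names())        # the vectorizer does
print(matrix)
```

The matrix only holds numbers, one column per vocabulary entry. The mapping from column index to word is state learned during fit(), which is why it has to be queried on the fitted vectorizer.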