How do I find the TF-IDF value of a specific word using TfidfVectorizer?
For example, my code is:
from sklearn.feature_extraction.text import TfidfVectorizer
docs = []
docs.append("this is sentence number one")
docs.append("this is sentence number two")
vectorizer = TfidfVectorizer(norm='l2',min_df=0, use_idf=True, smooth_idf=True, stop_words='english', sublinear_tf=True)
sklearn_representation = vectorizer.fit_transform(docs)
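Now, how do I find the TF-IDF value of "sentence" in sentence 2 (docs[1])?
Best answer
You need to use the vectorizer's vocabulary_ attribute, which is a mapping of terms to feature indices. The index it returns for a term is that term's column in the matrix returned by fit_transform.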
>>> from sklearn.feature_extraction.text import TfidfVectorizer
>>> docs = []
>>> docs.append("this is sentence number one")
>>> docs.append("this is sentence number two")
>>> vectorizer = TfidfVectorizer(norm='l2',min_df=0, use_idf=True, smooth_idf=True, stop_words='english', sublinear_tf=True)
>>> x = vectorizer.fit_transform(docs)
>>> x.todense()
matrix([[ 0.70710678,  0.70710678],
        [ 0.70710678,  0.70710678]])
>>> vectorizer.vocabulary_['sentence']
1
>>> c = vectorizer.vocabulary_['sentence']
>>> x[:,c]
<2x1 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>
>>> x[:,c].todense()
matrix([[ 0.70710678],
        [ 0.70710678]])
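The session above returns the whole column for "sentence". To get the single value asked about in the question (the word "sentence" in docs[1]), you can index the sparse matrix by row and column. The following is a minimal sketch rather than a definitive implementation: the helper name tfidf_of is hypothetical and is not part of scikit-learn, and min_df is left at its default here on the assumption that this does not change the result for this tiny corpus.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "this is sentence number one",
    "this is sentence number two",
]

# Same settings as in the answer, except min_df is left at its default (assumption).
vectorizer = TfidfVectorizer(norm="l2", use_idf=True, smooth_idf=True,
                             stop_words="english", sublinear_tf=True)
x = vectorizer.fit_transform(docs)


def tfidf_of(term, doc_index):
    """Hypothetical helper: TF-IDF weight of `term` in document `doc_index`.

    Returns 0.0 when the term is not in the fitted vocabulary
    (for example, because it was removed as a stop word).
    """
    col = vectorizer.vocabulary_.get(term)
    if col is None:
        return 0.0
    # Rows are documents, columns are vocabulary terms.
    return float(x[doc_index, col])


value = tfidf_of("sentence", 1)
print(value)
```

Using vocabulary_.get instead of vocabulary_[...] avoids a KeyError for words such as "this", which the English stop-word list removes before the vocabulary is built.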
Regarding "python - How do I find the TF-IDF value of a specific word?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/43191522/