python - How do I find the TF-IDF value of a specific word?

Using TfidfVectorizer, how can I find the value for a specific word?
For example, my code is:

from sklearn.feature_extraction.text import TfidfVectorizer

docs = []
docs.append("this is sentence number one")
docs.append("this is sentence number two")
vectorizer = TfidfVectorizer(norm='l2', min_df=0, use_idf=True, smooth_idf=True, stop_words='english', sublinear_tf=True)
sklearn_representation = vectorizer.fit_transform(docs)

Now, how do I find the TF-IDF value of "sentence" in the second sentence (docs[1])?

Best answer

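You need the vectorizer's vocabulary_ attribute, which maps each term to its feature index, i.e. its column in the matrix returned by fit_transform.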

>>> from sklearn.feature_extraction.text import TfidfVectorizer
>>> docs = []
>>> docs.append("this is sentence number one")
>>> docs.append("this is sentence number two")
>>> vectorizer = TfidfVectorizer(norm='l2',min_df=0, use_idf=True, smooth_idf=True, stop_words='english', sublinear_tf=True)
>>> x = vectorizer.fit_transform(docs)
>>> x.todense()
matrix([[ 0.70710678,  0.70710678],
        [ 0.70710678,  0.70710678]])
>>> vectorizer.vocabulary_['sentence']
1
>>> c = vectorizer.vocabulary_['sentence']
>>> x[:,c]
<2x1 sparse matrix of type '<class 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Row format>
>>> x[:,c].todense()
matrix([[ 0.70710678],
        [ 0.70710678]])
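
The slice x[:,c] returns the whole column, one value per document. To read the single value for docs[1], index the sparse matrix with both the row and the column. The following is a minimal sketch under the same setup as above; tfidf_value is a hypothetical helper name (not part of scikit-learn), and min_df is simply left at its default.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["this is sentence number one", "this is sentence number two"]

# Same settings as above, except min_df is left at its default.
vectorizer = TfidfVectorizer(norm="l2", use_idf=True, smooth_idf=True,
                             stop_words="english", sublinear_tf=True)
x = vectorizer.fit_transform(docs)


def tfidf_value(word, doc_index):
    """Hypothetical helper: TF-IDF weight of `word` in docs[doc_index]."""
    col = vectorizer.vocabulary_.get(word)
    if col is None:
        # Stop words and words never seen during fitting have no column.
        return None
    # Indexing the sparse matrix with two integers yields a single scalar.
    return float(x[doc_index, col])


value = tfidf_value("sentence", 1)   # approximately 0.7071, as in the output above
missing = tfidf_value("this", 1)     # "this" is an English stop word
print(value)
print(missing)
```

Using vocabulary_.get instead of vocabulary_[...] avoids a KeyError for words such as "this", which stop_words='english' removes before the vocabulary is built.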

Source: a similar question on Stack Overflow, "python - How do I find the TF-IDF value of a specific word?": https://stackoverflow.com/questions/43191522/