Problem description
I'm trying to use TfidfVectorizer on a corpus, but every time I end up with this error:
```
File "sparsefuncs.pyx", line 117, in sklearn.utils.sparsefuncs.inplace_csr_row_normalize_l2 (sklearn\utils\sparsefuncs.c:2328)
ValueError: Buffer dtype mismatch, expected 'int' but got 'long long'
```
Here is my code:
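```python
import csv

from sklearn.feature_extraction.text import TfidfVectorizer

corpus = []
testCorpus = []
trainType = []
testType = []

# Training data: collect the 'sku' column of every row.
with open("stone_sku.csv") as f:
    cr = csv.DictReader(f)
    for row in cr:
        corpus.append(row['sku'])
        trainType.append(row['sku'])

# Test data: read from the same file.
with open("stone_sku.csv") as f:
    crTest = csv.DictReader(f)
    for row in crTest:
        testCorpus.append(row['sku'])
        testType.append(row['sku'])

# Character 2- and 3-grams; this is the call that raises the ValueError.
cv = TfidfVectorizer(min_df=1, analyzer='char', ngram_range=(2, 3))
trainCounts = cv.fit_transform(corpus)
```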
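To reproduce this without access to stone_sku.csv, a minimal sketch can feed the same loop from an in-memory CSV. The sample SKU strings and the `load_skus` helper below are hypothetical; only the `sku` column name comes from the original code. On an unaffected installation it runs cleanly and prints the matrix shape and the dtypes of the sparse index arrays.

```python
import csv
import io

from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for stone_sku.csv; only the 'sku' column is assumed.
SAMPLE_CSV = """sku
ST-100-GRN
ST-101-BLU
ST-200-GRN
"""


def load_skus(text):
    """Read the 'sku' column from CSV text, mirroring the loop above (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    return [row["sku"] for row in reader]


corpus = load_skus(SAMPLE_CSV)
cv = TfidfVectorizer(min_df=1, analyzer="char", ngram_range=(2, 3))
train_counts = cv.fit_transform(corpus)

# Shape of the result and the dtypes of its CSR index arrays.
print(train_counts.shape, train_counts.indices.dtype, train_counts.indptr.dtype)
```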
It works fine with CountVectorizer, and the same error occurs if I try to transform the data using TfidfTransformer.
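This pattern is consistent with how scikit-learn documents TfidfVectorizer: it is equivalent to CountVectorizer followed by TfidfTransformer. The counting step succeeds, and the failure comes from the TF-IDF step, which by default L2-normalizes each row (the `inplace_csr_row_normalize_l2` frame in the traceback). The sketch below, using hypothetical sample SKUs, checks that the two-step route and the one-step vectorizer produce the same matrix on a working install.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

docs = ["ST-100-GRN", "ST-101-BLU", "ST-200-GRN"]  # hypothetical sample SKUs
params = dict(analyzer="char", ngram_range=(2, 3), min_df=1)

# Step 1: raw character n-gram counts (the part reported to work).
counts = CountVectorizer(**params).fit_transform(docs)
# Step 2: idf weighting plus L2 row normalization (the part reported to fail).
two_step = TfidfTransformer().fit_transform(counts)

# One step: TfidfVectorizer with the same analyzer settings.
one_step = TfidfVectorizer(**params).fit_transform(docs)
print(np.allclose(two_step.toarray(), one_step.toarray()))  # prints True
```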
Recommended answer
Are you running 64-bit Windows? This might be caused by a known issue that was recently fixed in the master branch.
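The error text says a Cython routine expected a buffer of C `int` (32-bit) but received `long long` (64-bit) values, which suggests the sparse matrix's index arrays were 64-bit. A minimal diagnostic sketch, under that assumption, prints the platform details relevant to the answer and the index dtypes that CountVectorizer produces. The `with_int32_indices` helper is hypothetical: casting the CSR index arrays to `int32` before TfidfTransformer is an untested workaround idea for affected versions, not a confirmed fix; upgrading scikit-learn to a release that contains the fix is the more reliable route.

```python
import platform
import struct

import numpy as np
import scipy.sparse as sp
import sklearn
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Environment details relevant to the answer's 64-bit Windows hypothesis.
print("OS:", platform.system(), platform.machine())
print("Python pointer size (bits):", struct.calcsize("P") * 8)
print("scikit-learn:", sklearn.__version__)


def with_int32_indices(matrix):
    """Return a CSR copy with 32-bit index arrays (hypothetical workaround)."""
    out = sp.csr_matrix(matrix, copy=True)
    out.indices = out.indices.astype(np.int32)
    out.indptr = out.indptr.astype(np.int32)
    return out


docs = ["ST-100-GRN", "ST-101-BLU", "ST-200-GRN"]  # hypothetical sample SKUs
counts = CountVectorizer(analyzer="char", ngram_range=(2, 3)).fit_transform(docs)
print("CountVectorizer index dtypes:", counts.indices.dtype, counts.indptr.dtype)

# Apply the TF-IDF step to the 32-bit-indexed copy.
tfidf = TfidfTransformer().fit_transform(with_int32_indices(counts))
```

If the printed index dtypes are `int64` on an affected setup, that matches the "got 'long long'" part of the message.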