Problem description
I'm using sklearn version 0.16.1. It seems that FeatureHasher doesn't support string values in its dict input (as DictVectorizer does). For example:
from sklearn.feature_extraction import FeatureHasher

values = [
    {'city': 'Dubai', 'temperature': 33.},
    {'city': 'London', 'temperature': 12.},
    {'city': 'San Fransisco', 'temperature': 18.}
]
print("Starting FeatureHasher ...")
hasher = FeatureHasher(n_features=2)
X = hasher.transform(values).toarray()
print(X)
But I get the following error:
_hashing.transform(raw_X, self.n_features, self.dtype)
File "_hashing.pyx", line 46, in sklearn.feature_extraction._hashing.transform (sklearn\feature_extraction\_hashing.c:1762)
TypeError: a float is required
I can't use DictVectorizer because my dataset is very big and the features have high cardinality, so I get a MemoryError. Any suggestions?
Update (October 2016):
As NirIzr commented, this is now supported: the sklearn dev team addressed this issue in https://github.com/scikit-learn/scikit-learn/pull/6173
FeatureHasher should handle string dictionary values properly as of version 0.18.
This is a known sklearn issue: FeatureHasher does not currently support string values for its dict input format.
https://github.com/scikit-learn/scikit-learn/issues/4878
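On versions before 0.18, one possible workaround is to fold each string value into its feature name before hashing, so that every value handed to FeatureHasher is a float. The sketch below assumes Python 3 and uses a hypothetical helper, stringify_categoricals, which is not part of sklearn. The "key=value" naming with a value of 1 mirrors what DictVectorizer does with string values, to the best of my knowledge.

```python
def stringify_categoricals(record, separator="="):
    """Return a copy of `record` with string values folded into the keys.

    Hypothetical helper (not part of sklearn). For example,
    {'city': 'Dubai', 'temperature': 33.0} becomes
    {'city=Dubai': 1.0, 'temperature': 33.0}.
    """
    converted = {}
    for key, value in record.items():
        if isinstance(value, str):
            # One-hot style: the category becomes part of the feature name.
            converted[key + separator + value] = 1.0
        else:
            # Numeric values keep their original key.
            converted[key] = float(value)
    return converted


values = [
    {'city': 'Dubai', 'temperature': 33.},
    {'city': 'London', 'temperature': 12.},
    {'city': 'San Fransisco', 'temperature': 18.},
]

# A list is fine for this small sample; for a large dataset, a generator
# expression would avoid holding all converted dicts in memory at once.
converted_values = [stringify_categoricals(d) for d in values]
```

The converted dicts contain only float values, so they can be passed to hasher.transform(converted_values) in place of the original values. With n_features=2 nearly every feature will collide, so a much larger n_features is probably needed for a high-cardinality dataset.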