Question
I'm exploring the different feature extraction classes that scikit-learn provides. After reading the documentation, I did not understand very well what DictVectorizer can be used for. Other questions come to mind: for example, how can DictVectorizer be used for text classification, i.e. how does this class help handle labelled textual data? Could anybody provide a short example, apart from the one I already read on the documentation web page?
Recommended answer
Say your feature space is length, width and height, and you have 3 observations; i.e. you measure the length, width and height of 3 objects:
         length  width  height
obs.1      1       0      2
obs.2      0       1      1
obs.3      3       2      1
Another way to show this is to use a list of dictionaries:
[{'height': 1, 'length': 0, 'width': 1}, # obs.2
{'height': 2, 'length': 1, 'width': 0}, # obs.1
{'height': 1, 'length': 3, 'width': 2}] # obs.3
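As a minimal sketch of how such a list could be produced from the table rows, the plain-Python snippet below pairs each value with its column name. The variable names `columns`, `rows` and `records` are made up for illustration; the values are the three observations from the table, in table order.

```python
# Column names and rows copied from the table of observations.
columns = ["length", "width", "height"]
rows = [
    [1, 0, 2],  # obs.1
    [0, 1, 1],  # obs.2
    [3, 2, 1],  # obs.3
]

# Pair each value with its column name: one dictionary per observation.
records = [dict(zip(columns, row)) for row in rows]

for record in records:
    print(record)
```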
DictVectorizer goes the other way around; i.e. given the list of dictionaries, it builds the table shown at the top:
>>> from sklearn.feature_extraction import DictVectorizer
>>> v = DictVectorizer(sparse=False)
>>> d = [{'height': 1, 'length': 0, 'width': 1},
...      {'height': 2, 'length': 1, 'width': 0},
...      {'height': 1, 'length': 3, 'width': 2}]
>>> v.fit_transform(d)
array([[ 1.,  0.,  1.],   # obs.2
       [ 2.,  1.,  0.],   # obs.1
       [ 1.,  3.,  2.]])  # obs.3
       # height, len., width
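To see where the column order comes from, the sketch below inspects the fitted vectorizer's `feature_names_` attribute; with the default `sort=True` the feature names are sorted, which is why height ends up first. The second half is an assumption-laden illustration rather than part of the answer above: the `color`/`size` records are invented, and they show that string values are expanded into one-hot `name=value` columns.

```python
from sklearn.feature_extraction import DictVectorizer

d = [{'height': 1, 'length': 0, 'width': 1},
     {'height': 2, 'length': 1, 'width': 0},
     {'height': 1, 'length': 3, 'width': 2}]

v = DictVectorizer(sparse=False)
X = v.fit_transform(d)

# Column i of X corresponds to feature_names_[i]; names are sorted by default.
print(v.feature_names_)  # ['height', 'length', 'width']

# Invented example: string values become one-hot "name=value" features,
# while numeric values are passed through unchanged.
mixed = [{'color': 'red', 'size': 1},
         {'color': 'blue', 'size': 2}]
v2 = DictVectorizer(sparse=False)
X2 = v2.fit_transform(mixed)
print(v2.feature_names_)  # ['color=blue', 'color=red', 'size']
print(X2)
```

The one-hot handling of string values is what makes the class convenient for categorical or token-like features.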
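As for the text classification part of the question, one possible approach (a sketch, not the only way) is to turn each document into a `{token: count}` dictionary and let DictVectorizer build the document-term matrix. Everything specific here is an assumption for illustration: the four-sentence toy corpus and its labels are invented, `token_counts` is a hypothetical helper using naive whitespace tokenisation, and MultinomialNB is just one classifier that accepts count features.

```python
from collections import Counter

from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy labelled corpus, invented purely for illustration.
texts = [
    "cheap pills buy now",
    "buy cheap watches now",
    "meeting agenda for monday",
    "monday project meeting notes",
]
labels = ["spam", "spam", "ham", "ham"]


def token_counts(text):
    """Hypothetical helper: lowercase, split on whitespace, count tokens."""
    return dict(Counter(text.lower().split()))


# DictVectorizer maps each token to a column; the classifier sees token counts.
clf = make_pipeline(DictVectorizer(), MultinomialNB())
clf.fit([token_counts(t) for t in texts], labels)

# Tokens not seen during fit are silently ignored at transform time.
pred = clf.predict([token_counts("cheap pills now")])
print(pred[0])
```

For raw text, scikit-learn's CountVectorizer tokenises and counts in a single step; DictVectorizer tends to be the better fit when the features have already been extracted into dictionaries, for example by a custom tokeniser or tagger.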