Understanding DictVectorizer in scikit-learn?

Question

I'm exploring the different feature extraction classes that scikit-learn provides. Reading the documentation, I did not fully understand what DictVectorizer can be used for. Other questions come to mind: for example, how can DictVectorizer be used for text classification, i.e. how does this class help handle labelled textual data? Could anybody provide a short example other than the one I already read on the documentation web page?

Recommended answer

Say your feature space is length, width and height, and you have 3 observations, i.e. you measure the length, width and height of 3 objects:

       length  width  height
obs.1       1      0       2
obs.2       0      1       1
obs.3       3      2       1

Another way to represent this is as a list of dictionaries (the comments show which observation each dictionary corresponds to):

[{'height': 1, 'length': 0, 'width': 1},   # obs.2
 {'height': 2, 'length': 1, 'width': 0},   # obs.1
 {'height': 1, 'length': 3, 'width': 2}]   # obs.3

DictVectorizer goes the other way around: given the list of dictionaries, it builds the equivalent table, with one row per dictionary and one column per key (columns appear in sorted key order):

>>> from sklearn.feature_extraction import DictVectorizer
>>> v = DictVectorizer(sparse=False)
>>> d = [{'height': 1, 'length': 0, 'width': 1},
...      {'height': 2, 'length': 1, 'width': 0},
...      {'height': 1, 'length': 3, 'width': 2}]
>>> v.fit_transform(d)
array([[ 1.,  0.,  1.],   # obs.2
       [ 2.,  1.,  0.],   # obs.1
       [ 1.,  3.,  2.]])  # obs.3
   # height, len., width
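
The same idea may carry over to the text classification part of the question. The following is only a minimal sketch under stated assumptions: the toy documents, the labels, the whitespace tokenization and the choice of MultinomialNB are all invented for illustration and are not part of the original answer. Each document is turned into a {token: count} dictionary, and DictVectorizer then maps those dictionaries to a numeric matrix that a classifier can consume.

```python
from collections import Counter

from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus and labels, made up for this illustration.
docs = ["good good movie", "bad movie", "good plot", "bad bad plot"]
labels = ["pos", "neg", "pos", "neg"]

# One {token: count} dictionary per document (naive whitespace tokenization).
token_counts = [Counter(doc.split()) for doc in docs]

v = DictVectorizer(sparse=False)
X = v.fit_transform(token_counts)

# Columns follow the sorted feature names learned during fitting.
print(v.feature_names_)  # ['bad', 'good', 'movie', 'plot']
print(X)

# Any classifier that accepts a numeric matrix could be used here;
# MultinomialNB is just one assumption that suits count features.
clf = MultinomialNB().fit(X, labels)

# New text goes through transform (not fit_transform) so it lands in the
# same columns; tokens not seen during fitting, such as "story", are ignored.
new_X = v.transform([Counter("good story".split())])
print(clf.predict(new_X))
```

For plain text, CountVectorizer in sklearn.feature_extraction.text does the tokenizing and counting in one step; DictVectorizer tends to be more useful when the features are already computed as dictionaries, for instance by a custom feature extractor.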
