scikit-learn: adding features to a set of vectorized documents

Problem description

I am getting started with scikit-learn and am trying to transform a set of documents into a format on which I can apply clustering and classification. I have seen the details of the vectorization methods and of the tf-idf transformations used to load the files and index their vocabularies.

However, I have extra metadata for each document, such as the authors, the division that was responsible, a list of topics, etc.

How can I add features to each document vector generated by the vectorizing function?

Recommended answer
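
One common way to approach this, sketched here under assumptions rather than given as a definitive implementation, is to vectorize the metadata separately and then concatenate the two sparse matrices column-wise with `scipy.sparse.hstack`. The sample documents, the metadata keys (`author`, `division`) and all variable names below are hypothetical and chosen only for illustration.

```python
from scipy.sparse import hstack
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical documents and per-document metadata (illustration only).
docs = [
    "the quick brown fox",
    "jumps over the lazy dog",
    "the dog barks",
]
metadata = [
    {"author": "alice", "division": "research"},
    {"author": "bob", "division": "sales"},
    {"author": "alice", "division": "sales"},
]

# Text features: one tf-idf row per document.
text_vectorizer = TfidfVectorizer()
X_text = text_vectorizer.fit_transform(docs)

# Metadata features: DictVectorizer one-hot encodes string values,
# producing columns such as "author=alice" and "division=sales".
meta_vectorizer = DictVectorizer()
X_meta = meta_vectorizer.fit_transform(metadata)

# Stack the two sparse matrices side by side; rows stay aligned
# because both were built from the documents in the same order.
X = hstack([X_text, X_meta]).tocsr()
```

The combined matrix `X` has one row per document and can be passed to a clustering or classification estimator. Because tf-idf values and one-hot indicators are on different scales, it may be worth weighting or scaling one block relative to the other, depending on the model.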