Problem description
In spaCy's text classification example train_textcat, two labels are specified, POSITIVE and NEGATIVE. Hence the cats scores are represented as
cats = [{"POSITIVE": bool(y), "NEGATIVE": not bool(y)} for y in label]
I am working on multilabel classification, which means I have more than two labels to tag in one text. I have added my labels as
textcat.add_label("CONSTRUCTION")
and specified the cats scores using
cats = [{"POSITIVE": bool(y), "NEGATIVE": not bool(y)} for y in labels]
I am pretty sure this is not correct. How should I specify the cats scores for multilabel classification, and how do I train a multilabel classifier? Does the spaCy example work for multilabel classification too?
Recommended answer
If I understood you correctly, you have a list of categories, and a single text can belong to several of them at once. In that case you cannot use "POSITIVE": bool(y), "NEGATIVE": not bool(y) to mark your classes. Instead, try writing a function that returns a dictionary of categories for each example. For example, consider the following list of categories: categories = ['POLITICS', 'ECONOMY', 'SPORT']. You can then iterate over your training data, calling the function for each training example.
The function could look like this:
def func(categories):
    cats = {'POLITICS': 0, 'ECONOMY': 0, 'SPORT': 0}
    for category in categories:
        cats[category] = 1
    return {'cats': cats}
Given a training example with two categories (for example POLITICS and ECONOMY), you can call this function with the list of its categories (labels = func(['POLITICS', 'ECONOMY'])) and you will get a full dictionary with the classes for this example.
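To make the idea concrete, here is a minimal sketch of how such a function could be applied to a whole dataset. The sample texts, the names make_cats, CATEGORIES, raw_data and train_data, and the check for unknown labels are all assumptions added for illustration; they are not part of the spaCy example. The code only builds (text, annotations) pairs in plain Python and does not call spaCy itself.

```python
# Hypothetical label set, matching the categories used in the answer.
CATEGORIES = ["POLITICS", "ECONOMY", "SPORT"]


def make_cats(active, categories=CATEGORIES):
    """Return {'cats': {...}} with 1 for every active category and 0 for the rest."""
    unknown = set(active) - set(categories)
    if unknown:
        # Fail early on a misspelled or unregistered label.
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return {"cats": {c: int(c in active) for c in categories}}


# Made-up examples: each text is paired with the list of categories that apply.
raw_data = [
    ("Parliament approved the new budget.", ["POLITICS", "ECONOMY"]),
    ("The home team won the final.", ["SPORT"]),
]

# One (text, annotations) pair per example; every label is present in every dict.
train_data = [(text, make_cats(labels)) for text, labels in raw_data]

for text, annotations in train_data:
    print(text, annotations)
```

Each category would also need to be registered with textcat.add_label, as in the question. As far as I know, the spaCy v2 train_textcat example creates the pipe with "exclusive_classes": True, so for multilabel data that setting would presumably have to be False; treat this as something to verify against the spaCy version you use.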