Problem description
I'm trying to run a simple Ground Truth labeling job with a private workforce for text classification. Since I'm new to AWS Ground Truth, I have some questions:
- If I use a private workforce, what is the maximum number of people I can allocate to the labeling job? Does the pricing depend on the number of people in the private workforce?
- I have a labeled dataset (text classification) that I uploaded to an S3 bucket. If I upload additional unlabeled data to it, will AutoML label the new raw data? If not, how can I use the already labeled dataset to label new raw data?
- The Ground Truth documentation says that it needs at least 1000 objects to be labeled by humans. Does that mean 1000 objects across all classes, or 1000 objects for each individual class? If I manually label 1000+ objects, how many more objects will AutoML label, or what is the maximum number of objects AutoML can label?
Recommended answer
I'm the product manager for Amazon SageMaker Ground Truth, and I would be happy to answer your questions. Here are my responses:
[1] Your private labeling workforce can be as large or as small as you would like it to be. The pricing does not depend on the size of your labeling workforce.
[2] You can learn more about how to bring a "partially" labeled dataset here: https://docs.aws.amazon.com/sagemaker/latest/dg/sms-reusing-data.html#sms-reusing-data-newdata
You can also use the ML model trained from a previous labeling job. Learn more here: https://aws.amazon.com/blogs/machine-learning/amazon-sagemaker-ground-truth-using-a-pre-trained-model-for-faster-data-labeling/
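As a rough illustration of how new raw text gets in front of a labeling job: Ground Truth reads its input from a manifest file in S3, with one JSON object per line, and inline text goes under the `source` key. The sketch below only builds that file locally. The function names and the output path are hypothetical, and uploading to S3 and creating the job are left out.

```python
import json


def to_manifest_lines(texts):
    """Return one JSON Lines entry per text, using the inline "source" key."""
    return [json.dumps({"source": text}, ensure_ascii=False) for text in texts]


def write_manifest(texts, path):
    """Write the entries to `path` (a hypothetical local file) and return the count."""
    lines = to_manifest_lines(texts)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return len(lines)


if __name__ == "__main__":
    # Example texts are placeholders, not data from the question.
    sample = ["The delivery was late.", "Great product, works as described."]
    for line in to_manifest_lines(sample):
        print(line)
```

Entries that already carry labels need extra keys beyond `source`; the documentation page linked in [2] describes that format, so it is not guessed at here.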
[3] To clarify, you need 1,000 dataset objects to start an auto-labeling job, but some of these 1,000 objects can be auto-labeled (the percentage depends on your data and use case). The 1,000 is a total across all your classes, not per class; there is no additional requirement beyond having 1,000 text dataset objects.
You can learn more about the mechanics of auto-labeling from this blog post: https://aws.amazon.com/blogs/machine-learning/annotate-data-for-less-with-amazon-sagemaker-ground-truth-and-automated-data-labeling/
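The 1,000-object requirement from [3] can be checked locally before starting a job. This is a minimal sketch, assuming the input is a JSON Lines manifest with one object per non-empty line. The constant and function names are hypothetical, and the check counts objects in total, with no per-class condition, as the answer describes.

```python
MIN_OBJECTS_FOR_AUTO_LABELING = 1000  # total across all classes, per the answer above


def count_manifest_objects(lines):
    """Count non-empty lines; each one is assumed to be a single dataset object."""
    return sum(1 for line in lines if line.strip())


def meets_auto_labeling_minimum(lines):
    """Return True when the manifest has enough objects to start auto-labeling."""
    return count_manifest_objects(lines) >= MIN_OBJECTS_FOR_AUTO_LABELING


if __name__ == "__main__":
    # Synthetic manifest used only to exercise the check.
    manifest = ['{"source": "text %d"}' % i for i in range(1200)]
    print(count_manifest_objects(manifest), meets_auto_labeling_minimum(manifest))
```

Passing this check only means the job is eligible for auto-labeling; how many objects end up labeled automatically still depends on the data and use case.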