Problem Description
I am building a bot with Rasa.ai. When training the bot with Rasa NLU, we use a training data file in which the text, intent, entities, etc. are specified. For example, for a simple restaurant chatbot, the training file data.json may contain:
{
  "text": "central indian restaurant",
  "intent": "restaurant_search",
  "entities": [
    {
      "start": 0,
      "end": 7,
      "value": "central",
      "entity": "location"
    },
    {
      "start": 8,
      "end": 14,
      "value": "indian",
      "entity": "cuisine"
    }
  ]
}
We use this to train the model. But we need to create this training file manually (or through a GUI).
Is there any tool where I can feed in sentences and have it automatically create the intents and entities?
Sample Input: Is there any central Indian restaurant?
Sample Output: The above data.json
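
For clarity on the format above: start and end are character offsets into the text field. The following is a minimal Python sketch (my own illustration, not part of the original question) of how one such entry could be assembled once the intent and entity values for a sentence are already known; the helper name make_example and its inputs are assumptions:

def make_example(text, intent, entity_values):
    # entity_values: list of (value, entity_label) pairs that are known
    # to occur in the sentence. Offsets are located with a simple substring
    # search, which is enough for this illustration.
    entities = []
    for value, label in entity_values:
        start = text.lower().find(value.lower())
        if start == -1:
            continue  # skip values that do not appear in the text
        entities.append({
            "start": start,
            "end": start + len(value),
            "value": value,
            "entity": label,
        })
    return {"text": text, "intent": intent, "entities": entities}

# Reproduces the entry shown above.
print(make_example(
    "central indian restaurant",
    "restaurant_search",
    [("central", "location"), ("indian", "cuisine")],
))
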
Edit:
To better explain this question - suppose I have a huge set of customer service call logs. My understanding is that with Rasa (or another similar framework), a human being needs to go through the call logs, work out all the possible intent/entity combinations that occurred in the past, and create a file like the data.json above before training the model. This seems like a really unscalable problem. Is there a way to generate that data.json file from those GB-sized call logs without involving a human being? Am I missing something here?
Recommended Answer
A fast way to generate arbitrarily big training datasets with a few lines of code is Chatito (a rough DSL sketch follows the list below):
- You write down typical sentences and synonyms for the entities in an intuitive DSL.
- It generates all the combinations for you and shuffles them for better training.
- It splits the examples into two files, one for training and one for testing, so you can measure the accuracy of your trained language model.
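
As a rough sketch of what that DSL might look like for the restaurant example (the intent, slot, and value names here are assumptions, and the exact argument syntax can vary between Chatito versions):

%[restaurant_search]('training': '50', 'testing': '10')
    is there any @[location] @[cuisine] restaurant?
    show me a @[cuisine] restaurant @[location?]

@[location]
    central
    downtown

@[cuisine]
    indian
    chinese

Chatito expands the sentence templates with every combination of the listed slot values and writes the shuffled results into separate training and testing files; it can output Rasa NLU's JSON format, among others.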