




I am building a bot with Rasa.ai. When training the bot with Rasa NLU, we use a training data file in which the text, intent, entities, etc. are specified. For example, for a simple restaurant chatbot, the training file data.json may contain:

        {
            "text": "central indian restaurant",
            "intent": "restaurant_search",
            "entities": [
                {
                    "start": 0,
                    "end": 7,
                    "value": "central",
                    "entity": "location"
                },
                {
                    "start": 8,
                    "end": 14,
                    "value": "indian",
                    "entity": "cuisine"
                }
            ]
        }


We use this to train the model. But we need to create this training file manually (or through a GUI).
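The fiddliest part of writing such a file by hand is the `start`/`end` character offsets into `text`. A minimal sketch, assuming Python 3 and the field names shown above, of computing them from the labelled substrings; `make_example` is a hypothetical helper for illustration, not part of Rasa:

```python
import json


def make_example(text, intent, entities):
    """Build one training example from (value, entity) pairs.

    The start/end offsets are computed from the text, so only the
    substrings and their labels have to be written by hand.
    """
    annotated = []
    search_from = 0
    for value, entity in entities:
        start = text.index(value, search_from)  # raises ValueError if absent
        end = start + len(value)
        annotated.append(
            {"start": start, "end": end, "value": value, "entity": entity}
        )
        search_from = end  # each entity is searched for after the previous one
    return {"text": text, "intent": intent, "entities": annotated}


example = make_example(
    "central indian restaurant",
    "restaurant_search",
    [("central", "location"), ("indian", "cuisine")],
)
print(json.dumps(example, indent=2))
```

This reproduces the offsets in the example above (0 to 7 and 8 to 14), but a person still has to decide the intent and which substrings are entities.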


Is there any tool that I can feed sentences to and that automatically creates the intents and entities?

Sample Input: Is there any central Indian restaurant?
Sample Output: the data.json shown above
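To make the sample input/output concrete, here is a minimal Python sketch of the crudest possible automatic labeller: exact, case-insensitive word lookup against hand-written lists. The `GAZETTEER`, `INTENT_RULES` and `annotate` names and their contents are hypothetical assumptions for illustration, not a Rasa feature:

```python
import json
import re

# Hypothetical hand-written lexicons; a real system would need far larger ones.
GAZETTEER = {
    "location": ["central", "north", "south"],
    "cuisine": ["indian", "chinese", "mexican"],
}
# Hypothetical keyword rules: the first rule whose keyword occurs wins.
INTENT_RULES = [("restaurant", "restaurant_search"), ("hello", "greet")]


def annotate(sentence):
    """Label a sentence by exact word lookup; offsets refer to the returned text."""
    text = sentence.lower().rstrip("?!. ")
    entities = []
    for match in re.finditer(r"\w+", text):
        word = match.group()
        for entity, values in GAZETTEER.items():
            if word in values:
                entities.append({"start": match.start(), "end": match.end(),
                                 "value": word, "entity": entity})
    intent = next((name for kw, name in INTENT_RULES if kw in text), None)
    return {"text": text, "intent": intent, "entities": entities}


print(json.dumps(annotate("Is there any central Indian restaurant?"), indent=2))
```

On the sample input it finds `central` and `indian`, with offsets into the lowercased full sentence, but anything not already in the lists is silently missed, so the lists themselves still have to be written and maintained by a person.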



To better explain this question: suppose I have a huge set of customer service call logs. My understanding is that with Rasa (or another similar framework), a human being needs to go through the call logs, understand all the intent and entity combinations that occurred in the past, and create a file like the data.json above before training the model. This seems like a really unscalable problem. Is there a way to generate that data.json file from gigabytes of call logs without involving a human being? Am I missing something here?



A fast way to generate arbitrarily large training datasets with a few lines of code is Chatito:

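  1. You write down typical sentences and synonyms for the entities in an intuitive DSL.

  2. It generates all the combinations for you and shuffles them for better training.

  3. It splits the examples into two files, one for training and one for testing, so you can measure the accuracy of your trained language model.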
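The three steps above can be mimicked in plain Python to show the idea. This is a minimal sketch, not Chatito's DSL or CLI: the `TEMPLATES`, `SLOTS` and `expand` names, the `{slot}` placeholder syntax, the 80/20 split and the seed are all assumptions, and the output is assumed to use the legacy Rasa NLU JSON layout (`rasa_nlu_data` / `common_examples`):

```python
import itertools
import json
import random
import re

# Hypothetical templates: {slot} placeholders, one intent per template.
TEMPLATES = [
    ("{location} {cuisine} restaurant", "restaurant_search"),
    ("any {cuisine} place in the {location}", "restaurant_search"),
]
# Hypothetical synonym lists; each slot name doubles as the entity label.
SLOTS = {
    "location": ["central", "north", "downtown"],
    "cuisine": ["indian", "chinese", "thai"],
}
SLOT_RE = re.compile(r"\{(\w+)\}")


def expand(template, intent):
    """Yield one example per combination of slot values, with offsets."""
    parts = SLOT_RE.split(template)  # literals at even indexes, slot names at odd
    slot_names = parts[1::2]
    for values in itertools.product(*(SLOTS[name] for name in slot_names)):
        text, entities = "", []
        for i, part in enumerate(parts):
            if i % 2 == 0:
                text += part  # literal text between slots
            else:
                value = values[i // 2]
                entities.append({"start": len(text), "end": len(text) + len(value),
                                 "value": value, "entity": part})
                text += value
        yield {"text": text, "intent": intent, "entities": entities}


# Step 2: generate every combination, then shuffle.
examples = [ex for tpl, intent in TEMPLATES for ex in expand(tpl, intent)]
random.Random(42).shuffle(examples)  # fixed seed so the split is reproducible

# Step 3: split into a training set and a test set.
cut = int(len(examples) * 0.8)  # 80% training, 20% testing
train, test = examples[:cut], examples[cut:]

# Serialize each set; write each string to its own file.
outputs = {
    name: json.dumps({"rasa_nlu_data": {"common_examples": rows}}, indent=2)
    for name, rows in (("train.json", train), ("test.json", test))
}
print(len(train), len(test))  # 14 4
```

Two templates with three values per slot give 2 × 3 × 3 = 18 examples. The combinatorial expansion is what makes the dataset grow quickly from a few lines, but the templates and synonym lists are still written by hand.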

