How to resolve the Misaligned Entity Annotation error in RASA NLU

Problem description

I am trying to import a LUIS schema model into RASA and train it using the spaCy + scikit-learn pipeline. I am using RASA NLU v0.10.4.

However, when I try to load the LUIS model schema, the ner_crf component throws a Misaligned Entity Annotation warning.

This happens even though I have tagged the entities correctly in the LUIS model schema.

Here is my config file:

{
    "project": "SynonymsExample",
    "path": "C:\\Users\\xyz\\Desktop\\RASA\\models",
    "response_log": "C:\\Users\\xyz\\Desktop\\RASA\\logs",
    "pipeline": "spacy_sklearn",
    "data": "C:\\Users\\xyz\\Desktop\\RASA\\data\\examples\\RasaFormat.json",
    "cors_origins": ["*"],
    "aws_endpoint_url": null,
    "token": null,
    "num_threads": 2,
    "port": 5000
}

Here is my LUIS model:

{
  "luis_schema_version": "2.1.0",
  "versionId": "0.1",
  "name": "phraseListDemo",
  "desc": "",
  "culture": "en-us",
  "intents": [
    {
      "name": "None"
    },
    {
      "name": "PersonalInfo"
    }
  ],
  "entities": [
    {
      "name": "city"
    },
    {
      "name": "Contact"
    },
    {
      "name": "Email"
    },
    {
      "name": "FirstName"
    },
    {
      "name": "LastName"
    }
  ],
  "composites": [],
  "closedLists": [],
  "bing_entities": [
    "datetimeV2"
  ],
  "actions": [],
  "model_features": [
    {
      "name": "city",
      "mode": true,
      "words": "jaipur,bangalore,florida,japan,delhi,pune,bombay,mumbai,chennai,hyderabad,kolkata,chandigarh,ahmedabad,china,lucknow,germany,noida,indore,nagpur,coimbatore,bhopal,banglore,india,patna,maharashtra,surat,kanpur,guwahati,ludhiana,gwalior,aurangabad,amritsar,rajkot,gujarat,madurai,pradesh,dehradun,raipur,ranchi,varanasi,jabalpur,jodhpur,srinagar,mangalore,udaipur,jamshedpur,vadodara",
      "activated": true
    },
    {
      "name": "contact",
      "mode": true,
      "words": "8947847422,8967564556,8967907890,1235712345,8989898989,1231231231",
      "activated": true
    },
    {
      "name": "Email",
      "mode": true,
      "words": "xyz@email.com, abc@gmail.com",
      "activated": true
    },
    {
      "name": "emailid",
      "mode": true,
      "words": "xyz@email.com, abc@gmail.com",
      "activated": true
    },
    {
      "name": "FirstName",
      "mode": true,
      "words": "amit,ankur,ankit,ram,shyam,kunal,saikat,sundar,krishna,vikram,mohan,vijay,karthik,sunil,vivek,gopal,John,Chris,satish,surya,ajay,raju,suresh,sanjay,rajesh,ravi,ramesh,arun,rakesh,manoj,anil,kiran,sachin,dinesh,pradeep,raj,ashok,priya,prakash,david,mukesh,praveen,mahesh,naresh,anand,kumar,nikhil,michael,paul,naveen,nitin,srinivas,prasad,vinod,kishore,james,vinay,thomas",
      "activated": true
    },
    {
      "name": "LastName",
      "mode": true,
      "words": "Gupta,Sharma,Jain,kumar,singh,mishra,Mukherjee,goswami,verma,yadav,patel,ghosh,das",
      "activated": true
    },
    {
      "name": "MID",
      "mode": true,
      "words": "M1039205,M1039222,M1036767,M1048967,M1056789,M1028967,M1088967",
      "activated": true
    }
  ],
  "regex_features": [],
  "utterances": [
    {
      "text": "my name is ankur",
      "intent": "PersonalInfo",
      "entities": [
        {
          "entity": "FirstName",
          "startPos": 11,
          "endPos": 15
        }
      ]
    },
    {
      "text": "my contact number is 1231234123",
      "intent": "PersonalInfo",
      "entities": [
        {
          "entity": "Contact",
          "startPos": 21,
          "endPos": 30
        }
      ]
    },
    {
      "text": "my firstname is amit and lastname is gupta",
      "intent": "PersonalInfo",
      "entities": [
        {
          "entity": "FirstName",
          "startPos": 16,
          "endPos": 19
        },
        {
          "entity": "LastName",
          "startPos": 37,
          "endPos": 41
        }
      ]
    },
    {
      "text": "my email is a@gmail.com",
      "intent": "PersonalInfo",
      "entities": [
        {
          "entity": "Email",
          "startPos": 12,
          "endPos": 22
        }
      ]
    },
    {
      "text": "kunal is one person",
      "intent": "PersonalInfo",
      "entities": [
        {
          "entity": "FirstName",
          "startPos": 0,
          "endPos": 4
        }
      ]
    },
    {
      "text": "myself singh and my dob comes on 24 may",
      "intent": "PersonalInfo",
      "entities": [
        {
          "entity": "LastName",
          "startPos": 7,
          "endPos": 11
        }
      ]
    },
    {
      "text": "my name is gupta and my dob is in month april",
      "intent": "PersonalInfo",
      "entities": [
        {
          "entity": "LastName",
          "startPos": 11,
          "endPos": 15
        }
      ]
    },
    {
      "text": "my name is amit and my date of birth is in month of march",
      "intent": "PersonalInfo",
      "entities": [
        {
          "entity": "FirstName",
          "startPos": 11,
          "endPos": 14
        }
      ]
    }
  ]
}

Can anyone point out where I am going wrong?

Update: Here is my RASA format training data:

{
  "rasa_nlu_data": {
    "entity_synonyms": [
      {
        "value": "city",
        "synonyms": [
          "jaipur",
          "bangalore",
          "florida",
          "japan",
          "delhi",
          "pune",
          "bombay",
          "mumbai",
          "chennai",
          "hyderabad",
          "kolkata",
          "chandigarh",
          "ahmedabad",
          "china",
          "lucknow",
          "germany",
          "noida",
          "indore",
          "nagpur",
          "coimbatore",
          "bhopal",
          "banglore",
          "india",
          "patna",
          "maharashtra",
          "surat",
          "kanpur",
          "guwahati",
          "ludhiana",
          "gwalior",
          "aurangabad",
          "amritsar",
          "rajkot",
          "gujarat",
          "madurai",
          "pradesh",
          "dehradun",
          "raipur",
          "ranchi",
          "varanasi",
          "jabalpur",
          "jodhpur",
          "srinagar",
          "mangalore",
          "udaipur",
          "jamshedpur",
          "vadodara"
        ]
      },
      {
        "value": "contact",
        "synonyms": [
          "8947847422",
          "8967564556",
          "8967907890",
          "1235712345",
          "8989898989",
          "1231231231"
        ]
      },
      {
        "value": "Email",
        "synonyms": [
          "xyz@email.com",
          " abc@gmail.com"
        ]
      },
      {
        "value": "emailid",
        "synonyms": [
          "xyz@email.com",
          " abc@gmail.com"
        ]
      },
      {
        "value": "FirstName",
        "synonyms": [
          "amit",
          "ankur",
          "ankit",
          "ram",
          "shyam",
          "kunal",
          "saikat",
          "sundar",
          "krishna",
          "vikram",
          "mohan",
          "vijay",
          "karthik",
          "sunil",
          "vivek",
          "gopal",
          "John",
          "Chris",
          "satish",
          "surya",
          "ajay",
          "raju",
          "suresh",
          "sanjay",
          "rajesh",
          "ravi",
          "ramesh",
          "arun",
          "rakesh",
          "manoj",
          "anil",
          "kiran",
          "sachin",
          "dinesh",
          "pradeep",
          "raj",
          "ashok",
          "priya",
          "prakash",
          "david",
          "mukesh",
          "praveen",
          "mahesh",
          "naresh",
          "anand",
          "kumar",
          "nikhil",
          "michael",
          "paul",
          "naveen",
          "nitin",
          "srinivas",
          "prasad",
          "vinod",
          "kishore",
          "james",
          "vinay",
          "thomas"
        ]
      },
      {
        "value": "LastName",
        "synonyms": [
          "Gupta",
          "Sharma",
          "Jain",
          "kumar",
          "singh",
          "mishra",
          "Mukherjee",
          "goswami",
          "verma",
          "yadav",
          "patel",
          "ghosh",
          "das"
        ]
      },
      {
        "value": "MID",
        "synonyms": [
          "M1039205",
          "M1039222",
          "M1036767",
          "M1048967",
          "M1056789",
          "M1028967",
          "M1088967"
        ]
      }
    ],
    "regex_features": [],
    "common_examples": [
      {
        "text": "my name is ankur",
        "intent": "PersonalInfo",
        "entities": [
          {
            "entity": "FirstName",
            "value": "ankur",
            "start": 11,
            "end": 15
          }
        ]
      },
      {
        "text": "my contact number is 1231234123",
        "intent": "PersonalInfo",
        "entities": [
          {
            "entity": "Contact",
            "value": "1231234123",
            "start": 21,
            "end": 30
          }
        ]
      },
      {
        "text": "my firstname is amit and lastname is gupta",
        "intent": "PersonalInfo",
        "entities": [
          {
            "entity": "FirstName",
            "value": "amit",
            "start": 16,
            "end": 19
          },
          {
            "entity": "LastName",
            "value": "gupta",
            "start": 37,
            "end": 41
          }
        ]
      },
      {
        "text": "my email is a@gmail.com",
        "intent": "PersonalInfo",
        "entities": [
          {
            "entity": "Email",
            "value": "a@gmail.com",
            "start": 12,
            "end": 22
          }
        ]
      },
      {
        "text": "kunal is one person",
        "intent": "PersonalInfo",
        "entities": [
          {
            "entity": "FirstName",
            "value": "kunal",
            "start": 0,
            "end": 4
          }
        ]
      },
      {
        "text": "myself singh and my dob comes on 24 may",
        "intent": "PersonalInfo",
        "entities": [
          {
            "entity": "LastName",
            "value": "singh",
            "start": 7,
            "end": 11
          }
        ]
      },
      {
        "text": "my name is gupta and my dob is in month april",
        "intent": "PersonalInfo",
        "entities": [
          {
            "entity": "LastName",
            "value": "gupta",
            "start": 11,
            "end": 15
          }
        ]
      },
      {
        "text": "my name is amit and my date of birth is in month of march",
        "intent": "PersonalInfo",
        "entities": [
          {
            "entity": "FirstName",
            "value": "amit",
            "start": 11,
            "end": 14
          }
        ]
      }
    ]
  }
}

Recommended answer

As the warning message points out, the start and end offsets have probably been set incorrectly, so that some whitespace ends up included at the token boundaries (at either the start or the end). For example, a sentence like this one (from your LUIS model):

{
  "text": "kunal is one person",
  "intent": "PersonalInfo",
  "entities": [
    {
      "entity": "FirstName",
      "startPos": 0,
      "endPos": 4
    }
  ]
}

might (incorrectly) have start set to 1 and end set to 5 in the training data.
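One quick way to see which examples are affected is to slice each text with its offsets and compare the result with the stored value. The sketch below assumes the RASA JSON convention that `end` is exclusive, so `text[start:end]` should equal `value`; the helper name `find_misaligned` is hypothetical and not part of RASA NLU. The first sample entry uses the offsets from the RASA format data in the question, and the second shows what an aligned entry would look like.

```python
def find_misaligned(examples):
    """Return (text, entity, slice) for every entity whose span differs from its value.

    Assumption: `end` is exclusive, so text[start:end] should equal `value`.
    """
    problems = []
    for example in examples:
        text = example["text"]
        for ent in example.get("entities", []):
            span = text[ent["start"]:ent["end"]]
            if span != ent["value"]:
                problems.append((text, ent["entity"], span))
    return problems


# First entry: offsets as posted in the question. Second: an aligned variant.
examples = [
    {"text": "kunal is one person",
     "entities": [{"entity": "FirstName", "value": "kunal", "start": 0, "end": 4}]},
    {"text": "kunal is one person",
     "entities": [{"entity": "FirstName", "value": "kunal", "start": 0, "end": 5}]},
]

for text, entity, span in find_misaligned(examples):
    print(f"{entity!r} in {text!r}: offsets select {span!r}")
```

To run this on a real file, the list under `rasa_nlu_data` → `common_examples` can be loaded with `json.load` and passed to the same function.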

Maybe try using the Rasa NLU Trainer to visualize the training data and see if that's the case?

This happened to me too. Correcting the start and end numbers fixed it.
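If there are many examples, the offsets can also be recomputed from the stored `value` rather than edited by hand. This is only a minimal sketch: `realign` is a hypothetical helper, it assumes each `value` appears verbatim in the text, and it uses the first occurrence, so texts that repeat the same value need manual review. It also assumes the exclusive-`end` convention, where `end = start + len(value)`.

```python
import copy


def realign(example):
    """Return a copy of the example with start/end recomputed from each entity value.

    Assumptions: the value occurs verbatim in the text, and its first
    occurrence is the intended one. `end` is written as an exclusive offset.
    """
    fixed = copy.deepcopy(example)
    for ent in fixed.get("entities", []):
        start = fixed["text"].find(ent["value"])
        if start == -1:
            raise ValueError(f"{ent['value']!r} not found in {fixed['text']!r}")
        ent["start"] = start
        ent["end"] = start + len(ent["value"])
    return fixed


# Example taken from the training data in the question.
example = {
    "text": "my firstname is amit and lastname is gupta",
    "intent": "PersonalInfo",
    "entities": [
        {"entity": "FirstName", "value": "amit", "start": 16, "end": 19},
        {"entity": "LastName", "value": "gupta", "start": 37, "end": 41},
    ],
}

fixed = realign(example)
for ent in fixed["entities"]:
    print(ent["entity"], ent["start"], ent["end"])
```

The original example is left untouched, so the before and after offsets can be compared before anything is written back to the training file.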


08-24 18:43