Problem description
I've been using Rasa NLU for a project that involves making sense of structured text. My use case requires me to keep updating my training set with new examples of entities from the text corpus. This means I have to retrain my model every few days, and each retraining takes longer because the training set keeps growing.
Is there a way in Rasa NLU to update an already trained model by training it on only the new data, instead of retraining the whole model on the previous training set plus the new one?
I'm looking for an approach where I can simply update my existing trained model every few days with an incremental batch of additional training data.
Recommended answer
To date, the most recent GitHub issue on the topic states that there is no way to retrain a model by adding just the new utterances. The same holds for the previous issues cited therein.
You're right: having to retrain periodically on ever longer files gets more and more time-consuming. That said, retraining in place is not a good idea in production anyway.
There is an excellent example in a user comment:
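Retraining on the same model can be a problem for production systems. I used to overwrite my models, and then at some point one of the trainings didn't run perfectly and I started to see a sharp drop in my response confidence. I had to track down the source of the problem and retrain the model.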
Training a new model every time (with a timestamp) is good because it makes rollbacks easier (and they will happen in production systems). I then fetch the up-to-date model names from the DB.
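The timestamped-model approach from that comment could be sketched roughly as follows. This is a minimal illustration and not Rasa's API: the `nlu-` prefix and the helpers `new_model_dir`, `list_models` and `latest_model` are hypothetical names, the training step itself is left out, and a plain directory listing stands in for the database lookup the commenter mentions.

```python
from __future__ import annotations

import tempfile
from datetime import datetime, timezone
from pathlib import Path

PREFIX = "nlu-"  # hypothetical naming prefix, not a Rasa convention


def new_model_dir(root: Path, now: datetime | None = None) -> Path:
    """Create a fresh timestamped directory for one training run.

    Nothing is ever overwritten: every run gets its own directory.
    """
    now = now or datetime.now(timezone.utc)
    path = root / f"{PREFIX}{now:%Y%m%dT%H%M%S}"
    path.mkdir(parents=True, exist_ok=False)
    return path


def list_models(root: Path) -> list[str]:
    """Return model directory names, oldest first.

    The timestamp format sorts lexically in chronological order.
    """
    return sorted(
        p.name for p in root.iterdir()
        if p.is_dir() and p.name.startswith(PREFIX)
    )


def latest_model(root: Path, skip: int = 0) -> str | None:
    """Return the newest model name; skip=1 rolls back by one model."""
    models = list_models(root)
    return models[-1 - skip] if len(models) > skip else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for day in (1, 2, 3):
            # In a real pipeline, the trained model would be saved here.
            new_model_dir(root, datetime(2024, 1, day))
        print("serve:", latest_model(root))
        print("rollback:", latest_model(root, skip=1))
```

Because the serving side only reads a model name, rolling back after a bad training run means pointing it at the previous name rather than retraining.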