This article looks at the options for deploying R models in production.

Problem description

There don't seem to be many options for deploying predictive models in production, which is surprising given the explosion in Big Data.

I understand that the open-source PMML standard can be used to export models as an XML specification, which can then be used for in-database scoring/prediction. However, it seems that to make this work you need to use the PMML plugin by Zementis, which means the solution is not truly open source. Is there an easier, open way to map PMML to SQL for scoring?

Another option would be to use JSON instead of XML to output model predictions. But in this case, where would the R model sit? I'm assuming it would always need to be mapped to SQL... unless the R model could sit on the same server as the data and then run against the incoming data using an R script?

Are there any other options?

Recommended answer

The answer really depends on what your production environment is.

If your "big data" are on Hadoop, you can try the relatively new open-source PMML "scoring engine" called Pattern.

Otherwise you have no choice (short of writing custom model-specific code) but to run R on your server. You would use save to save your fitted models in .RData files, then load them on the server and run the corresponding predict. (That is bound to be slow, but you can always try to throw more hardware at it.)
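A minimal sketch of that save/load/predict round trip, using a toy lm model on R's built-in cars dataset (the file name and new data are made up for illustration):

```r
# Training side: fit a model and save it to an .RData file
model <- lm(dist ~ speed, data = cars)   # built-in 'cars' dataset
save(model, file = "model.RData")

# Scoring server: load the saved model and predict on incoming data
load("model.RData")                      # restores the object 'model'
new_data <- data.frame(speed = c(10, 20))
scores <- predict(model, newdata = new_data)
print(scores)
```

The same pattern works for any model class whose package is installed on the server, since predict dispatches on the class of the loaded object.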

How you do that really depends on your platform. Usually there is a way to add "custom" functions written in R; the term to look for is UDF (user-defined function). In Hadoop you can add such functions to Pig (e.g. https://github.com/cd-wood/pigaddons), or you can use RHadoop to write simple map-reduce code that loads the model and calls predict in R. If your data are in Hive, you can use Hive TRANSFORM to call an external R script.
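As a sketch of the Hive TRANSFORM route: the R script below reads tab-separated rows from stdin and writes one score per line to stdout, which is the streaming contract TRANSFORM expects. The table name, column, and model file here are hypothetical.

```r
#!/usr/bin/env Rscript
# score.R -- hypothetical scoring script called via Hive TRANSFORM.
# On the Hive side you would ship it with the query, roughly:
#   ADD FILE score.R;
#   SELECT TRANSFORM (speed) USING 'Rscript score.R' AS (score) FROM my_table;

load("model.RData")                       # restores the fitted object 'model'

con <- file("stdin", open = "r")
while (length(line <- readLines(con, n = 1)) > 0) {
  fields <- strsplit(line, "\t")[[1]]     # Hive streams tab-separated columns
  row <- data.frame(speed = as.numeric(fields[1]))
  cat(predict(model, newdata = row), "\n", sep = "")
}
close(con)
```

Note that R (and the model's packages) must be installed on every Hadoop node, and model.RData must be distributed alongside the script.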

There are also vendor-specific ways to add functions written in R to various SQL databases; again, look for UDF in the documentation. For instance, PostgreSQL has PL/R.
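A rough sketch of what that looks like with PL/R (this assumes the plr extension is installed in the database; the function, table, and the placeholder scoring formula are hypothetical — a real UDF would load() a saved model and call predict() in the R body):

```sql
-- Requires the PL/R extension to be available on the server
CREATE EXTENSION plr;

-- The function body between $$ ... $$ is R code; named arguments
-- become R variables of the same name.
CREATE OR REPLACE FUNCTION r_score(speed float8)
RETURNS float8 AS
$$
  # Placeholder linear score standing in for a real model's predict()
  2 * speed
$$
LANGUAGE 'plr';

-- Score rows directly in SQL:
SELECT r_score(speed) FROM my_table;
```

This keeps the scoring inside the database, at the cost of tying the deployment to one vendor's R integration.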

