Question
There don't seem to be many options for deploying predictive models in production, which is surprising given the explosion in Big Data.
I understand that the open PMML standard can be used to export models as an XML specification, which can then be used for in-database scoring/prediction. However, it seems that to make this work you need the PMML plugin from Zementis, which means the solution is not truly open source. Is there an easier, open way to map PMML to SQL for scoring?
Another option would be to use JSON instead of XML to output model predictions. But in this case, where would the R model sit? I'm assuming it would always need to be mapped to SQL, unless the R model could sit on the same server as the data and be run against the incoming data using an R script.
Are there any other options?
Recommended answer
The answer really depends on what your production environment is.
If your "big data" are on Hadoop, you can try a relatively new open-source PMML "scoring engine" called Pattern.
Otherwise you have no choice (short of writing custom model-specific code) but to run R on your server. You would use save to store your fitted models in .RData files, then load them and run the corresponding predict on the server. (That is bound to be slow, but you can always try throwing more hardware at it.)
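The save/load/predict workflow described above could look roughly like the following. This is a minimal sketch, not a production setup: the linear model on the built-in mtcars data, the temporary file location, and the helper name score are all assumptions made for illustration.

```r
# Training side: fit a model and persist it to an .RData file.
# (mtcars ships with base R; the model itself is only a placeholder.)
fit <- lm(mpg ~ wt + hp, data = mtcars)
model_path <- file.path(tempdir(), "fit.RData")  # hypothetical location
save(fit, file = model_path)

# Server side: load the saved object into its own environment so it
# cannot overwrite anything in the calling workspace, then score new rows.
score <- function(new_data, path = model_path) {
  env <- new.env()
  load(path, envir = env)
  predict(env$fit, newdata = new_data)
}

# Incoming data must carry the predictor columns the model was fitted on.
incoming <- data.frame(wt = c(2.5, 3.2), hp = c(110, 150))
preds <- score(incoming)
print(preds)
```

Loading the file on every call keeps the sketch short but adds to the slowness mentioned above; a long-running R process would more likely load the model once and reuse it.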
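How you do that really depends on your platform. Usually there is a way to add custom functions written in R; the term to look for is UDF (user-defined function). In Hadoop you can add such functions to Pig, for example, or you can write simple map-reduce code that loads the model and calls predict. Hive offers a similar route if your data live there.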
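As a rough illustration of map-reduce code that loads a model and calls predict, the sketch below scores tab-separated records read from a connection, in chunks, the way a streaming-style mapper might. Everything specific here is an assumption: the function name score_stream, the id/wt/hp record layout, and the model fitted in place (a real job would load it from the saved .RData file instead).

```r
# Score tab-separated records (id, wt, hp) from an open connection and
# write "id<TAB>prediction" lines to another connection, chunk by chunk.
score_stream <- function(input, output, model, chunk_size = 1000L) {
  repeat {
    lines <- readLines(input, n = chunk_size)
    if (length(lines) == 0L) break  # end of input
    rows <- read.table(text = lines, sep = "\t",
                       col.names = c("id", "wt", "hp"),
                       stringsAsFactors = FALSE)
    preds <- predict(model, newdata = rows)
    writeLines(paste(rows$id, sprintf("%.4f", preds), sep = "\t"), output)
  }
  invisible(NULL)
}

# Demo with in-memory connections standing in for stdin/stdout.
fit <- lm(mpg ~ wt + hp, data = mtcars)  # placeholder model
input <- textConnection(c("a\t2.5\t110", "b\t3.2\t150"))
output <- textConnection(NULL, "w")
score_stream(input, output, fit, chunk_size = 1L)
result <- textConnectionValue(output)
close(input)
close(output)
print(result)
```

In an actual streaming job the two connections would presumably be file("stdin", open = "r") and stdout(); the input connection has to be opened beforehand so that successive readLines calls continue where the previous chunk ended.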
There are also vendor-specific ways to add functions written in R to various SQL databases. Again, look for UDF in the documentation. For instance, PostgreSQL has PL/R.