Question
I'm playing around with Livy/Spark and am a little confused about how to use some of it. There's an example in the Livy examples folder of building jobs that get uploaded to Spark. I like the interfaces that are being used, but I want to talk to Livy/Spark over HTTP, as I don't have a Java client. With that, it seems that if I use the LivyClient to upload jars, they only exist within that Spark session. Is there a way to upload Livy jobs to Spark and have them persist across all of Spark? Would it be better to build those jobs/apps in Spark instead?
Honestly, I'm trying to figure out what the best approach would be. I want to be able to do interactive things via the shell, but I also want to build custom jobs for algorithms not available in Spark that I would use frequently. I'm not sure which way I should tackle this. Any thoughts? How should I be using Livy? Just as the REST service in front of Spark, and then handle building custom apps/methods in Spark itself?
For example:
Say I have some JavaScript application, and I have some data I can load, and I want to run algorithm x on it. Algorithm x may or may not be implemented in Spark, but by pressing that button I want to get that data into Spark, whether it is put into HDFS or pulled from Elasticsearch or whatever. If I have Livy, I'd want to call some REST command in Livy to do that, and it then runs that particular algorithm. What's the standard way of doing this?
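To make that concrete, something like the sketch below is what I have in mind, assuming the jar for algorithm x already sits in HDFS. The host, jar path, and class name are placeholders I made up; the POST /batches route itself is Livy's documented REST endpoint:

```typescript
// Hypothetical sketch: a Node/TypeScript front end asking Livy to run "algorithm x".
// livyUrl, the jar path, and the class name are made-up placeholders; the
// POST /batches route is part of Livy's REST API.
const livyUrl = "http://livy-host:8998";

async function runAlgorithmX(inputPath: string): Promise<number> {
  const resp = await fetch(`${livyUrl}/batches`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      file: "hdfs:///jobs/algorithm-x.jar",  // jar already in HDFS (placeholder path)
      className: "com.example.AlgorithmX",   // hypothetical entry point
      args: [inputPath],                     // e.g. an HDFS path or ES index name
    }),
  });
  const batch = await resp.json();
  return batch.id;                           // poll GET /batches/{id} for its state
}
```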
Thanks
Answer
Livy doesn't support file uploads yet. You have to provide valid file paths for sessions or batch jobs, and those files have to be in HDFS. So, generally, you keep your scripts or jars in HDFS and then use Livy to launch a batch or interactive job referencing those files.
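As a rough illustration of that workflow, here is a minimal sketch against Livy's documented REST endpoints (/batches and /sessions); the host, jar path, and class name below are placeholder values, not anything from the question:

```typescript
// Sketch only: launch a batch job from a jar that already lives in HDFS, and run
// an ad-hoc statement in an interactive session. Host, paths, and class names are
// placeholders; the routes are Livy's documented REST endpoints.
const livy = "http://livy-host:8998";
const headers = { "Content-Type": "application/json" };

// Batch job: Livy references the jar in HDFS, so nothing is uploaded per request.
async function submitBatch() {
  const resp = await fetch(`${livy}/batches`, {
    method: "POST",
    headers,
    body: JSON.stringify({
      file: "hdfs:///user/me/jobs/my-algorithms.jar",  // placeholder HDFS path
      className: "com.example.MyJob",                  // placeholder main class
      args: ["--input", "hdfs:///data/input"],
    }),
  });
  return resp.json();                                  // contains the batch id and state
}

// Interactive job: create a session once, then POST code statements against it.
async function runStatement(code: string) {
  const session = await (await fetch(`${livy}/sessions`, {
    method: "POST",
    headers,
    body: JSON.stringify({ kind: "spark" }),
  })).json();

  const stmt = await fetch(`${livy}/sessions/${session.id}/statements`, {
    method: "POST",
    headers,
    body: JSON.stringify({ code }),
  });
  return stmt.json();                                  // poll the statement for its result
}
```

Either way, the jar itself stays in HDFS, which is what keeps it available across sessions instead of being tied to a single LivyClient upload.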
Livy - Cloudera
Livy - Apache
Edit: Livy is being incubated by Apache, and they are planning to add a new API to support resource uploading. Check this.