This article explains how to view the logs of a Spark job after the job has completed and its context has been closed; the question and the recommended answer below may serve as a useful reference.

Problem description

I am running pyspark, Spark 1.3, standalone mode, client mode.

I am trying to investigate my Spark jobs by looking at jobs from the past and comparing them. I want to view their logs, the configuration settings under which the jobs were submitted, and so on. But I am running into trouble viewing the logs of jobs after the context is closed.

When I submit a job, of course I open a Spark context. While the job is running, I can open the Spark web UI using SSH tunneling and access the forwarded port at localhost:<port no>. Then I can view the jobs currently running as well as the ones that have completed.
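
For instance, a tunnel to the standalone master web UI (default port 8080) might look like the sketch below; the user and host names are placeholders, not from the original question:

# Forward the standalone master web UI (default port 8080) to localhost;
# "user" and "spark-master" are placeholder names
ssh -N -L 8080:localhost:8080 user@spark-master

After that, opening http://localhost:8080 in a local browser shows the master UI.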

Then, if I wish to see the logs of a particular job, I can do so by using SSH tunnel port forwarding to view the logs on a particular port of a particular machine for that job.
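
In standalone mode the executors' stdout/stderr logs are served by each worker's web UI (default port 8081), so a tunnel for that case might look like this sketch; again, the names are placeholders:

# Forward a worker's web UI (default port 8081), which serves the
# executor stdout/stderr logs; "user" and "worker-host" are placeholders
ssh -N -L 8081:localhost:8081 user@worker-host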

Then, sometimes the job fails but the context is still open. When this happens, I am still able to see the logs by the method above.

But since I don't want to have all of these contexts open at once, when a job fails I close the context. When I close the context, the job appears under "Completed Applications" in the master UI. Now, when I try to view the logs using SSH tunnel port forwarding as before (localhost:<port no>), I get a page not found.

How do I view the logs of a job after the context is closed? And what does this imply about the relationship between the Spark context and where the logs are kept? Thank you.

Again, I am running pyspark, Spark 1.3, standalone mode, client mode.

Recommended answer

The Spark event log / history server is for exactly this use case.

If conf/spark-defaults.conf does not exist, create it from the template:

cp conf/spark-defaults.conf.template conf/spark-defaults.conf

Add the following configuration to conf/spark-defaults.conf:

# Enable event logging
spark.eventLog.enabled true

# Where the event logs are stored
spark.eventLog.dir file:///Users/rockieyang/git/spark/spark-events

# Where the history server reads event logs from
spark.history.fs.logDirectory file:///Users/rockieyang/git/spark/spark-events
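
Note that the event log directory should exist before an application starts. The two application-side settings can also be passed per job at submit time instead of editing conf/spark-defaults.conf; a sketch, with app.py as a placeholder script name:

# Create the event log directory, then submit with event logging enabled
mkdir -p /Users/rockieyang/git/spark/spark-events
spark-submit \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=file:///Users/rockieyang/git/spark/spark-events \
  app.py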

History server

Start the history server:

sbin/start-history-server.sh

Check the history; by default the port is 18080:

http://localhost:18080/
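
Aside from the web page, newer Spark releases (1.4 and later, so not the 1.3 in the question) also expose the same history through a REST API, which can be handy for scripting; a hedged sketch:

# List the applications known to the history server (Spark 1.4+)
curl http://localhost:18080/api/v1/applications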

This concludes the article on how to view the logs of a Spark job after it has completed and its context has been closed. We hope the recommended answer is helpful!
