Problem Description
I have a simple Spark Structured Streaming app that reads from Kafka and writes to HDFS. Today the app has mysteriously stopped working, with no changes or modifications whatsoever (it had been working flawlessly for weeks).
So far, I have observed the following:
- App has no active, failed or completed tasks
- App UI shows no jobs and no stages
- QueryProgress indicates 0 input rows every trigger (a sketch of how I read the progress follows this list)
- QueryProgress indicates offsets from Kafka were read and committed correctly (which means data is actually there)
- Data is indeed available in the topic (writing the stream to the console sink shows the data; see the sketch after the code snippet below)
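For context, this is roughly how those progress numbers can be observed; a minimal sketch, assuming the StreamingQuery handle returned by start() is kept in a variable named query (the variable name is mine, not from the original app):

// Sketch: inspect the most recent progress of the running query.
// `query` is assumed to be the StreamingQuery returned by .start().
val progress = query.lastProgress
if (progress != null) {
  println(s"numInputRows = ${progress.numInputRows}")
  progress.sources.foreach { s =>
    println(s"source offsets: ${s.startOffset} -> ${s.endOffset}")
  }
}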
Despite all of that, nothing is being written to HDFS anymore. Code snippet:
import org.apache.spark.sql.streaming.{OutputMode, Trigger}

// Read from Kafka, starting at the latest offsets.
val inputData = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", bootstrap_servers)
  .option("subscribe", "topic-name-here")
  .option("startingOffsets", "latest")
  .option("failOnDataLoss", "false")
  .load()

// Write the stream to HDFS as Parquet, triggering every 60 seconds.
inputData.toDF()
  .repartition(10)
  .writeStream
  .format("parquet")
  .option("checkpointLocation", "hdfs://...")
  .option("path", "hdfs://...")
  .outputMode(OutputMode.Append())
  .trigger(Trigger.ProcessingTime("60 seconds"))
  .start()
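As mentioned above, to confirm the topic actually had data I pointed the same stream at the console sink. A minimal sketch of that check (the 10-second trigger and the truncate option are illustrative choices, not part of the original app):

// Debug check: write the same Kafka stream to the console sink.
// If rows show up here, the Kafka source side is healthy.
inputData.toDF()
  .writeStream
  .format("console")
  .option("truncate", "false")
  .outputMode(OutputMode.Append())
  .trigger(Trigger.ProcessingTime("10 seconds"))
  .start()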
Any ideas why the UI shows no jobs/tasks?
Recommended Answer
For anyone facing the same issue: I found the culprit:
Somehow the data within _spark_metadata in the HDFS directory where I was saving the data got corrupted.
The solution was to delete that directory and restart the application, which recreated it. After that, data started flowing again.
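For reference, the corrupted sink metadata can be removed with plain HDFS tooling (hdfs dfs -rm -r <output-path>/_spark_metadata) or programmatically. A minimal sketch using the Hadoop FileSystem API, with a placeholder output path (substitute the value of your "path" option), to be run while the query is stopped:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Placeholder path: replace with the sink's "path" option value.
val outputDir = new Path("hdfs:///path/to/output")
val metadataDir = new Path(outputDir, "_spark_metadata")

val fs = FileSystem.get(metadataDir.toUri, new Configuration())
if (fs.exists(metadataDir)) {
  // Recursively delete the corrupted _spark_metadata directory;
  // the file sink recreates it when the query is restarted.
  fs.delete(metadataDir, true)
}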