InternalParquetRecordWriter

This article covers the Spark exception "Task failed while writing rows" and how to approach it; it may serve as a useful reference for anyone running into the same problem.

Problem description

I am reading text files and converting them to Parquet files. I am doing this with Spark code, but when I try to run it I get the following exception:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 1.0 failed 4 times, most recent failure: Lost task 2.3 in stage 1.0 (TID 9, ukfhpdbivp12.uk.experian.local): org.apache.spark.SparkException: Task failed while writing rows.
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:191)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:160)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:160)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArithmeticException: / by zero
    at parquet.hadoop.InternalParquetRecordWriter.initStore(InternalParquetRecordWriter.java:101)
    at parquet.hadoop.InternalParquetRecordWriter.<init>(InternalParquetRecordWriter.java:94)
    at parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:64)
    at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:282)
    at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:252)
    at org.apache.spark.sql.parquet.ParquetOutputWriter.<init>(newParquet.scala:83)
    at org.apache.spark.sql.parquet.ParquetRelation2$$anon$4.newInstance(newParquet.scala:229)
    at org.apache.spark.sql.sources.DefaultWriterContainer.initWriters(commands.scala:470)
    at org.apache.spark.sql.sources.BaseWriterContainer.executorSideSetup(commands.scala:360)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:172)
    ... 8 more

I am writing the dataframe in the following fashion:

dataframe.write().parquet(Path)

Any help is highly appreciated.
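
For reference, here is a minimal, self-contained sketch of the text-to-Parquet flow described in the question, written against the Spark 1.x API that the stack trace implies (SQLContext rather than SparkSession). The input path, delimiter, and column names are illustrative assumptions, not details taken from the original question.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object TextToParquet {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("text-to-parquet"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        // Read raw text and shape it into a DataFrame.
        // "input/*.txt", the comma delimiter, and the two-column layout are assumptions.
        val dataframe = sc.textFile("input/*.txt")
          .map(_.split(","))
          .map(fields => (fields(0), fields(1)))
          .toDF("col1", "col2")

        // Same call as in the question; Scala drops the empty parentheses on write.
        dataframe.write.parquet("output/parquet")

        sc.stop()
      }
    }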

Solution

Another possible reason is that you're hitting S3 request rate limits. If you look closely at your logs, you may see something like this:

AmazonS3Exception: Please reduce your request rate.

while the Spark UI will only say:

Task failed while writing rows

I doubt it's the reason you're running into this issue, but it's a possible cause if you're running a highly intensive job, so I'm including it for the answer's completeness.
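
If the write really is going to S3 and the request rate is the problem, one common mitigation (added here for illustration, not part of the original answer) is to produce fewer, larger output files so the job issues fewer concurrent requests, for example by coalescing before the write:

    import org.apache.spark.sql.DataFrame

    // Hypothetical mitigation sketch: fewer output partitions mean fewer parallel
    // Parquet writers and therefore fewer simultaneous S3 PUT requests.
    // The partition count (16) and the s3a:// path are illustrative assumptions.
    def writeWithFewerRequests(df: DataFrame, outputPath: String): Unit = {
      df.coalesce(16)
        .write
        .parquet(outputPath)   // e.g. "s3a://some-bucket/output/parquet"
    }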

That concludes this look at the Spark exception "Task failed while writing rows"; we hope the answer above proves helpful.
