Problem Description
I am reading text files and converting them to Parquet files. I am doing it using Spark code. But when I try to run the code, I get the following exception:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 1.0 failed 4 times, most recent failure: Lost task 2.3 in stage 1.0 (TID 9, ukfhpdbivp12.uk.experian.local): org.apache.spark.SparkException: Task failed while writing rows.
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:191)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:160)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:160)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArithmeticException: / by zero
    at parquet.hadoop.InternalParquetRecordWriter.initStore(InternalParquetRecordWriter.java:101)
    at parquet.hadoop.InternalParquetRecordWriter.<init>(InternalParquetRecordWriter.java:94)
    at parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:64)
    at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:282)
    at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:252)
    at org.apache.spark.sql.parquet.ParquetOutputWriter.<init>(newParquet.scala:83)
    at org.apache.spark.sql.parquet.ParquetRelation2$$anon$4.newInstance(newParquet.scala:229)
    at org.apache.spark.sql.sources.DefaultWriterContainer.initWriters(commands.scala:470)
    at org.apache.spark.sql.sources.BaseWriterContainer.executorSideSetup(commands.scala:360)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:172)
    ... 8 more
I am trying to write the DataFrame in the following way:
dataframe.write().parquet(Path)
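For context, a minimal sketch of the kind of job described above (read text files, build a DataFrame, write Parquet) using the Spark 1.x Java API; the paths, delimiter, and schema here are assumptions rather than the original code:

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

SparkConf conf = new SparkConf().setAppName("TextToParquet");
JavaSparkContext sc = new JavaSparkContext(conf);
SQLContext sqlContext = new SQLContext(sc);

// Parse each input line into a Row (comma delimiter and input path are assumptions).
JavaRDD<Row> rows = sc.textFile("hdfs:///input/data.txt")
        .map(line -> {
            String[] parts = line.split(",");
            return RowFactory.create(parts[0], parts[1]);
        });

// Assumed two-column string schema for the parsed lines.
StructType schema = DataTypes.createStructType(Arrays.asList(
        DataTypes.createStructField("col1", DataTypes.StringType, true),
        DataTypes.createStructField("col2", DataTypes.StringType, true)));

DataFrame dataframe = sqlContext.createDataFrame(rows, schema);
dataframe.write().parquet("hdfs:///output/data_parquet");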
Any help is highly appreciated.
Solution
Another possible reason is that you're hitting S3 request rate limits. If you look closely at your logs, you may see something like this:
AmazonS3Exception: Please reduce your request rate.
While the Spark UI will say:
Task failed while writing rows
I doubt it's the reason you're running into the issue, but it is a possible cause if you're running a highly intensive job, so I'm including it just for completeness.
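As an illustration only (not part of the original answer), one common way to lower the write request rate against S3 is to reduce the number of output partitions before writing, so fewer tasks issue PUT requests concurrently. The partition count and the s3a bucket path below are assumptions:

// Hypothetical mitigation sketch: fewer output partitions means fewer concurrent S3 writes.
// The target partition count (10) and the path are illustrative, not values from the question.
DataFrame reduced = dataframe.coalesce(10);
reduced.write().parquet("s3a://my-bucket/output/data_parquet");

A side effect worth noting: fewer, larger Parquet files are also generally friendlier to downstream readers than many small ones, though coalescing too aggressively can reduce write parallelism and slow the job.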
This concludes this article on the Spark exception "Task failed while writing rows". We hope the recommended answer is helpful, and thank you for your continued support!