I'm currently running a Java Spark application inside Tomcat and getting the following exception:

Caused by: java.io.IOException: Mkdirs failed to create file:/opt/folder/tmp/file.json/_temporary/0/_temporary/attempt_201603031703_0001_m_000000_5

It is thrown on this line:
text.saveAsTextFile("/opt/folder/tmp/file.json"); // where text is a JavaRDD<String>
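The thing is that /opt/folder/tmp/ already exists, and everything up to /opt/folder/tmp/file.json/_temporary/0/ is created successfully, but then it runs into what looks like a permission problem on the rest of the path, _temporary/attempt_201603031703_0001_m_000000_5 itself. I did give the tomcat user permissions on the tmp/ directory (chown -R tomcat:tomcat tmp/ and chmod -R 755 tmp/). Does anyone know what could be happening?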

Thanks

Edit for @javadba:
[root@ip tmp]# ls -lrta
total 12
drwxr-xr-x 4 tomcat tomcat 4096 Mar  3 16:44 ..
drwxr-xr-x 3 tomcat tomcat 4096 Mar  7 20:01 file.json
drwxrwxrwx 3 tomcat tomcat 4096 Mar  7 20:01 .

[root@ip tmp]# cd file.json/
[root@ip file.json]# ls -lrta
total 12
drwxr-xr-x 3 tomcat tomcat 4096 Mar  7 20:01 _temporary
drwxrwxrwx 3 tomcat tomcat 4096 Mar  7 20:01 ..
drwxr-xr-x 3 tomcat tomcat 4096 Mar  7 20:01 .

[root@ip file.json]# cd _temporary/
[root@ip _temporary]# ls -lrta
total 12
drwxr-xr-x 2 tomcat tomcat 4096 Mar  7 20:01 0
drwxr-xr-x 3 tomcat tomcat 4096 Mar  7 20:01 ..
drwxr-xr-x 3 tomcat tomcat 4096 Mar  7 20:01 .

[root@ip _temporary]# cd 0/
[root@ip 0]# ls -lrta
total 8
drwxr-xr-x 3 tomcat tomcat 4096 Mar  7 20:01 ..
drwxr-xr-x 2 tomcat tomcat 4096 Mar  7 20:01 .

Exception in catalina.out:
Caused by: java.io.IOException: Mkdirs failed to create file:/opt/folder/tmp/file.json/_temporary/0/_temporary/attempt_201603072001_0001_m_000000_5
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:438)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:799)
    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
    at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:91)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1193)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1185)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    ... 1 more

Best answer

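saveAsTextFile is actually carried out by the Spark executors, not by the driver. Depending on your Spark setup, the executors may run as a different user than your Spark application's driver. My guess is that the driver prepares the output directory for the job, but the executors, running as another user, then have no permission to write into that directory.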
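One way to test this guess is to log, from the process that actually writes the files, which OS user the JVM runs as, who owns the output directory, and whether the process can write into it. The sketch below uses only the JDK and assumes a POSIX file system such as the local Linux disk in the question; the class and method names (WriteAccessCheck, describe) are hypothetical. Run as a plain program it only shows the view of whichever JVM runs it (for example the Tomcat/driver side); to see the executors' view, the same call would have to run inside a task, which this sketch does not show.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermissions;

public class WriteAccessCheck {

    // Hypothetical helper: reports who this JVM runs as and whether it may write to dir.
    static String describe(Path dir) throws IOException {
        String jvmUser = System.getProperty("user.name");   // OS user this JVM runs as
        String owner = Files.getOwner(dir).getName();         // owner of the directory
        String perms = PosixFilePermissions.toString(
                Files.getPosixFilePermissions(dir));           // e.g. "rwxr-xr-x"
        boolean writable = Files.isWritable(dir);              // can this process write here?
        return String.format("user=%s owner=%s perms=%s writable=%b",
                jvmUser, owner, perms, writable);
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the directory from the question; pass another path as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/opt/folder/tmp");
        System.out.println(describe(dir));
    }
}
```

If the user printed on the writing side differs from the owner shown by ls (tomcat) and writable is false, that matches the explanation above.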

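Changing it to 777 won't help either: chmod -R only affects directories that already exist, and new subdirectories do not inherit their parent's permissions, so the ones created later will still end up as 755.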
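The 755 follows from how POSIX creates directories: a new directory does not copy its parent's mode; it gets the mode requested by the creating call with the bits set in the process umask cleared. Assuming the common request of 0777 and a typical umask of 022 (both assumptions; run `umask` as the relevant user to check), the arithmetic works out as below. The class and method names are hypothetical, and this only models the POSIX rule, not any extra permission handling Hadoop's file system layer may apply.

```java
public class UmaskDemo {

    // Mode a new directory actually receives: the requested bits minus those set in the umask.
    static int effectiveMode(int requestedMode, int umask) {
        return requestedMode & ~umask;
    }

    public static void main(String[] args) {
        int requested = 0777; // mode typically requested when creating a directory (assumption)
        int umask = 0022;     // common default umask (assumption)
        int mode = effectiveMode(requested, umask);
        // Prints 755 (drwxr-xr-x); the parent directory's own mode plays no part
        System.out.println(Integer.toOctalString(mode));
    }
}
```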

Try running your Spark application as the same user that your Spark executors run as.

09-05 09:23