How to initialize the Spark shell with a specific user to save data to HDFS with Apache Spark


Problem description

I am using Ubuntu.
I use Spark as a dependency in an IntelliJ project.
Typing spark in a shell prints: Command 'spark' not found, but can be installed with: ..
I have two users: amine and hadoop_amine (the user under which Hadoop HDFS is set up).

When I try to save a DataFrame to HDFS (Spark, Scala):

procesed.write.format("json").save("hdfs://localhost:54310/mydata/enedis/POC/processed.json")

I get this error:

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=root, access=WRITE, inode="/mydata/enedis/POC":hadoop_amine:supergroup:drwxr-xr-x

Recommended answer

The error shows the write running as user root, while /mydata/enedis/POC belongs to hadoop_amine with mode drwxr-xr-x, so only the owner may write there. Try changing the permissions of the HDFS directory, or simply run Spark as a different user. To change the directory permissions, you can use the hdfs command line like this:

hdfs dfs -chmod  ...
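As an illustration only, here is one way the permission change could look for the directory named in the error message. The path and owner are copied from that message. The mode 777 is an assumption that suits a local proof of concept and is too permissive for a shared cluster. The commands are skipped when the hdfs CLI is not on the PATH.

```shell
# Values copied from the AccessControlException in the question.
TARGET_DIR=/mydata/enedis/POC
HDFS_OWNER=hadoop_amine

# Run only where the hdfs CLI is installed.
if command -v hdfs >/dev/null 2>&1; then
  # Show the directory entry itself (-d) with its current owner and mode.
  sudo -u "$HDFS_OWNER" hdfs dfs -ls -d "$TARGET_DIR"
  # Assumption: world-writable is acceptable here (local POC only).
  sudo -u "$HDFS_OWNER" hdfs dfs -chmod 777 "$TARGET_DIR"
fi
```

The chmod is run as hadoop_amine because only the owner of the directory or the HDFS superuser may change its mode.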

With spark-submit you can use the --proxy-user option. Finally, you can run spark-submit or spark-shell as the proper user, like this:

sudo -u hadoop_amine spark-submit ...
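Since the question runs Spark from IntelliJ rather than through spark-submit, another option may be the HADOOP_USER_NAME environment variable, which the Hadoop client reads to decide which user it acts as. This is a minimal sketch that assumes HDFS uses simple authentication (no Kerberos). The variable has no effect on a secured cluster.

```shell
# Assumption: simple (non-Kerberos) authentication on the HDFS side.
# The Hadoop client then reports this name instead of the OS user (root).
export HADOOP_USER_NAME=hadoop_amine

# Start spark-shell from this environment if it is installed.
if command -v spark-shell >/dev/null 2>&1; then
  spark-shell
else
  echo "spark-shell not on PATH; launch the application from this shell instead"
fi
```

For an application started from IntelliJ, the same variable can be set in the run configuration's environment variables, so that it is in place before the Spark session is created.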
