How to automatically install Python libraries when a Dataproc cluster starts

Problem description

How can I automatically install Python libraries on my Dataproc cluster when the cluster starts? This would save me the trouble of manually logging into the master and/or worker nodes to manually install the libraries I need.

It would be great to also know if this automated installation could install things only on the master and not the workers.

Recommended answer

Initialization actions are the best way to do this. Initialization actions are shell scripts which are run when the cluster is created. This will let you customize the cluster, such as installing Python libraries. These scripts must be stored in Google Cloud Storage and can be used when creating clusters via the Google Cloud SDK or the Google Developers Console.
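To use such a script, you upload it to a Cloud Storage bucket and point the cluster-creation command at it. A minimal sketch with the Google Cloud SDK, where the bucket name `my-bucket`, cluster name `my-cluster`, and script name `install-libs.sh` are all hypothetical placeholders:

```shell
# Upload the initialization action to Cloud Storage
gsutil cp install-libs.sh gs://my-bucket/install-libs.sh

# Create a cluster that runs the script on every node at startup
gcloud dataproc clusters create my-cluster \
    --initialization-actions gs://my-bucket/install-libs.sh
```

Multiple scripts can be passed as a comma-separated list to `--initialization-actions`; they run on each node in the order given.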

Here is a sample initialization action that installs the Python pandas library at cluster creation, on the master node only.

#!/bin/bash
# [[ ... ]] is a bash construct, so use a bash shebang rather than /bin/sh.
ROLE=$(/usr/share/google/get_metadata_value attributes/role)
if [[ "${ROLE}" == 'Master' ]]; then
  apt-get install python-pandas -y
fi

As you can see from this script, it is possible to discern the role of a node with /usr/share/google/get_metadata_value attributes/role and then perform actions specifically on the master (or worker) nodes.
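The branching can be sketched locally like this, with the metadata lookup replaced by a hard-coded value, since `/usr/share/google/get_metadata_value` only exists on Dataproc nodes (the messages are illustrative, not part of any real script):

```shell
#!/bin/bash
# Local sketch: on a real Dataproc node ROLE would come from
#   /usr/share/google/get_metadata_value attributes/role
# which returns 'Master' on the master node and 'Worker' on workers.
ROLE='Worker'
if [[ "${ROLE}" == 'Master' ]]; then
  MSG="running master-only setup"
else
  MSG="skipping master-only setup on ${ROLE} node"
fi
echo "${MSG}"
```

With `ROLE='Worker'` this prints the skip message; flipping the value to `Master` takes the first branch instead.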

See the Google Cloud Dataproc documentation for more details.

