How to launch and configure an EMR cluster using boto

Question

I'm trying to launch a cluster and run a job, all using boto. I find lots of examples of creating job flows, but I can't for the life of me find an example that shows:

  1. How to define the cluster to use (by cluster_id)
  2. How to configure the launch of a cluster (for example, if I want to use Spot instances for some task nodes)

Am I missing something?

Recommended answer

Boto and the underlying EMR API currently mix the terms cluster and job flow, and job flow is being deprecated. I consider them synonyms.

You create a new cluster by calling run_jobflow() on the connection object (an instance of boto.emr.connection.EmrConnection). It returns the cluster ID that EMR generates for you.

First, all the mandatory things:

#!/usr/bin/env python

import boto
import boto.emr
from boto.emr.instance_group import InstanceGroup

conn = boto.emr.connect_to_region('us-east-1')

Then we specify the instance groups, including the Spot price we are willing to bid for the TASK nodes:

instance_groups = []
instance_groups.append(InstanceGroup(
    num_instances=1,
    role="MASTER",
    type="m1.small",
    market="ON_DEMAND",
    name="Main node"))
instance_groups.append(InstanceGroup(
    num_instances=2,
    role="CORE",
    type="m1.small",
    market="ON_DEMAND",
    name="Worker nodes"))
instance_groups.append(InstanceGroup(
    num_instances=2,
    role="TASK",
    type="m1.small",
    market="SPOT",
    name="My cheap spot nodes",
    bidprice="0.002"))
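The three groups above differ only in a few fields, so the same layout can be kept as plain data and converted into InstanceGroup objects in one place. The following is a minimal sketch, not part of the original answer: build_group_specs is a hypothetical helper, and it assumes that the dictionary keys match the keyword arguments used in the InstanceGroup calls above.

```python
def build_group_specs(task_bid_price=None, instance_type="m1.small"):
    """Return one keyword-argument dict per instance group (hypothetical helper)."""
    specs = [
        {"num_instances": 1, "role": "MASTER", "type": instance_type,
         "market": "ON_DEMAND", "name": "Main node"},
        {"num_instances": 2, "role": "CORE", "type": instance_type,
         "market": "ON_DEMAND", "name": "Worker nodes"},
    ]
    if task_bid_price is not None:
        # Only the Spot group carries a bid price; it is passed as a string,
        # as in the answer's own example.
        specs.append({"num_instances": 2, "role": "TASK", "type": instance_type,
                      "market": "SPOT", "name": "My cheap spot nodes",
                      "bidprice": str(task_bid_price)})
    return specs


# With boto installed, the specs would be turned into instance groups like this:
#   from boto.emr.instance_group import InstanceGroup
#   instance_groups = [InstanceGroup(**spec) for spec in build_group_specs("0.002")]
```

Leaving out task_bid_price yields a cluster layout without any Spot nodes, which makes it easy to switch between the two setups.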

Finally, we start a new cluster:

cluster_id = conn.run_jobflow(
    "Name for my cluster",
    instance_groups=instance_groups,
    action_on_failure='TERMINATE_JOB_FLOW',
    keep_alive=True,
    enable_debugging=True,
    log_uri="s3://mybucket/logs/",
    hadoop_version=None,
    ami_version="2.4.9",
    steps=[],
    bootstrap_actions=[],
    ec2_keyname="my-ec2-key",
    visible_to_all_users=True,
    job_flow_role="EMR_EC2_DefaultRole",
    service_role="EMR_DefaultRole")

We can also print the cluster ID if we care about that:

print("Starting cluster %s" % cluster_id)
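The returned ID is also how you point later calls at this cluster, which covers the first part of the question. Because the cluster was started with keep_alive=True and no steps, work can be submitted to it afterwards with the connection's add_jobflow_steps() method. The sketch below is an illustration rather than part of the original answer: submit_steps is a hypothetical wrapper, and the step name, jar path, and arguments in the comments are placeholders.

```python
def submit_steps(conn, cluster_id, steps):
    """Add steps to an existing cluster, identified by its ID (hypothetical wrapper)."""
    if not cluster_id:
        raise ValueError("cluster_id is required")
    # add_jobflow_steps() takes the job flow (cluster) ID and a list of steps.
    return conn.add_jobflow_steps(cluster_id, list(steps))


# With boto installed and a live connection, usage would look like this
# (the step name, jar location, and arguments are placeholders):
#   from boto.emr.step import JarStep
#   step = JarStep(name="My step",
#                  jar="s3://mybucket/jars/my-job.jar",
#                  step_args=["arg1"])
#   submit_steps(conn, cluster_id, [step])
```

The wrapper only relies on the connection having an add_jobflow_steps() method, so it can be exercised with a stand-in object before touching a real cluster.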

