Question
I'm trying to launch a cluster and run a job, all using boto. I find lots of examples of creating job_flows, but I can't for the life of me find an example that shows:
- How to define which cluster to use (by cluster_id)
- How to configure and launch a cluster (for example, if I want to use Spot instances for some task nodes)
Am I missing something?
Recommended answer
Boto and the underlying EMR API currently mix the terms cluster and job flow, and job flow is being deprecated. I consider them synonyms.
You create a new cluster by calling run_jobflow() on the connection object (boto.emr.connection.EmrConnection.run_jobflow()). It returns the cluster ID that EMR generates for you.
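That returned ID is also how later calls pick a particular cluster, which is the first thing the question asks about. Below is a minimal sketch, assuming boto 2's `EmrConnection.add_jobflow_steps(jobflow_id, steps)` and `boto.emr.step.JarStep`. The helper names, the jar path, and the step arguments are hypothetical placeholders, not part of the original answer.

```python
def build_example_step():
    """Build a JarStep. The jar location and arguments are placeholders."""
    # Imported lazily so the helper below can be read and tested without boto.
    from boto.emr.step import JarStep
    return JarStep(
        name="Example step",
        jar="s3://mybucket/jars/my-job.jar",  # hypothetical path
        step_args=["s3://mybucket/input/", "s3://mybucket/output/"],
        action_on_failure="CONTINUE")


def submit_steps(conn, cluster_id, steps):
    """Queue steps on an existing cluster, selected by its ID."""
    if not cluster_id:
        raise ValueError("cluster_id is required")
    # add_jobflow_steps takes the job flow (cluster) ID and a list of steps.
    return conn.add_jobflow_steps(cluster_id, list(steps))
```

With a connection and an ID in hand, a call would look like `submit_steps(conn, cluster_id, [build_example_step()])`. Keeping the ID as an explicit argument makes it obvious which cluster each call targets.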
First, the mandatory setup:
#!/usr/bin/env python
import boto
import boto.emr
from boto.emr.instance_group import InstanceGroup
conn = boto.emr.connect_to_region('us-east-1')
Then we specify the instance groups, including the spot price we are willing to pay for the TASK nodes:
instance_groups = []
instance_groups.append(InstanceGroup(
    num_instances=1,
    role="MASTER",
    type="m1.small",
    market="ON_DEMAND",
    name="Main node"))
instance_groups.append(InstanceGroup(
    num_instances=2,
    role="CORE",
    type="m1.small",
    market="ON_DEMAND",
    name="Worker nodes"))
instance_groups.append(InstanceGroup(
    num_instances=2,
    role="TASK",
    type="m1.small",
    market="SPOT",
    name="My cheap spot nodes",
    bidprice="0.002"))
Finally, we start a new cluster:
cluster_id = conn.run_jobflow(
    "Name for my cluster",
    instance_groups=instance_groups,
    action_on_failure='TERMINATE_JOB_FLOW',
    keep_alive=True,
    enable_debugging=True,
    log_uri="s3://mybucket/logs/",
    hadoop_version=None,
    ami_version="2.4.9",
    steps=[],
    bootstrap_actions=[],
    ec2_keyname="my-ec2-key",
    visible_to_all_users=True,
    job_flow_role="EMR_EC2_DefaultRole",
    service_role="EMR_DefaultRole")
We can also print the cluster ID if we care about that:
print("Starting cluster %s" % cluster_id)
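Since the cluster takes a while to come up, it can help to poll its state before relying on it. The following is a rough sketch, assuming boto 2's `EmrConnection.describe_jobflow(jobflow_id)` returns an object with a `state` attribute holding the classic job flow states (STARTING, BOOTSTRAPPING, RUNNING, WAITING, and so on). The function name, the polling interval, and the injectable `sleep` parameter are illustrative choices, not part of boto.

```python
import time

# States in which the cluster is usable, and states in which it never will be.
READY_STATES = ("RUNNING", "WAITING")
DEAD_STATES = ("COMPLETED", "FAILED", "TERMINATED", "SHUTTING_DOWN")


def wait_for_cluster(conn, cluster_id, poll_seconds=30, max_polls=60,
                     sleep=time.sleep):
    """Poll the cluster state until it is usable; return that state."""
    for _ in range(max_polls):
        # Assumption: the returned object exposes the state as `.state`.
        state = conn.describe_jobflow(cluster_id).state
        if state in READY_STATES:
            return state
        if state in DEAD_STATES:
            raise RuntimeError(
                "Cluster %s ended in state %s" % (cluster_id, state))
        sleep(poll_seconds)
    raise RuntimeError("Timed out waiting for cluster %s" % cluster_id)
```

Because the cluster above is started with keep_alive=True and no steps, it should settle in WAITING once bootstrapping finishes, at which point `wait_for_cluster(conn, cluster_id)` returns.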