How to launch and configure an EMR cluster using boto

Question

I'm trying to launch a cluster and run a job, all using boto. I find lots of examples of creating job_flows. But I can't, for the life of me, find an example that shows:

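  1. How to define the cluster to use (by cluster_id)
  2. How to configure and launch the cluster (for example, if I want to use spot instances for some task nodes)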

Am I missing something?

Recommended answer

Boto and the underlying EMR API currently mix the terms cluster and job flow, and job flow is being deprecated (see http://docs.aws.amazon.com/ElasticMapReduce/latest/API/API_DescribeJobFlows.html). I consider them synonyms.

You create a new cluster by calling the boto.emr.connection.EmrConnection.run_jobflow() method. It returns the cluster ID that EMR generates for you.

First, the mandatory things:

#!/usr/bin/env python

import boto
import boto.emr
from boto.emr.instance_group import InstanceGroup

conn = boto.emr.connect_to_region('us-east-1')

Then we specify the instance groups, including the spot price we want to pay for the TASK nodes:

instance_groups = []
instance_groups.append(InstanceGroup(
    num_instances=1,
    role="MASTER",
    type="m1.small",
    market="ON_DEMAND",
    name="Main node"))
instance_groups.append(InstanceGroup(
    num_instances=2,
    role="CORE",
    type="m1.small",
    market="ON_DEMAND",
    name="Worker nodes"))
instance_groups.append(InstanceGroup(
    num_instances=2,
    role="TASK",
    type="m1.small",
    market="SPOT",
    name="My cheap spot nodes",
    bidprice="0.002"))
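If the same script should sometimes run its task nodes on demand and sometimes on spot, the choice can be wrapped in a small helper. This is only a sketch: task_group_kwargs and the group names it uses are hypothetical and not part of boto. It builds the same keyword arguments that are passed to InstanceGroup above, and only includes bidprice when the SPOT market is used.

```python
def task_group_kwargs(num_instances, instance_type, bidprice=None):
    """Build keyword arguments for a TASK InstanceGroup (hypothetical helper).

    With a bid price the group uses the SPOT market, otherwise ON_DEMAND.
    """
    kwargs = {
        "num_instances": num_instances,
        "role": "TASK",
        "type": instance_type,
        "market": "ON_DEMAND",
        "name": "Task nodes",
    }
    if bidprice is not None:
        kwargs["market"] = "SPOT"
        kwargs["name"] = "Spot task nodes"
        # The bid is passed as a string, as in the example above.
        kwargs["bidprice"] = str(bidprice)
    return kwargs
```

It would then be used as instance_groups.append(InstanceGroup(**task_group_kwargs(2, "m1.small", bidprice="0.002"))).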

Finally, we start the new cluster:

cluster_id = conn.run_jobflow(
    "Name for my cluster",
    instance_groups=instance_groups,
    action_on_failure='TERMINATE_JOB_FLOW',
    keep_alive=True,
    enable_debugging=True,
    log_uri="s3://mybucket/logs/",
    hadoop_version=None,
    ami_version="2.4.9",
    steps=[],
    bootstrap_actions=[],
    ec2_keyname="my-ec2-key",
    visible_to_all_users=True,
    job_flow_role="EMR_EC2_DefaultRole",
    service_role="EMR_DefaultRole")

We can also print the cluster ID if we want to see it:

print("Starting cluster %s" % cluster_id)
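The first part of the question, addressing an existing cluster by its ID, is not shown above. A minimal sketch of one way to do it follows, assuming conn is the connection from connect_to_region() and cluster_id is the value returned by run_jobflow(). The helper names wait_until_ready and submit_step are made up for illustration. The sketch relies on the describe_jobflow() and add_jobflow_steps() methods of the boto connection; describe_jobflow() maps to the deprecated DescribeJobFlows call, so newer boto releases may offer a better alternative.

```python
import time

# Job flow states after which the cluster will not accept new steps.
FINAL_STATES = ("COMPLETED", "FAILED", "TERMINATED")


def wait_until_ready(conn, cluster_id, poll_seconds=30, max_polls=60):
    """Poll the cluster until it reports WAITING (idle and ready for steps)."""
    for _ in range(max_polls):
        state = conn.describe_jobflow(cluster_id).state
        if state == "WAITING":
            return state
        if state in FINAL_STATES:
            raise RuntimeError(
                "cluster %s ended in state %s" % (cluster_id, state))
        time.sleep(poll_seconds)
    raise RuntimeError("timed out waiting for cluster %s" % cluster_id)


def submit_step(conn, cluster_id, step):
    """Add a single step to an existing cluster, addressed by its ID."""
    return conn.add_jobflow_steps(cluster_id, [step])


# Example use (needs AWS credentials and the cluster started above):
#
#   from boto.emr.step import StreamingStep
#   step = StreamingStep(
#       name="My wordcount",
#       mapper="s3n://elasticmapreduce/samples/wordcount/wordSplitter.py",
#       reducer="aggregate",
#       input="s3n://elasticmapreduce/samples/wordcount/input",
#       output="s3n://mybucket/output/wordcount",
#       action_on_failure="CANCEL_AND_WAIT")
#   wait_until_ready(conn, cluster_id)
#   submit_step(conn, cluster_id, step)
```

Because the cluster was started with keep_alive=True, it stays up after its steps finish; conn.terminate_jobflow(cluster_id) shuts it down when it is no longer needed.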
