
EMR
***


boto.emr
========

This module provides an interface to the Elastic MapReduce (EMR)
service from AWS.

boto.emr.connect_to_region(region_name, **kw_params)

boto.emr.regions()

   Get all available regions for the Amazon Elastic MapReduce service.

   Return type:
      list

   Returns:
      A list of "boto.regioninfo.RegionInfo"
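The two module-level functions above can be combined as in the following sketch. It assumes boto 2 is installed and that credentials come from boto's usual configuration; "open_emr_connection" is a hypothetical helper name, and reading the region name from the "name" attribute of each "RegionInfo" is an assumption about that class.

```python
def open_emr_connection(region_name="us-east-1", **kw_params):
    """Return an EMR connection for region_name, or raise if it is unknown."""
    # Imported inside the function so the sketch can be read (and stubbed
    # in tests) on a machine where boto is not installed.
    import boto.emr

    # Assumption: each RegionInfo exposes its region name as ``.name``.
    known = [region.name for region in boto.emr.regions()]
    if region_name not in known:
        raise ValueError(
            "unknown EMR region %r; known regions: %s"
            % (region_name, ", ".join(known))
        )
    # Extra keyword arguments are forwarded unchanged to connect_to_region.
    return boto.emr.connect_to_region(region_name, **kw_params)
```

Checking the name against "regions()" first gives a clear error message instead of relying on whatever "connect_to_region" returns for an unrecognised region.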


boto.emr.connection
===================

Represents a connection to the EMR service

class boto.emr.connection.EmrConnection(aws_access_key_id=None, aws_secret_access_key=None, is_secure=True, port=None, proxy=None, proxy_port=None, proxy_user=None, proxy_pass=None, debug=0, https_connection_factory=None, region=None, path='/', security_token=None, validate_certs=True)

   APIVersion = '2009-03-31'

   DebuggingArgs = 's3n://us-east-1.elasticmapreduce/libs/state-pusher/0.1/fetch'

   DebuggingJar = 's3n://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar'

   DefaultRegionEndpoint = 'elasticmapreduce.us-east-1.amazonaws.com'

   DefaultRegionName = 'us-east-1'

   ResponseError

      alias of "EmrResponseError"

   add_instance_groups(jobflow_id, instance_groups)

      Adds instance groups to a running cluster.

      Parameters:
         * **jobflow_id** (*str*) -- The id of the jobflow which will
           take the new instance groups

         * **instance_groups** (*list(boto.emr.InstanceGroup)*) -- A
           list of instance groups to add to the job

   add_jobflow_steps(jobflow_id, steps)

      Adds steps to a jobflow

      Parameters:
         * **jobflow_id** (*str*) -- The job flow id

         * **steps** (*list(boto.emr.Step)*) -- A list of steps to add
           to the job

   describe_jobflow(jobflow_id)

      Describes a single Elastic MapReduce job flow

      Parameters:
         **jobflow_id** (*str*) -- The job flow id of interest
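A common use of "describe_jobflow" is polling until a job flow finishes. The sketch below is one way to do that, not part of boto itself. "wait_for_jobflow" is a hypothetical helper, the list of terminal states is an assumption taken from the EMR job flow lifecycle, and reading the state from a "state" attribute on the returned object is likewise an assumption.

```python
import time

# Assumed terminal job flow states (from the EMR lifecycle, not from boto).
TERMINAL_STATES = ("COMPLETED", "FAILED", "TERMINATED")


def wait_for_jobflow(conn, jobflow_id, poll_seconds=30, max_polls=120,
                     sleep=time.sleep):
    """Poll describe_jobflow until the job flow reaches a terminal state.

    ``conn`` is any object with a describe_jobflow(jobflow_id) method,
    such as an EmrConnection. ``sleep`` is injectable for testing.
    """
    for _ in range(max_polls):
        # Assumption: the returned object exposes the state as ``.state``.
        state = conn.describe_jobflow(jobflow_id).state
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)
    raise RuntimeError(
        "job flow %s not finished after %d polls" % (jobflow_id, max_polls)
    )
```

Bounding the loop with "max_polls" keeps a stuck job flow from blocking the caller indefinitely.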

   describe_jobflows(states=None, jobflow_ids=None, created_after=None, created_before=None)

      Retrieve all the Elastic MapReduce job flows on your account

      Parameters:
         * **states** (*list*) -- A list of strings with job flow
           states wanted

         * **jobflow_ids** (*list*) -- A list of job flow IDs

         * **created_after** (*datetime*) -- Bound on job flow
           creation time

         * **created_before** (*datetime*) -- Bound on job flow
           creation time
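The filters above can be combined in one call. The following sketch lists job flows created in the last few days that are still active; "active_jobflows" is a hypothetical helper, and the state names are assumptions taken from the EMR job flow lifecycle rather than constants defined by boto.

```python
from datetime import timedelta

# Assumed names of non-terminal job flow states.
ACTIVE_STATES = ("STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING")


def active_jobflows(conn, now, days=7):
    """Return active job flows created within ``days`` days before ``now``.

    ``conn`` is any object with a describe_jobflows method, such as an
    EmrConnection; ``now`` is a datetime supplied by the caller.
    """
    cutoff = now - timedelta(days=days)
    return conn.describe_jobflows(
        states=list(ACTIVE_STATES),
        created_after=cutoff,
    )
```

Taking "now" as an argument keeps the helper deterministic; a caller would typically pass the current UTC time.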

   modify_instance_groups(instance_group_ids, new_sizes)

      Modify the number of nodes and configuration settings in an
      instance group.

      Parameters:
         * **instance_group_ids** (*list(str)*) -- A list of the IDs
           of the instance groups to be modified

         * **new_sizes** (*list(int)*) -- A list of the new sizes for
           each instance group
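Because "instance_group_ids" and "new_sizes" are parallel lists, it is easy to pair an ID with the wrong size. The sketch below builds both lists from a single mapping so they stay aligned; "resize_instance_groups" is a hypothetical helper, not a boto function.

```python
def resize_instance_groups(conn, target_sizes):
    """Resize instance groups from a {instance_group_id: new_size} mapping.

    ``conn`` is any object with a modify_instance_groups method, such as
    an EmrConnection. Returns whatever that method returns, or None when
    there is nothing to change.
    """
    if not target_sizes:
        return None
    # Sort the IDs so the request is deterministic, then build the size
    # list in the same order so position i in both lists matches.
    group_ids = sorted(target_sizes)
    new_sizes = [int(target_sizes[group_id]) for group_id in group_ids]
    if any(size < 0 for size in new_sizes):
        raise ValueError("instance group sizes must not be negative")
    return conn.modify_instance_groups(group_ids, new_sizes)
```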

   run_jobflow(name, log_uri=None, ec2_keyname=None, availability_zone=None, master_instance_type='m1.small', slave_instance_type='m1.small', num_instances=1, action_on_failure='TERMINATE_JOB_FLOW', keep_alive=False, enable_debugging=False, hadoop_version=None, steps=[], bootstrap_actions=[], instance_groups=None, additional_info=None, ami_version=None, api_params=None, visible_to_all_users=None, job_flow_role=None)

      Runs a job flow

      Parameters:
         * **name** (*str*) -- Name of the job flow

         * **log_uri** (*str*) -- URI of the S3 bucket to place logs

         * **ec2_keyname** (*str*) -- EC2 key used for the instances

         * **availability_zone** (*str*) -- EC2 availability zone of
           the cluster

         * **master_instance_type** (*str*) -- EC2 instance type of
           the master

         * **slave_instance_type** (*str*) -- EC2 instance type of the
           slave nodes

         * **num_instances** (*int*) -- Number of instances in the
           Hadoop cluster

         * **action_on_failure** (*str*) -- Action to take if a step
           terminates

         * **keep_alive** (*bool*) -- Denotes whether the cluster
           should stay alive upon completion

         * **enable_debugging** (*bool*) -- Denotes whether AWS
           console debugging should be enabled.

         * **hadoop_version** (*str*) -- Version of Hadoop to use.
           This no longer defaults to '0.20' and now uses the AMI
           default.

         * **steps** (*list(boto.emr.Step)*) -- List of steps to add
           with the job

         * **bootstrap_actions** (*list(boto.emr.BootstrapAction)*) --
           List of bootstrap actions that run before Hadoop starts.

         * **instance_groups** (*list(boto.emr.InstanceGroup)*) --
           Optional list of instance groups to use when creating this
           job. NB: When provided, this argument supersedes
           num_instances and master/slave_instance_type.

         * **ami_version** (*str*) -- Amazon Machine Image (AMI)
           version to use for instances. Values accepted by EMR are
           '1.0', '2.0', and 'latest'; EMR currently defaults to '1.0'
           if you don't set 'ami_version'.

         * **additional_info** (*JSON str*) -- A JSON string for
           selecting additional features

         * **api_params** (*dict*) -- a dictionary of additional
           parameters to pass directly to the EMR API (so you don't
           have to upgrade boto to use new EMR features). You can also
           delete an API parameter by setting it to None.

         * **visible_to_all_users** (*bool*) -- Whether the job flow
           is visible to all IAM users of the AWS account associated
           with the job flow. If this value is set to "True", all IAM
           users of that AWS account can view and (if they have the
           proper policy permissions set) manage the job flow. If it
           is set to "False", only the IAM user that created the job
           flow can view and manage it.

         * **job_flow_role** (*str*) -- An IAM role for the job flow.
           The EC2 instances of the job flow assume this role. The
           default role is "EMRJobflowDefault". In order to use the
           default role, you must have already created it using the
           CLI.

      Return type:
         str

      Returns:
         The jobflow id
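A call to "run_jobflow" usually sets only a handful of these parameters. The sketch below wraps one such call; "launch_jobflow" is a hypothetical helper, and the instance count and types are illustrative values, not recommendations.

```python
def launch_jobflow(conn, name, steps, log_uri=None, keep_alive=False,
                   api_params=None):
    """Start a small job flow and return its job flow id.

    ``conn`` is any object with a run_jobflow method, such as an
    EmrConnection; ``steps`` is an iterable of step objects.
    """
    return conn.run_jobflow(
        name,
        log_uri=log_uri,
        # Always pass a fresh list rather than relying on the method's
        # default value for ``steps``.
        steps=list(steps),
        keep_alive=keep_alive,
        # Illustrative cluster shape: one master plus two slave nodes.
        num_instances=3,
        master_instance_type="m1.small",
        slave_instance_type="m1.small",
        # Extra parameters are passed straight through to the EMR API.
        api_params=api_params,
    )
```

When "instance_groups" is supplied instead, the "num_instances" and instance type arguments are superseded, as noted in the parameter list above.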

   set_termination_protection(jobflow_id, termination_protection_status)

      Set termination protection on specified Elastic MapReduce job
      flows

      Parameters:
         * **jobflow_id** (*list or str*) -- A list of job flow IDs

         * **termination_protection_status** (*bool*) -- Termination
           protection status

   set_visible_to_all_users(jobflow_id, visibility)

      Set whether specified Elastic MapReduce job flows are visible
      to all IAM users

      Parameters:
         * **jobflow_id** (*list or str*) -- A list of job flow IDs

         * **visibility** (*bool*) -- Visibility

   terminate_jobflow(jobflow_id)

      Terminate an Elastic MapReduce job flow

      Parameters:
         **jobflow_id** (*str*) -- A jobflow id

   terminate_jobflows(jobflow_ids)

      Terminate one or more Elastic MapReduce job flows

      Parameters:
         **jobflow_ids** (*list*) -- A list of job flow IDs


boto.emr.step
=============

class boto.emr.step.HiveBase(name, **kw)

   BaseArgs = ['s3n://us-east-1.elasticmapreduce/libs/hive/hive-script', '--base-path', 's3n://us-east-1.elasticmapreduce/libs/hive/']

class boto.emr.step.HiveStep(name, hive_file, hive_versions='latest', hive_args=None)

   Hive script step

class boto.emr.step.InstallHiveStep(hive_versions='latest', hive_site=None)

   Install Hive on EMR step

   InstallHiveName = 'Install Hive'

class boto.emr.step.InstallPigStep(pig_versions='latest')

   Install Pig on EMR step

   InstallPigName = 'Install Pig'

class boto.emr.step.JarStep(name, jar, main_class=None, action_on_failure='TERMINATE_JOB_FLOW', step_args=None)

   Custom jar step

   An Elastic MapReduce step that executes a jar

   Parameters:
      * **name** (*str*) -- The name of the step

      * **jar** (*str*) -- S3 URI to the Jar file

      * **main_class** (*str*) -- The class to execute in the jar

      * **action_on_failure** (*str*) -- An action, defined in the EMR
        docs, to take on failure.

      * **step_args** (*list(str)*) -- A list of arguments to pass to
        the step

   args()

   jar()

   main_class()
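As an illustration of the constructor above, the following sketch builds a "JarStep" with an explicit failure action. It assumes boto 2 is installed; "make_jar_step" is a hypothetical helper, and "CANCEL_AND_WAIT" is assumed to be one of the failure actions listed in the EMR docs.

```python
def make_jar_step(name, jar_uri, main_class=None, step_args=None):
    """Build a JarStep that cancels remaining steps if this one fails."""
    # Imported inside the function so the sketch can be read (and stubbed
    # in tests) on a machine where boto is not installed.
    from boto.emr.step import JarStep

    return JarStep(
        name=name,
        jar=jar_uri,
        main_class=main_class,
        # Assumed EMR failure action; the constructor default is
        # 'TERMINATE_JOB_FLOW'.
        action_on_failure="CANCEL_AND_WAIT",
        # Copy the arguments so later changes by the caller do not
        # affect the step.
        step_args=list(step_args or []),
    )
```

The resulting object can be passed in the "steps" list of "run_jobflow" or "add_jobflow_steps".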

class boto.emr.step.PigBase(name, **kw)

   BaseArgs = ['s3n://us-east-1.elasticmapreduce/libs/pig/pig-script', '--base-path', 's3n://us-east-1.elasticmapreduce/libs/pig/']

class boto.emr.step.PigStep(name, pig_file, pig_versions='latest', pig_args=[])

   Pig script step

class boto.emr.step.ScriptRunnerStep(name, **kw)

   ScriptRunnerJar = 's3n://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar'

class boto.emr.step.Step

   Jobflow Step base class

   args()

      Return type:
         list(str)

      Returns:
         List of arguments for the step

   jar()

      Return type:
         str

      Returns:
         URI to the jar

   main_class()

      Return type:
         str

      Returns:
         The main class name
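The three methods above form the interface a step is expected to provide. The sketch below shows a custom subclass that runs a shell script through the script-runner jar. "ShellScriptStep" is hypothetical, and two things are assumptions rather than documented behaviour: that the connection also reads "name" and "action_on_failure" attributes from a step, and that script-runner takes the script URI as its first argument.

```python
try:
    from boto.emr.step import Step
except ImportError:
    # Fallback so the sketch can be read and run without boto installed.
    Step = object

# Same URI as ScriptRunnerStep.ScriptRunnerJar in this module.
SCRIPT_RUNNER_JAR = (
    "s3n://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar"
)


class ShellScriptStep(Step):
    """Hypothetical step that runs a script stored in S3."""

    def __init__(self, name, script_uri, script_args=None,
                 action_on_failure="TERMINATE_JOB_FLOW"):
        # Assumption: these two attributes are read when the step is
        # submitted, as they are for JarStep.
        self.name = name
        self.action_on_failure = action_on_failure
        self._script_uri = script_uri
        self._script_args = list(script_args or [])

    def jar(self):
        return SCRIPT_RUNNER_JAR

    def main_class(self):
        # No explicit main class; the jar's default entry point is used.
        return None

    def args(self):
        # Assumption: script-runner expects the script URI first,
        # followed by the script's own arguments.
        return [self._script_uri] + self._script_args
```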

class boto.emr.step.StreamingStep(name, mapper, reducer=None, combiner=None, action_on_failure='TERMINATE_JOB_FLOW', cache_files=None, cache_archives=None, step_args=None, input=None, output=None, jar='/home/hadoop/contrib/streaming/hadoop-streaming.jar')

   Hadoop streaming step

   A Hadoop streaming Elastic MapReduce step

   Parameters:
      * **name** (*str*) -- The name of the step

      * **mapper** (*str*) -- The mapper URI

      * **reducer** (*str*) -- The reducer URI

      * **combiner** (*str*) -- The combiner URI. Only works for
        Hadoop 0.20 and later!

      * **action_on_failure** (*str*) -- An action, defined in the EMR
        docs, to take on failure.

      * **cache_files** (*list(str)*) -- A list of cache files to be
        bundled with the job

      * **cache_archives** (*list(str)*) -- A list of jar archives to
        be bundled with the job

      * **step_args** (*list(str)*) -- A list of arguments to pass to
        the step

      * **input** (*str or a list of str*) -- The input URI

      * **output** (*str*) -- The output URI

      * **jar** (*str*) -- The hadoop streaming jar. This can be
        either a local path on the master node, or an s3:// URI.

   args()

   jar()

   main_class()
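A streaming step mostly needs a mapper, a reducer, and input and output locations. The sketch below builds a word-count step; it assumes boto 2 is installed. "make_wordcount_step" is a hypothetical helper, and the sample mapper and input URIs are illustrative paths that may not exist in every region.

```python
def make_wordcount_step(output_uri):
    """Build a streaming word-count step that writes to output_uri."""
    # Imported inside the function so the sketch can be read (and stubbed
    # in tests) on a machine where boto is not installed.
    from boto.emr.step import StreamingStep

    return StreamingStep(
        name="Word count",
        # Illustrative sample locations; replace with your own scripts
        # and data.
        mapper="s3n://elasticmapreduce/samples/wordcount/wordSplitter.py",
        # 'aggregate' names Hadoop's built-in aggregating reducer.
        reducer="aggregate",
        input="s3n://elasticmapreduce/samples/wordcount/input",
        output=output_uri,
    )
```

The output location should be a path in a bucket you own that does not exist yet, since Hadoop normally refuses to write into an existing output directory.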


boto.emr.emrobject
==================

This module contains EMR response objects

class boto.emr.emrobject.AddInstanceGroupsResponse(connection=None)

   Fields = set(['InstanceGroupIds', 'JobFlowId'])

class boto.emr.emrobject.Arg(connection=None)

   endElement(name, value, connection)

class boto.emr.emrobject.BootstrapAction(connection=None)

   Fields = set(['Path', 'Args', 'Name'])

   startElement(name, attrs, connection)

class boto.emr.emrobject.EmrObject(connection=None)

   Fields = set([])

   endElement(name, value, connection)

   startElement(name, attrs, connection)

class boto.emr.emrobject.InstanceGroup(connection=None)

   Fields = set(['ReadyDateTime', 'InstanceType', 'InstanceRole', 'EndDateTime', 'InstanceRunningCount', 'State', 'BidPrice', 'Market', 'StartDateTime', 'Name', 'InstanceGroupId', 'CreationDateTime', 'InstanceRequestCount', 'LastStateChangeReason', 'LaunchGroup'])

class boto.emr.emrobject.JobFlow(connection=None)

   Fields = set(['TerminationProtected', 'MasterInstanceId', 'State', 'HadoopVersion', 'LogUri', 'AmiVersion', 'Ec2KeyName', 'ReadyDateTime', 'Type', 'JobFlowId', 'CreationDateTime', 'LastStateChangeReason', 'Name', 'EndDateTime', 'Value', 'InstanceCount', 'RequestId', 'StartDateTime', 'SlaveInstanceType', 'AvailabilityZone', 'MasterPublicDnsName', 'NormalizedInstanceHours', 'MasterInstanceType', 'VisibleToAllUsers', 'KeepJobFlowAliveWhenNoSteps', 'Id'])

   startElement(name, attrs, connection)

class boto.emr.emrobject.KeyValue(connection=None)

   Fields = set(['Value', 'Key'])

class boto.emr.emrobject.ModifyInstanceGroupsResponse(connection=None)

   Fields = set(['RequestId'])

class boto.emr.emrobject.RunJobFlowResponse(connection=None)

   Fields = set(['JobFlowId'])

class boto.emr.emrobject.Step(connection=None)

   Fields = set(['Name', 'EndDateTime', 'Jar', 'ActionOnFailure', 'State', 'MainClass', 'StartDateTime', 'CreationDateTime', 'LastStateChangeReason'])

   startElement(name, attrs, connection)
