This section describes how to manually calculate YARN and MapReduce memory allocation settings based on the node hardware specifications.

YARN takes into account all of the available resources on each machine in the cluster. Based on the available resources, YARN negotiates resource requests from applications (such as MapReduce) running in the cluster. YARN then provides processing capacity to each application by allocating Containers. A Container is the basic unit of processing capacity in YARN, and is an encapsulation of resource elements (memory, CPU, etc.).

In a Hadoop cluster, it is vital to balance the usage of memory (RAM), processors (CPU cores), and disks so that processing is not constrained by any one of these cluster resources. As a general recommendation, allowing for two Containers per disk and per core gives the best balance for cluster utilization.

When determining the appropriate YARN and MapReduce memory configurations for a cluster node, start with the available hardware resources. Specifically, note the following values on each node:

  • RAM (amount of memory)

  • CORES (number of CPU cores)

  • DISKS (number of disks)

The total available RAM for YARN and MapReduce should take into account the Reserved Memory. Reserved Memory is the RAM needed by system processes and other Hadoop processes, such as HBase.

Reserved Memory = Reserved for system memory + Reserved for HBase memory (if HBase is on the same node)

Use the following table to determine the Reserved Memory per node.

Reserved Memory Recommendations

Total Memory per Node | Recommended Reserved System Memory | Recommended Reserved HBase Memory
--- | --- | ---
4 GB | 1 GB | 1 GB
8 GB | 2 GB | 1 GB
16 GB | 2 GB | 2 GB
24 GB | 4 GB | 4 GB
48 GB | 6 GB | 8 GB
64 GB | 8 GB | 8 GB
72 GB | 8 GB | 8 GB
96 GB | 12 GB | 16 GB
128 GB | 24 GB | 24 GB
256 GB | 32 GB | 32 GB
512 GB | 64 GB | 64 GB
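As a sketch, the recommendation table above can be encoded as a simple lookup. The Python below is illustrative only (the function name, row layout, and the choice to round down to the nearest table row are assumptions, not part of any Hadoop tooling):

```python
# Reserved-memory lookup based on the recommendation table above.
# Rows: (total RAM per node in GB, reserved system GB, reserved HBase GB).
RESERVED_ROWS = [
    (4, 1, 1), (8, 2, 1), (16, 2, 2), (24, 4, 4), (48, 6, 8),
    (64, 8, 8), (72, 8, 8), (96, 12, 16), (128, 24, 24),
    (256, 32, 32), (512, 64, 64),
]

def reserved_memory_gb(total_ram_gb, hbase_on_node=False):
    """Reserved Memory = reserved system RAM + HBase RAM (if co-located)."""
    system_gb, hbase_gb = 1, 1  # fall back to the smallest row
    for ram, sys_gb, hb_gb in RESERVED_ROWS:
        if total_ram_gb >= ram:  # keep the largest row we meet or exceed
            system_gb, hbase_gb = sys_gb, hb_gb
    return system_gb + (hbase_gb if hbase_on_node else 0)

print(reserved_memory_gb(48, hbase_on_node=True))   # 6 + 8 = 14
print(reserved_memory_gb(48, hbase_on_node=False))  # 6
```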

The next calculation is to determine the maximum number of Containers allowed per node. The following formula can be used:

# of Containers = minimum of (2*CORES, 1.8*DISKS, (Total available RAM) / MIN_CONTAINER_SIZE)

Where MIN_CONTAINER_SIZE is the minimum Container size (in RAM). This value depends on the amount of RAM available -- in smaller-memory nodes, the minimum Container size should also be smaller. The following table outlines the recommended values:

Total RAM per Node | Recommended Minimum Container Size
--- | ---
Less than 4 GB | 256 MB
Between 4 GB and 8 GB | 512 MB
Between 8 GB and 24 GB | 1024 MB
Above 24 GB | 2048 MB
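Putting the formula and the minimum-Container-size table together, the Container count might be computed as follows. This is a hedged Python sketch: the function names are illustrative, and the table's "between" boundaries are interpreted here as inclusive upper bounds:

```python
def min_container_size_gb(total_ram_gb):
    """Recommended minimum Container size (in GB), from the table above."""
    if total_ram_gb < 4:
        return 0.25  # 256 MB
    if total_ram_gb <= 8:
        return 0.5   # 512 MB
    if total_ram_gb <= 24:
        return 1.0   # 1024 MB
    return 2.0       # 2048 MB

def num_containers(cores, disks, available_ram_gb, total_ram_gb):
    """# of Containers = min(2*CORES, 1.8*DISKS, available RAM / MIN_CONTAINER_SIZE)."""
    size_gb = min_container_size_gb(total_ram_gb)
    return int(min(2 * cores, 1.8 * disks, available_ram_gb / size_gb))

# 12 cores, 12 disks, 48 GB total, 42 GB available after reserving 6 GB:
print(num_containers(12, 12, 42, 48))  # min(24, 21.6, 21) = 21
```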

The final calculation is to determine the amount of RAM per Container:

RAM-per-Container = maximum of (MIN_CONTAINER_SIZE, (Total Available RAM) / Containers)
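This last formula is a one-liner; the sketch below (illustrative naming, not official tooling) makes the units explicit:

```python
def ram_per_container_gb(min_container_gb, available_ram_gb, containers):
    """RAM-per-Container = max(MIN_CONTAINER_SIZE, available RAM / # of Containers)."""
    return max(min_container_gb, available_ram_gb / containers)

# 42 GB available, 21 Containers, 2 GB minimum Container size:
print(ram_per_container_gb(2, 42, 21))  # max(2, 2.0) = 2.0
```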

With these calculations, the YARN and MapReduce configurations can be set:

Configuration File | Configuration Setting | Value Calculation
--- | --- | ---
yarn-site.xml | yarn.nodemanager.resource.memory-mb | = Containers * RAM-per-Container
yarn-site.xml | yarn.scheduler.minimum-allocation-mb | = RAM-per-Container
yarn-site.xml | yarn.scheduler.maximum-allocation-mb | = Containers * RAM-per-Container
mapred-site.xml | mapreduce.map.memory.mb | = RAM-per-Container
mapred-site.xml | mapreduce.reduce.memory.mb | = 2 * RAM-per-Container
mapred-site.xml | mapreduce.map.java.opts | = 0.8 * RAM-per-Container
mapred-site.xml | mapreduce.reduce.java.opts | = 0.8 * 2 * RAM-per-Container
yarn-site.xml | yarn.app.mapreduce.am.resource.mb | = 2 * RAM-per-Container
yarn-site.xml | yarn.app.mapreduce.am.command-opts | = 0.8 * 2 * RAM-per-Container
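The mapping in the table can also be expressed programmatically. The Python sketch below (the function name and dict layout are illustrative assumptions) returns each setting in MB:

```python
def yarn_mapreduce_settings(containers, ram_per_container_gb):
    """Derive the memory settings in the table above; all values in MB."""
    mb = int(ram_per_container_gb * 1024)
    return {
        "yarn.nodemanager.resource.memory-mb": containers * mb,
        "yarn.scheduler.minimum-allocation-mb": mb,
        "yarn.scheduler.maximum-allocation-mb": containers * mb,
        "mapreduce.map.memory.mb": mb,
        "mapreduce.reduce.memory.mb": 2 * mb,
        "mapreduce.map.java.opts": int(0.8 * mb),         # JVM heap size in MB
        "mapreduce.reduce.java.opts": int(0.8 * 2 * mb),  # JVM heap size in MB
        "yarn.app.mapreduce.am.resource.mb": 2 * mb,
        "yarn.app.mapreduce.am.command-opts": int(0.8 * 2 * mb),
    }

settings = yarn_mapreduce_settings(21, 2)
print(settings["yarn.nodemanager.resource.memory-mb"])  # 43008 (= 42*1024)
print(settings["mapreduce.map.java.opts"])              # 1638 (~0.8 * 2048)
```

Note that in the actual configuration files the `*.java.opts` and `*.command-opts` values are written as JVM flags (e.g. an `-Xmx` option sized from the number above), whereas the `*.mb` settings are plain integer megabyte values.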

Note: After installation, both yarn-site.xml and mapred-site.xml are located in the /etc/hadoop/conf folder.
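For instance, with a 2 GB (2048 MB) RAM-per-Container, the minimum-allocation setting in yarn-site.xml would be written as a standard Hadoop property entry. The snippet below is a sketch of the generic Hadoop configuration format, not an excerpt from the original article:

```xml
<configuration>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>2048</value>
  </property>
</configuration>
```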


Example

Cluster nodes have 12 CPU cores, 48 GB RAM, and 12 disks.

Reserved Memory = 6 GB reserved for system memory + (if HBase) 8 GB for HBase

Min Container size = 2 GB

If there is no HBase:

# of Containers = minimum of (2*12, 1.8*12, (48-6)/2) = minimum of (24, 21.6, 21) = 21

RAM-per-Container = maximum of (2, (48-6)/21) = maximum of (2, 2) = 2

Configuration | Value Calculation
--- | ---
yarn.nodemanager.resource.memory-mb | = 21 * 2 = 42*1024 MB
yarn.scheduler.minimum-allocation-mb | = 2*1024 MB
yarn.scheduler.maximum-allocation-mb | = 21 * 2 = 42*1024 MB
mapreduce.map.memory.mb | = 2*1024 MB
mapreduce.reduce.memory.mb | = 2 * 2 = 4*1024 MB
mapreduce.map.java.opts | = 0.8 * 2 = 1.6*1024 MB
mapreduce.reduce.java.opts | = 0.8 * 2 * 2 = 3.2*1024 MB
yarn.app.mapreduce.am.resource.mb | = 2 * 2 = 4*1024 MB
yarn.app.mapreduce.am.command-opts | = 0.8 * 2 * 2 = 3.2*1024 MB

If HBase is included:

# of Containers = minimum of (2*12, 1.8*12, (48-6-8)/2) = minimum of (24, 21.6, 17) = 17

RAM-per-Container = maximum of (2, (48-6-8)/17) = maximum of (2, 2) = 2


Configuration | Value Calculation
--- | ---
yarn.nodemanager.resource.memory-mb | = 17 * 2 = 34*1024 MB
yarn.scheduler.minimum-allocation-mb | = 2*1024 MB
yarn.scheduler.maximum-allocation-mb | = 17 * 2 = 34*1024 MB
mapreduce.map.memory.mb | = 2*1024 MB
mapreduce.reduce.memory.mb | = 2 * 2 = 4*1024 MB
mapreduce.map.java.opts | = 0.8 * 2 = 1.6*1024 MB
mapreduce.reduce.java.opts | = 0.8 * 2 * 2 = 3.2*1024 MB
yarn.app.mapreduce.am.resource.mb | = 2 * 2 = 4*1024 MB
yarn.app.mapreduce.am.command-opts | = 0.8 * 2 * 2 = 3.2*1024 MB

Original article:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.9.1/bk_installing_manually_book/content/rpm-chap1-11.html
