hadoop - How large an input folder can a MapReduce program be given when the block size is 512 MB?

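I am working with 4 GB of RAM. I set the block size to 512 MB in hdfs-site.xml, and I am using the CombineFileSplit input format with a maximum split size of 536870912 bytes (512 MB). How large an input folder can I give the MapReduce program so that it runs smoothly without out-of-memory exceptions?

Can anyone offer advice on this?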
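
For reference, the setup described in the question would correspond roughly to the settings below. This is only a sketch: the property names are the standard Hadoop 2 ones, and it is an assumption that the asker set the split size through configuration rather than in job code.

```xml
<!-- hdfs-site.xml: HDFS block size of 512 MB, in bytes (assumed property name: dfs.blocksize) -->
<property>
  <name>dfs.blocksize</name>
  <value>536870912</value>
</property>

<!-- Job configuration or mapred-site.xml: upper bound on the size of a combined split, in bytes -->
<property>
  <name>mapreduce.input.fileinputformat.split.maxsize</name>
  <value>536870912</value>
</property>
```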

Best answer

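The number of containers depends on the number of blocks (input splits). If you have 2 GB of data with a 512 MB block size, YARN creates 4 map tasks (2 GB / 512 MB = 4) and 1 reduce task. There are some rules we should follow when submitting a MapReduce job. (This applies to small clusters.)

You should configure the following properties according to your RAM, disk, and cores.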

<property>
  <description>The minimum allocation for every container request at the RM,
  in MBs. Memory requests lower than this won't take effect,
  and the specified value will get allocated at minimum.</description>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>512</value>
</property>

<property>
  <description>The maximum allocation for every container request at the RM,
  in MBs. Memory requests higher than this won't take effect,
  and will get capped to this value.</description>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value>
</property>

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
</property>

Then set the Java heap size according to these memory resources.
Once the properties above are set in yarn-site.xml, the MapReduce job should complete successfully.
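
As a sketch of what setting the Java heap size according to the memory resources could look like, the fragment below for mapred-site.xml sizes the task containers within the 512 to 2048 MB YARN limits and sets each JVM heap (-Xmx) to roughly 80% of its container. The specific values are assumptions for a single 4 GB machine and are not part of the original answer; adjust them to your own hardware.

```xml
<!-- mapred-site.xml: illustrative values only, assumed for a 4 GB node -->

<!-- Container size requested for each map task, in MB -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>
<!-- JVM heap for map tasks, kept below the container size to leave room for non-heap memory -->
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx819m</value>
</property>

<!-- Container size requested for each reduce task, in MB -->
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>1024</value>
</property>
<!-- JVM heap for reduce tasks -->
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx819m</value>
</property>

<!-- The MapReduce ApplicationMaster also runs in a container; its default request
     of 1536 MB would consume most of a 2048 MB node, so it is lowered here -->
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>512</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx410m</value>
</property>
```

With these sizes, memory use is bounded per container, so the total input folder size mainly affects how many map tasks are queued, not how much memory each one needs.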

Regarding hadoop - How large an input folder can a MapReduce program be given when the block size is 512 MB?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/30364977/