Hadoop chunk size vs split size vs block size


Problem Description





I am a little bit confused about Hadoop concepts.

What is the difference between Hadoop chunk size, split size and block size?

Thanks in advance.

Solution

Block size and chunk size are the same thing. Split size may differ from the block/chunk size.
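To make this concrete, here is a minimal plain-Java sketch of how the split size is typically derived by Hadoop's FileInputFormat: the HDFS block size clamped between a configurable minimum and maximum. The property names in the comments (dfs.blocksize, mapreduce.input.fileinputformat.split.minsize, mapreduce.input.fileinputformat.split.maxsize) are the usual Hadoop 2.x configuration keys; the code itself is a simplified illustration, not Hadoop's source.

    // Simplified sketch of the split-size calculation used by FileInputFormat:
    // the block size clamped by the configured minimum and maximum split sizes.
    public class SplitSizeSketch {

        static long computeSplitSize(long blockSize, long minSize, long maxSize) {
            return Math.max(minSize, Math.min(maxSize, blockSize));
        }

        public static void main(String[] args) {
            long blockSize = 128L * 1024 * 1024; // dfs.blocksize (128 MB)
            long minSize   = 1L;                 // mapreduce.input.fileinputformat.split.minsize (typical effective default)
            long maxSize   = Long.MAX_VALUE;     // mapreduce.input.fileinputformat.split.maxsize (effectively unbounded by default)

            // With the defaults, the split size equals the block size.
            System.out.println(computeSplitSize(blockSize, minSize, maxSize)); // 134217728

            // Raising the minimum split size above the block size forces larger
            // splits, so a single mapper reads data from more than one block.
            System.out.println(computeSplitSize(blockSize, 256L * 1024 * 1024, maxSize)); // 268435456
        }
    }

With the default settings each split lines up with one block, which is why block size and split size so often come out the same in practice.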

The MapReduce algorithm does not work on the physical blocks of a file; it works on logical input splits. An input split depends on where records were written: a record may start in one block and end in the next, so on block boundaries alone it would be torn across two mappers.

The way HDFS is set up, it breaks very large files into large blocks (for example, 128 MB each) and stores three copies of these blocks on different nodes in the cluster. HDFS has no awareness of the content of these files.
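As a quick arithmetic sketch of what that storage layout means (plain Java, no HDFS API involved), a 1 GB file stored with 128 MB blocks and a replication factor of 3 is cut into 8 blocks, and the cluster holds 24 physical block copies in total:

    // Back-of-the-envelope arithmetic for HDFS block storage.
    // 128 MB blocks and replication factor 3 are the common defaults.
    public class BlockCountSketch {
        public static void main(String[] args) {
            long fileSize    = 1024L * 1024 * 1024; // 1 GB file
            long blockSize   = 128L * 1024 * 1024;  // block size (dfs.blocksize)
            int  replication = 3;                   // replicas per block (dfs.replication)

            long blocks = (fileSize + blockSize - 1) / blockSize; // ceiling division
            System.out.println("blocks            = " + blocks);               // 8
            System.out.println("replicated copies = " + blocks * replication); // 24
        }
    }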

To solve this problem, Hadoop uses a logical representation of the data stored in file blocks, known as input splits. When a MapReduce job client calculates the input splits, it figures out where the first whole record in a block begins and where the last record in the block ends.

In cases where the last record in a block is incomplete, the input split includes location information for the next block and the byte offset of the data needed to complete the record.
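The following self-contained sketch illustrates that policy in miniature. It is not Hadoop's LineRecordReader, only the same idea: a reader assigned a byte range skips the leading partial record unless its range starts at offset 0 (that record belongs to the previous split), and it reads past the end of its range to finish the last record that crosses the boundary, so every record is consumed by exactly one mapper.

    import java.util.ArrayList;
    import java.util.List;

    // Simplified illustration of reconciling split boundaries with record
    // (newline-delimited) boundaries. Not Hadoop code; the same idea as
    // Hadoop's line-oriented record readers.
    public class SplitRecordSketch {

        // Read the records that belong to the byte range [start, start + length).
        static List<String> readSplit(byte[] data, int start, int length) {
            List<String> records = new ArrayList<>();
            int pos = start;

            // Unless this split starts at the beginning of the file, skip the
            // partial record: it belongs to the previous split, whose reader
            // reads past its own end to complete it.
            if (start != 0) {
                while (pos < data.length && data[pos] != '\n') pos++;
                pos++; // step over the newline
            }

            // Read whole records. A record that begins before the split's end
            // is finished even if its bytes extend into the next block.
            while (pos < data.length && pos < start + length) {
                int recordStart = pos;
                while (pos < data.length && data[pos] != '\n') pos++;
                records.add(new String(data, recordStart, pos - recordStart));
                pos++; // step over the newline
            }
            return records;
        }

        public static void main(String[] args) {
            byte[] file = "rec1\nrecord-two\nrec3\n".getBytes();
            // Pretend the file is stored as two roughly 10-byte "blocks"/splits.
            System.out.println(readSplit(file, 0, 10));  // [rec1, record-two]
            System.out.println(readSplit(file, 10, 11)); // [rec3]
        }
    }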

Have a look at this article for more details.

Related SE questions:

About Hadoop/HDFS file splitting

Split size vs Block size in Hadoop
