Problem description
From the various blogs I have read, I understand that HDFS is another layer that sits on top of the local filesystem of a computer.
I have also installed Hadoop, but I have trouble understanding how the HDFS layer exists over the local filesystem.
Here is my question:
Suppose I install Hadoop in pseudo-distributed mode. What happens under the hood during this installation? I added a tmp.dir parameter in the configuration files. Is it the single folder that the namenode daemon talks to when it attempts to access the datanode?
OK, let me give it a try. When you configure Hadoop, it lays down a virtual FS on top of your local FS; that virtual FS is HDFS. HDFS stores data as blocks (similar to the local FS, but much bigger than local FS blocks) in a replicated fashion. The HDFS directory tree, or filesystem namespace, looks much like that of a local FS, but it is a separate namespace. When you write data into HDFS, it eventually gets written onto the local FS, but you can't see it there directly as ordinary files.
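To make the "blocks in a replicated fashion" point concrete, here is a rough sketch of the arithmetic. It assumes a 64 MB block size and a replication factor of 3 (the usual defaults in older Hadoop releases; a pseudo-distributed setup is typically configured with replication 1). The function names are made up for illustration.

```python
import math

MB = 1024 * 1024

def block_count(file_size_bytes, block_size_bytes=64 * MB):
    """Number of HDFS blocks a file of the given size is split into."""
    return math.ceil(file_size_bytes / block_size_bytes)

def raw_storage_bytes(file_size_bytes, replication=3):
    """Bytes that end up on the datanodes' local filesystems.

    The last block only occupies its actual size, so the total is
    simply the file size multiplied by the replication factor.
    """
    return file_size_bytes * replication

if __name__ == "__main__":
    size = 200 * MB  # hypothetical file size
    print("blocks:", block_count(size))
    print("raw MB on local disks:", raw_storage_bytes(size) // MB)
```

So a 200 MB file does not appear on the local FS as one 200 MB file; it shows up as a handful of block files spread over the datanode storage directories.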
The temp directory actually serves 3 purposes:
1- Directory where the namenode stores its metadata. The default value is ${hadoop.tmp.dir}/dfs/name, and it can be specified explicitly with dfs.name.dir. If you specify dfs.name.dir, the namenode metadata will be stored in the directory given as the value of this property.
2- Directory where HDFS data blocks are stored. The default value is ${hadoop.tmp.dir}/dfs/data, and it can be specified explicitly with dfs.data.dir. If you specify dfs.data.dir, the HDFS data will be stored in the directory given as the value of this property.
3- Directory where the secondary namenode stores its checkpoints. The default value is ${hadoop.tmp.dir}/dfs/namesecondary, and it can be specified explicitly with fs.checkpoint.dir.
So it's always better to use proper dedicated locations as the values of these properties for a cleaner setup.
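The fallback rule described in the list above can be sketched in a few lines. This is only an illustration of the rule, not Hadoop's own code: the function name and the example paths are hypothetical, and the property names are the older ones used in this answer.

```python
import posixpath

# Default sub-directories under hadoop.tmp.dir, as listed above.
DEFAULT_SUBDIRS = {
    "dfs.name.dir": "dfs/name",
    "dfs.data.dir": "dfs/data",
    "fs.checkpoint.dir": "dfs/namesecondary",
}

def resolve_storage_dirs(conf):
    """Return the effective local directory for each storage property.

    An explicitly configured property wins; otherwise the directory
    falls back to its default location under hadoop.tmp.dir.
    """
    tmp_dir = conf["hadoop.tmp.dir"]
    resolved = {}
    for prop, subdir in DEFAULT_SUBDIRS.items():
        resolved[prop] = conf.get(prop) or posixpath.join(tmp_dir, subdir)
    return resolved

if __name__ == "__main__":
    # Hypothetical configuration: only the data directory is set explicitly.
    conf = {
        "hadoop.tmp.dir": "/tmp/hadoop-demo",
        "dfs.data.dir": "/data/hdfs/blocks",
    }
    for prop, path in resolve_storage_dirs(conf).items():
        print(f"{prop} -> {path}")
```

With only hadoop.tmp.dir set, all three directories land under the temp directory, which is why that one folder looks like "the" HDFS folder in a fresh pseudo-distributed install.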
When access to a particular block of data is required, the namenode looks it up in its metadata (the metadata kept under the dfs.name.dir directory) and returns to the client the location of that block on a particular datanode. The block itself lives somewhere under the dfs.data.dir directory on that datanode's local FS. The client then reads the data directly from the datanode (the same holds for writes).
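A toy model may help picture that read path. Everything here (class names, block ids, the file path) is invented for illustration and has nothing to do with the real Hadoop classes; the only point is that the namenode answers "where are the blocks?" while the bytes come straight from the datanode.

```python
from dataclasses import dataclass, field

@dataclass
class ToyDataNode:
    name: str
    # block id -> bytes; stands in for block files under dfs.data.dir
    blocks: dict = field(default_factory=dict)

@dataclass
class ToyNameNode:
    # file path -> ordered list of (block id, [names of datanodes holding it])
    namespace: dict = field(default_factory=dict)

    def get_block_locations(self, path):
        """Metadata lookup only: no file data is returned here."""
        return self.namespace[path]

def read_file(namenode, datanodes, path):
    """Ask the namenode for locations, then read each block from a datanode."""
    data = b""
    for block_id, holders in namenode.get_block_locations(path):
        node = datanodes[holders[0]]  # pick the first replica
        data += node.blocks[block_id]
    return data

if __name__ == "__main__":
    dn1 = ToyDataNode("dn1", {"blk_1": b"hello ", "blk_2": b"world"})
    nn = ToyNameNode({"/user/demo/a.txt": [("blk_1", ["dn1"]), ("blk_2", ["dn1"])]})
    print(read_file(nn, {"dn1": dn1}, "/user/demo/a.txt"))
```

In pseudo-distributed mode the "namenode" and the single "datanode" are just two daemons on the same machine, each working with its own directory on the same local FS.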
One important point to note here is that HDFS is not a physical FS. It is a virtual abstraction on top of your local FS, so it can't be browsed the way you browse the local FS. You need to use the HDFS shell, the HDFS web UI, or the available APIs to do that.
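As a small example of going through the HDFS shell rather than the local FS, the sketch below wraps the `hadoop fs -ls` command. It assumes the `hadoop` launcher is on your PATH; the helper names are hypothetical, and if Hadoop is not installed the function just returns None.

```python
import shutil
import subprocess

def hdfs_ls_command(path="/"):
    """Build the HDFS shell command that lists a directory in HDFS."""
    return ["hadoop", "fs", "-ls", path]

def list_hdfs(path="/"):
    """Run the listing and return its output, or None if hadoop is missing."""
    if shutil.which("hadoop") is None:
        return None
    result = subprocess.run(
        hdfs_ls_command(path), capture_output=True, text=True, check=False
    )
    return result.stdout

if __name__ == "__main__":
    listing = list_hdfs("/")
    print(listing if listing is not None else "hadoop not found on PATH")
```

Running a plain `ls` on the dfs.data.dir directory, by contrast, only shows block files, not the HDFS directory tree.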
HTH