问题描述
这是一种天真的问题,但我是NoSQL范式的新手,对其不太了解。所以如果有人能够帮助我清楚地理解HBase和Hadoop之间的差异,或者给出一些可能帮助我理解差异的指针。
$ b $ p到目前为止,我做了一些研究和ACC。根据我的理解,Hadoop提供的框架可以在HDFS中使用原始数据块(文件),而HBase则是Hadoop之上的数据库引擎,它基本上可以与结构化数据而不是原始数据块一起工作。与SQL一样,Hbase提供了一个HDFS逻辑层。是否正确?
请随时纠正我。
谢谢。
$ b $ Hadoop基本上是3件事,一个FS(Hadoop分布式文件系统),一个计算框架(MapReduce)和一个管理桥(另一个资源谈判器)。 HDFS允许您在分布式(提供更快的读/写访问)和冗余(提供更好的可用性)方式中存储大量数据。 MapReduce允许您以分布式和并行的方式处理这些庞大的数据。但是MapReduce并不仅限于HDFS。作为FS,HDFS缺乏随机读写能力。顺序数据访问很有用。这就是HBase进入图片的地方。它是一个NoSQL数据库,可以在Hadoop集群上运行,并为您提供对数据的随机实时读/写访问。您可以存储结构化数据和非结构化数据在Hadoop和HBase中也是如此。它们都为您提供多种访问数据的机制,例如shell和其他API。
而且,HBase以列状形式将数据存储为键/值对,而HDFS将数据存储为平面文件。这两个系统的一些显着特性是:
Hadoop
- 针对大型文件的流式访问进行了优化。
- 遵循一次写入式多读意识形态。
- 不支持随机阅读/写入。
HBase
- 以列方式存储键/值对(列作为列族聚合在一起)。
- 提供对大量数据中的少量数据的低延迟访问。
- 提供灵活的数据模型。$ b $ b
- Optimized for streaming access of large files.
- Follows write-once read-many ideology.
- Doesn't support random read/write.
- Stores key/value pairs in columnar fashion (columns are clubbed together as column families).
- Provides low latency access to small amounts of data from within a large data set.
- Provides flexible data model.
Hadoop最适合脱机批处理有点东西,而HBase用于实时需求。
MySQL和Ext4之间的比较类似。
This is kind of naive question but I am new to NoSQL paradigm and don't know much about it. So if somebody can help me clearly understand difference between the HBase and Hadoop or if give some pointers which might help me understand the difference.
Till now, I did some research and acc. to my understanding Hadoop provides framework to work with raw chunk of data(files) in HDFS and HBase is database engine above Hadoop, which basically works with structured data instead of raw data chunk. Hbase provides a logical layer over HDFS just as SQL does. Is it correct?
Pls feel free to correct me.
Thanks.
Hadoop is basically 3 things, a FS (Hadoop Distributed File System), a computation framework (MapReduce) and a management bridge (Yet Another Resource Negotiator). HDFS allows you store huge amounts of data in a distributed (provides faster read/write access) and redundant (provides better availability) manner. And MapReduce allows you to process this huge data in a distributed and parallel manner. But MapReduce is not limited to just HDFS. Being a FS, HDFS lacks the random read/write capability. It is good for sequential data access. And this is where HBase comes into picture. It is a NoSQL database that runs on top your Hadoop cluster and provides you random real-time read/write access to your data.
You can store both structured and unstructured data in Hadoop, and HBase as well. Both of them provide you multiple mechanisms to access the data, like the shell and other APIs.And, HBase stores data as key/value pairs in a columnar fashion while HDFS stores data as flat files. Some of the salient features of both the systems are :
Hadoop
HBase
Hadoop is most suited for offline batch-processing kinda stuff while HBase is used when you have real-time needs.
An analogous comparison would be between MySQL and Ext4.
这篇关于HBase与Hadoop / HDFS的区别的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!