Problem description
I'm currently designing an architecture for a web-based application that should also provide some kind of image storage. Users will be able to upload photos as one of the key features of the service. Viewing these images (via the web) will also be one of the primary uses.
However, I'm not sure how to build such a scalable image storage component into my application. I have already thought about different solutions, but since I lack experience, I look forward to hearing your suggestions. Aside from the images, metadata must also be saved.
Here are my initial thoughts:
Use a (distributed) filesystem like HDFS and set up dedicated webservers as "filesystem clients" that save uploaded images and serve requests. Image metadata is saved in an additional database, including the file path of each image.
Use a BigTable-oriented system like HBase on top of HDFS and save images and metadata together. Again, webservers bridge image uploads and requests.
Use a completely schemaless database like CouchDB for storing both images and metadata. Additionally, use the database itself for upload and delivery through its HTTP-based RESTful API. (Additional question: CouchDB saves blobs as Base64. Can it nevertheless return the data as image/jpeg, etc.?)
We have been using CouchDB for that, saving images as an "Attachment". But after a year, the multi-dozen GB CouchDB database files turned out to be a headache. For example, CouchDB replication still has issues if you use it with very large documents.
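On the additional question about image/jpeg: CouchDB records a content type for each attachment and sends it back as the Content-Type header when the attachment is fetched over HTTP. The sketch below only builds (and does not send) a PUT request against CouchDB's standalone attachment endpoint, `/{db}/{docid}/{attname}`. The server address, database name, document id and the helper name `build_attachment_put` are assumptions made for illustration; they do not come from our setup.

```python
import urllib.request
from urllib.parse import quote


def build_attachment_put(base_url, db, doc_id, filename, data, content_type, rev=None):
    """Build a PUT request that uploads `data` as an attachment of a document.

    Hypothetical helper: it only constructs the request, it does not send it.
    """
    # CouchDB attachment URL layout: /{db}/{docid}/{attname}
    url = "/".join([
        base_url.rstrip("/"),
        quote(db, safe=""),
        quote(doc_id, safe=""),
        quote(filename, safe=""),
    ])
    # When the document already exists, CouchDB needs its current revision.
    if rev is not None:
        url += "?rev=" + quote(rev, safe="")
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        # This is the type CouchDB stores and returns on a later GET.
        headers={"Content-Type": content_type},
    )


# Assumed values, for illustration only.
req = build_attachment_put(
    "http://localhost:5984", "images", "photo-0001",
    "original.jpg", b"\xff\xd8\xff", "image/jpeg",
)
# urllib.request.urlopen(req) would send it to a running CouchDB instance.
```

A plain GET on the same URL then returns the raw bytes with that content type, so a browser can load the attachment directly as an image.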
So we just rewrote our software to use CouchDB for image information and Amazon S3 for the actual image storage. The code is available at http://github.com/hudora/huImages
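The split between a metadata store and a blob store can be sketched roughly as follows. This is not the huImages code; it is a minimal illustration with in-memory stand-ins, and every class and function name in it is hypothetical. In a real deployment the blob store would wrap an S3 (or S3-compatible) client and the metadata store would wrap CouchDB.

```python
import hashlib


class InMemoryBlobStore:
    """Stand-in for S3: maps an object key to raw bytes plus a content type."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, content_type):
        self._objects[key] = (data, content_type)

    def get(self, key):
        return self._objects[key]


class InMemoryMetadataStore:
    """Stand-in for CouchDB: maps a document id to a small JSON-like dict."""

    def __init__(self):
        self._docs = {}

    def save(self, doc_id, doc):
        self._docs[doc_id] = dict(doc)

    def load(self, doc_id):
        return self._docs[doc_id]


def store_image(blobs, metadata, data, content_type, **extra):
    """Store the bytes in the blob store and only a small document in the database."""
    # Assumption: the content hash serves as both the object key and the
    # document id, so identical uploads map to the same object.
    key = hashlib.sha1(data).hexdigest()
    blobs.put(key, data, content_type)
    metadata.save(key, {"content_type": content_type, "size": len(data), **extra})
    return key


blobs = InMemoryBlobStore()
metadata = InMemoryMetadataStore()
image_id = store_image(blobs, metadata, b"\xff\xd8\xff", "image/jpeg", title="demo")
```

Because the database documents stay small, the database files stay small as well, and the large objects live in a system built to hold them.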
You might want to set up an Amazon S3 compatible storage service on-site for your project. This keeps you flexible and leaves the Amazon option open without requiring external services for now. Walrus seems to be becoming the most popular and scalable S3 clone.
I also urge you to look into the design of Livejournal, with their excellent open source MogileFS and Perlbal offerings. This combination is probably the most famous image serving setup.
The Flickr architecture can also be an inspiration, although they don't offer open source software to the public the way Livejournal does.