Is it better to have many small Azure storage blob containers (each with some blobs) or one really large container with tons of blobs?

Problem description

So the scenario is as follows:

I have multiple instances of a web service, each of which writes blobs of data to Azure Storage. I need to be able to group blobs into a container (or a virtual directory) depending on when each one was received. Once in a while (every day at the worst), older blobs will get processed and then deleted.

I have two options:

Option 1

I make one container called "blobs" (for example) and then store all the blobs in that container. Each blob uses a directory-style name, with the directory part being the time it was received (e.g. "hr0min0/data.bin", "hr0min0/data2.bin", "hr0min30/data3.bin", "hr1min45/data.bin", ... , "hr23min0/dataN.bin", etc., with a new directory every X minutes). The thing that processes these blobs will process the hr0min0 blobs first, then hr0minX, and so on (and blobs are still being written while processing is under way).
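A minimal sketch of how the option 1 names could be produced and ordered. The function names, the `bucket_minutes` parameter, and the default of 15 minutes are assumptions for illustration, not part of the question. One point worth noticing: plain string sorting would put "hr10min0" before "hr2min0", so the processor needs a numeric sort key if it relies on this naming scheme.

```python
from datetime import datetime
from typing import Tuple


def blob_name(received: datetime, filename: str, bucket_minutes: int = 15) -> str:
    """Build an option-1 style blob name such as 'hr0min30/data3.bin'.

    bucket_minutes is the hypothetical "X minutes" from the question.
    """
    if not 0 < bucket_minutes <= 60:
        raise ValueError("bucket_minutes must be between 1 and 60")
    # Round the minute down to the start of its bucket.
    bucket_start = (received.minute // bucket_minutes) * bucket_minutes
    return f"hr{received.hour}min{bucket_start}/{filename}"


def bucket_sort_key(prefix: str) -> Tuple[int, int]:
    """Turn 'hr1min45' into (1, 45) so prefixes sort chronologically."""
    hour, minute = prefix[2:].split("min")
    return int(hour), int(minute)
```

For example, `sorted(prefixes, key=bucket_sort_key)` gives the order in which the processor would walk the virtual directories.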

Option 2

I have many containers, each with a name based on the arrival time (so the first will be a container called blobs_hr0min0, then blobs_hr0minX, etc.), and all the blobs in a container are the blobs that arrived at the named time. The thing that processes these blobs will process one container at a time.
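A small sketch of generating the option 2 container names. One caveat to be aware of: Azure container names may contain only lowercase letters, digits, and hyphens (3 to 63 characters, no leading, trailing, or consecutive hyphens), so a name with an underscore such as blobs_hr0min0 would be rejected. The sketch therefore uses a hyphen instead. The function names and the `prefix` and `bucket_minutes` parameters are hypothetical.

```python
import re
from datetime import datetime

# Lowercase letters and digits, with single hyphens only between them.
_CONTAINER_RE = re.compile(r"^[a-z0-9](-?[a-z0-9])*$")


def is_valid_container_name(name: str) -> bool:
    """Check a name against the Azure container naming rules."""
    return 3 <= len(name) <= 63 and _CONTAINER_RE.match(name) is not None


def container_name(received: datetime, bucket_minutes: int = 15,
                   prefix: str = "blobs") -> str:
    """Build an option-2 style container name such as 'blobs-hr0min30'."""
    if not 0 < bucket_minutes <= 60:
        raise ValueError("bucket_minutes must be between 1 and 60")
    # Round the minute down to the start of its bucket.
    bucket_start = (received.minute // bucket_minutes) * bucket_minutes
    name = f"{prefix}-hr{received.hour}min{bucket_start}"
    if not is_valid_container_name(name):
        raise ValueError(f"invalid container name: {name!r}")
    return name
```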

So my question is, which option is better? Does option 2 give me better parallelization (since containers can be on different servers), or is option 1 better because many containers can cause other unknown issues?

Recommended answer

I don't think it really matters (from a scalability/parallelization perspective), because partitioning in Windows Azure Blob storage is done at the blob level, not the container level. Reasons to spread blobs out across different containers have more to do with access control (e.g. SAS) or total storage size.

See here for more details (scroll down to "Partitions"): http://blogs.msdn.com/b/windowsazurestorage/archive/2010/05/10/windows-azure-storage-abstractions-and-their-scalability-targets.aspx

Quote:

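Blobs: Since the partition key is down to the blob name, we can load balance access to different blobs across as many servers as needed in order to scale out access to them. This allows the containers to grow as large as you need them to (within the storage account space limit). The tradeoff is that we don't provide the ability to do atomic transactions across multiple blobs.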
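The point of the quote can be illustrated with a toy model. This is only a sketch of the reasoning, assuming the partition key can be simplified to a (container, blob name) pair; the real load balancing is internal to the storage service, and the helper names here are hypothetical. Under that assumption, both layouts give every blob its own key, so neither offers more partitions to spread across servers.

```python
def option1_key(bucket: str, filename: str) -> tuple:
    """One container; the time bucket lives in the blob name."""
    return ("blobs", f"{bucket}/{filename}")


def option2_key(bucket: str, filename: str) -> tuple:
    """One container per time bucket."""
    return (f"blobs-{bucket}", filename)


# Sample arrivals taken from the names used in the question.
incoming = [
    ("hr0min0", "data.bin"),
    ("hr0min0", "data2.bin"),
    ("hr0min30", "data3.bin"),
    ("hr1min45", "data.bin"),
]

keys_option1 = {option1_key(b, f) for b, f in incoming}
keys_option2 = {option2_key(b, f) for b, f in incoming}

# Each layout yields one distinct key per blob.
print(len(keys_option1), len(keys_option2))  # 4 4
```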


10-09 21:12