Problem description
I have the following Azure Storage Table.
PositionData table:
PartitionKey: ClientID + VehicleID
RowKey: GUID
Properties: ClientID, VehicleID, DriverID, Date, GPSPosition
Each vehicle will log up to 1,000,000 entities per year per client, and each client could have thousands of vehicles. So I decided to partition by ClientID + VehicleID to keep the partitions small and manageable. Queries by ClientID and VehicleID perform quickly because the search is narrowed down to a single partition.
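As a rough illustration of the key scheme described above, the following sketch composes the PartitionKey and the OData filter that pins a query to one partition. The function names and the "_" separator are assumptions made for this example; they are not part of the original design.

```python
def partition_key(client_id: str, vehicle_id: str) -> str:
    """Compose the PartitionKey as ClientID + VehicleID.

    The "_" separator is an assumption; any character allowed in a
    PartitionKey that cannot occur in the IDs themselves would do.
    """
    return f"{client_id}_{vehicle_id}"


def single_partition_filter(client_id: str, vehicle_id: str) -> str:
    """Build an OData filter that restricts a query to one partition."""
    # Single quotes inside an OData string literal are escaped by doubling them.
    key = partition_key(client_id, vehicle_id).replace("'", "''")
    return f"PartitionKey eq '{key}'"


print(single_partition_filter("client42", "vehicle7"))
```

A filter of this shape could be passed to whichever table client is in use; because it fixes the PartitionKey with an equality test, the service only has to look inside one partition.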
Problem:
The problem is that sometimes I need to query on only ClientID and DriverID. Because it's not possible to perform partial PartitionKey comparisons, every single partition will need to be scanned, which will kill performance.
I can't use a PartitionKey made up of ClientID, VehicleID and DriverID together, because queries will only ever filter on VehicleID or DriverID, never both.
Solution 1:
I considered storing a value elsewhere that represents a VehicleID and DriverID pair, and then using ClientID + VehicleDriverPairID as the PartitionKey. However, that would result in hundreds of thousands of partitions, and my code would have to do a lot of unioning of data across partitions.
Solution 2:
Have one partition for ClientID + VehicleID and another partition for ClientID + DriverID. This means that updating the table is twice as much work (two writes per record), but both queries will be fast. There will also be redundant data.
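A minimal sketch of what Solution 2 could look like when a record is written: the same position data is turned into two entities, one per query path. The `build_copies` helper, the "_V_" / "_D_" markers and the choice to share one RowKey between the copies are all assumptions made for illustration.

```python
import uuid
from datetime import datetime, timezone


def build_copies(client_id, vehicle_id, driver_id, date, gps_position):
    """Return the two copies of one position record, one per query path."""
    # Assumption: both copies share a RowKey so they can be matched up later.
    row_key = str(uuid.uuid4())
    properties = {
        "ClientID": client_id,
        "VehicleID": vehicle_id,
        "DriverID": driver_id,
        "Date": date,
        "GPSPosition": gps_position,
    }
    # Assumption: the "V"/"D" markers keep vehicle and driver partitions apart
    # in case a vehicle ID and a driver ID happen to have the same value.
    by_vehicle = {"PartitionKey": f"{client_id}_V_{vehicle_id}", "RowKey": row_key, **properties}
    by_driver = {"PartitionKey": f"{client_id}_D_{driver_id}", "RowKey": row_key, **properties}
    return by_vehicle, by_driver


by_vehicle, by_driver = build_copies(
    "c1", "v9", "d3", datetime(2020, 1, 1, tzinfo=timezone.utc), "51.50,-0.12"
)
print(by_vehicle["PartitionKey"], by_driver["PartitionKey"])
```

Each dictionary would then be inserted as its own entity. The two writes land in different partitions, so they are separate operations rather than one atomic batch.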
Do any of these solutions sound viable? Are there other solutions?
Recommended answer
You should duplicate the records, as in solution 2. I also suggest keeping a copy where each record is in its own partition, so partitioned by VehicleID as well. This will make updating all the copies easier: start from the VehicleID copy and propagate to the others.
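One possible reading of "start from the VehicleID copy and propagate" is sketched below, using a plain dictionary as a stand-in for the table so the example runs on its own. The `upsert` and `update_position` helpers and the key layout ("_V_" / "_D_" markers, shared RowKey) are hypothetical and not taken from the answer.

```python
# In-memory stand-in for the table: {(PartitionKey, RowKey): entity}
table = {}


def upsert(entity):
    """Insert or replace an entity, keyed by (PartitionKey, RowKey)."""
    table[(entity["PartitionKey"], entity["RowKey"])] = dict(entity)


def update_position(client_id, vehicle_id, row_key, changes):
    """Update the vehicle copy first, then propagate to the driver copy."""
    primary = table[(f"{client_id}_V_{vehicle_id}", row_key)]
    old_driver_pk = f"{client_id}_D_{primary['DriverID']}"
    primary.update(changes)
    new_driver_pk = f"{client_id}_D_{primary['DriverID']}"
    # If the driver changed, the stale driver copy lives in another partition
    # and has to be removed explicitly.
    if new_driver_pk != old_driver_pk:
        table.pop((old_driver_pk, row_key), None)
    upsert({**primary, "PartitionKey": new_driver_pk})
    return primary


record = {"RowKey": "r1", "ClientID": "c1", "VehicleID": "v9",
          "DriverID": "d3", "GPSPosition": "51.50,-0.12"}
upsert({**record, "PartitionKey": "c1_V_v9"})
upsert({**record, "PartitionKey": "c1_D_d3"})

update_position("c1", "v9", "r1", {"GPSPosition": "48.85,2.35"})
print(table[("c1_D_d3", "r1")]["GPSPosition"])
```

Against the real service the copies sit in different partitions, and entity group transactions only cover a single partition, so the propagation step would need its own retry or repair handling.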
Storing data is really cheap; querying is a pain unless you store the data correctly up front. So my advice is: duplicate!