What is the recommended way to delete a large number of items from DynamoDB?

Problem description

I'm writing a simple logging service in DynamoDB.

I have a logs table that is keyed by a user_id hash and a timestamp (Unix epoch int) range.

When a user of the service terminates their account, I need to delete all of that user's items in the table, regardless of the range value.

What is the recommended way of doing this sort of operation (keeping in mind there could be millions of items to delete)?

My options, as far as I can see, are:

A: Perform a Scan operation, calling delete on each returned item, until no items are left

B: Perform a BatchGet operation, again calling delete on each item until none are left

Both of these look terrible to me as they will take a looooong time.

What I ideally want to do is call LogTable.DeleteItem(user_id) - Without supplying the range, and have it delete everything for me.

Any thoughts?

Thanks

Solution

An understandable request indeed; I can imagine advanced operations like these might get added over time by the AWS team (they have a history of starting with a limited feature set and evaluating extensions based on customer feedback), but here is what you should do to at least avoid the cost of a full scan:

  1. Use Query rather than Scan to retrieve all items for user_id. This works regardless of the combined hash/range primary key in use, because HashKeyValue and RangeKeyCondition are separate parameters in this API, and the former targets only the attribute value of the hash component of the composite primary key.

    • Please note that you'll have to deal with the Query API paging here as usual; see the ExclusiveStartKey parameter.

    • Loop over all returned items and call DeleteItem on each one as usual.

      • Update: Most likely BatchWriteItem is more appropriate for a use case like this (see below for details).
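
The Query-then-delete loop described above could be sketched roughly as follows with the boto3 low-level client. This is only an illustration under several assumptions: the table name, the string type of `user_id`, and the helper names `iter_user_keys` and `delete_user_logs_one_by_one` are hypothetical, and the sketch uses `KeyConditionExpression`, which current API versions offer in place of the older HashKeyValue/RangeKeyCondition parameters mentioned above. The client is passed in as an argument so the functions themselves need no AWS setup.

```python
def iter_user_keys(client, table_name, user_id):
    """Yield the primary key of every item stored under one hash key.

    `client` is assumed to be a boto3 DynamoDB client, e.g.
    boto3.client("dynamodb"). Attribute names and types (user_id as a
    string, timestamp as the range key) are assumptions from the question.
    """
    kwargs = {
        "TableName": table_name,
        "KeyConditionExpression": "user_id = :uid",
        "ExpressionAttributeValues": {":uid": {"S": user_id}},
        # Fetch only the key attributes; "timestamp" is a reserved word,
        # so it is referenced through a placeholder.
        "ProjectionExpression": "user_id, #ts",
        "ExpressionAttributeNames": {"#ts": "timestamp"},
    }
    while True:
        page = client.query(**kwargs)
        for item in page.get("Items", []):
            yield {"user_id": item["user_id"], "timestamp": item["timestamp"]}
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            break  # no more pages
        # Resume the next page where this one stopped.
        kwargs["ExclusiveStartKey"] = last_key


def delete_user_logs_one_by_one(client, table_name, user_id):
    """Delete each of the user's items with a separate DeleteItem call."""
    deleted = 0
    for key in iter_user_keys(client, table_name, user_id):
        client.delete_item(TableName=table_name, Key=key)
        deleted += 1
    return deleted
```

A call such as `delete_user_logs_one_by_one(boto3.client("dynamodb"), "logs", "some-user")` would then issue one DeleteItem request per item, which is simple but slow for millions of items.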


Update

As highlighted by ivant, the BatchWriteItem operation enables you to put or delete several items across multiple tables in a single API call [emphasis mine].

Please note that this still has some relevant limitations, most notably:

  • Maximum operations in a single request — You can specify a total of up to 25 put or delete operations; however, the total request size cannot exceed 1 MB (the HTTP payload).

  • Not an atomic operation — Individual operations specified in a BatchWriteItem are atomic; however BatchWriteItem as a whole is a "best-effort" operation and not an atomic operation. That is, in a BatchWriteItem request, some operations might succeed and others might fail. [...]

Nevertheless, this obviously offers a potentially significant gain for use cases like the one at hand.
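
To illustrate the two limitations quoted above, here is a rough sketch of batching deletes in groups of at most 25 and resubmitting whatever comes back in `UnprocessedItems`. The function names, the retry count, and the backoff values are assumptions chosen for the example, not recommendations from AWS. `keys` can be any iterable of primary-key dicts in the low-level attribute-value format, for instance the keys collected from a paginated Query.

```python
import time

MAX_BATCH = 25  # per-request cap on put/delete operations quoted above


def _write_batch(client, table_name, requests, max_retries, sleep):
    """Send one batch and resubmit unprocessed items with a short backoff."""
    pending = {table_name: requests}
    for attempt in range(max_retries + 1):
        response = client.batch_write_item(RequestItems=pending)
        # BatchWriteItem is best-effort: some deletes may be handed back.
        pending = response.get("UnprocessedItems") or {}
        if not pending:
            return
        sleep(min(0.05 * 2 ** attempt, 2.0))  # assumed backoff schedule
    raise RuntimeError("items still unprocessed after retries")


def batch_delete(client, table_name, keys, max_retries=5, sleep=time.sleep):
    """Delete the given keys in batches; return how many were deleted.

    `client` is assumed to be a boto3 DynamoDB client.
    """
    deleted = 0
    batch = []
    for key in keys:
        batch.append({"DeleteRequest": {"Key": key}})
        if len(batch) == MAX_BATCH:
            _write_batch(client, table_name, batch, max_retries, sleep)
            deleted += len(batch)
            batch = []
    if batch:  # final partial batch
        _write_batch(client, table_name, batch, max_retries, sleep)
        deleted += len(batch)
    return deleted
```

Because each batch is not atomic, a failure partway through leaves some items deleted and others not, so the whole job should be safe to rerun from the start.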
