Problem description

I am looking for the best way to store lists associated with a key in a key-value database (like Berkeley DB or LevelDB).

For example: I have users, and orders from user to user. I want to store a list of order ids for each user, for fast access with range selects (for pagination).

How should I store this structure?

I don't want to store it in a serialized format for each user:

user_1_orders = serialize(1,2,3..)
user_2_orders = serialize(1,2,3..)

because the list can be long.

I thought about a separate db file for each user, with the order ids stored as keys in it, but this does not solve the range-select problem. What if I want to get a user's order ids in the range [5000:5050]?

I know about Redis, but I am interested in key-value implementations like Berkeley DB or LevelDB.

Solution

Let's start with a single list. You can work with a single hashmap:

  1. store the count of the user's orders in row 0
  2. for each new order, increment the count and store a new row under that number

So your hashmap looks like the following:

key | value
-------------
 0  |   5
 1  | tomato
 2  | celery
 3  | apple
 4  | pie
 5  | meat

Steadily incrementing the key ensures that every key is unique. Given that the db is key ordered and that the pack function translates integers into byte strings that sort in the same order as the integers, you can fetch slices of the list. To fetch orders between 5000 and 5050 you can use bsddb's Cursor.set_range or leveldb's createReadStream (JS API).
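To see why the pack function matters, here is a small sketch. It assumes a hypothetical pack_int helper built on Python's struct module (fixed-width, big-endian); neither the name nor the 8-byte width comes from bsddb or leveldb. It compares how decimal strings and packed integers sort:

```python
import struct


def pack_int(n: int) -> bytes:
    """Hypothetical helper: encode a non-negative integer as 8 big-endian bytes."""
    return struct.pack(">Q", n)


ids = [1, 2, 9, 10, 255, 256, 5000, 5050]

# Decimal strings compare byte by byte, so b"10" lands before b"2".
as_text = sorted(str(n).encode() for n in ids)

# Fixed-width big-endian bytes compare in the same order as the integers.
as_packed = sorted(pack_int(n) for n in ids)
decoded = [struct.unpack(">Q", b)[0] for b in as_packed]

print(as_text)
print(decoded)
```

With the packed form, a cursor positioned at pack_int(5000) walks forward through the keys in numeric order, which is what makes the range select work.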

Now let's expand to orders for multiple users. If you can open several hashmaps, you can apply the scheme above with one hashmap per user. You may hit some system limits, though (maximum number of open file descriptors, or maximum number of files per directory). So instead you can share a single hashmap among several users.

What I explain in the following works for both leveldb and bsddb, provided that you pack keys so that their lexicographic (byte) order is correct. So I will assume that you have a pack function. In bsddb you have to build a pack function yourself. Have a look at wiredtiger.packing or bytekey for inspiration.

The principle is to namespace the keys using the user's id. It's also called key composition.
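As an illustration of key composition, here is one possible pack/unpack pair. It is only a sketch: the two-field layout (user id, then position, both unsigned 64-bit big-endian) is my assumption, not something bsddb or leveldb prescribes.

```python
import struct

# Assumed layout: (user_id, position), each an unsigned 64-bit big-endian integer.
KEY = struct.Struct(">QQ")


def pack(user_id: int, position: int) -> bytes:
    """Build a composite key whose byte order matches (user_id, position) order."""
    return KEY.pack(user_id, position)


def unpack(key: bytes) -> tuple:
    """Recover (user_id, position) from a composite key."""
    return KEY.unpack(key)


# Keys inserted in arbitrary order...
keys = [pack(32, 1), pack(1, 2), pack(32, 0), pack(1, 0), pack(1, 1)]

# ...come back grouped by user, then by position, once sorted as bytes.
ordered = [unpack(k) for k in sorted(keys)]
print(ordered)
```

Because the user id is the leading field, all the rows of one user sit next to each other in the store, and a range scan never has to skip over another user's rows.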

Say your database looks like the following:

   key   |  value
-------------------
  1  | 0 |    2       <--- count row for user 1
  1  | 1 |  tomato
  1  | 2 |  orange
 ... |...|   ...
  32 | 0 |    1       <--- count row for user 32
  32 | 1 |  banana
 ... |...|   ...

You create this database with the following (pseudo) code:

db.put(pack(1, make_uid(1)), 'tomato')
db.put(pack(1, make_uid(1)), 'orange')
...
db.put(pack(32, make_uid(32)), 'banana')

make_uid implementation looks like this:

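def make_uid(user_uid):
    # retrieve the current count, stored at position 0 for this user
    counter_key = pack(user_uid, 0)
    raw = db.get(counter_key)
    # no counter yet means the user has no orders (assumes get returns None)
    count = unpack(raw)[0] if raw is not None else 0
    count += 1  # increment
    # save new count; this read-modify-write needs a transaction or a lock
    db.put(counter_key, pack(count))
    return count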
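For something you can actually run, here is a self-contained sketch of the same counter logic. A plain dict stands in for the real store, and KEY, COUNT and append_order are names I made up for the example; with bsddb or leveldb you would swap the dict accesses for the store's own get/put calls.

```python
import struct

KEY = struct.Struct(">QQ")   # assumed key layout: (user_id, position)
COUNT = struct.Struct(">Q")  # assumed value layout for the counter row

db = {}  # stand-in for the real key-value store


def make_uid(user_id: int) -> int:
    """Increment and return the per-user counter kept at position 0."""
    counter_key = KEY.pack(user_id, 0)
    raw = db.get(counter_key)
    count = COUNT.unpack(raw)[0] if raw is not None else 0
    count += 1
    db[counter_key] = COUNT.pack(count)
    return count


def append_order(user_id: int, order: str) -> int:
    """Store an order under the next free position and return that position."""
    position = make_uid(user_id)
    db[KEY.pack(user_id, position)] = order.encode()
    return position


first = append_order(1, "tomato")
second = append_order(1, "orange")
third = append_order(32, "banana")
print(first, second, third)
```

Note that the read and the write of the counter are two separate steps, so with concurrent writers you would want to wrap them in whatever transaction or locking mechanism your store offers.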

Then you have to do the correct range lookup. It works like the single-list lookup, but on the composite key. Using the bsddb API cursor.set_range(key), we retrieve all items between 5000 and 5050 for user 42:

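def user_orders_slice(user_id, start, end):
    # position the cursor on the first key >= (user_id, start)
    # (assumes the cursor returns None when there is no such record)
    record = cursor.set_range(pack(user_id, start))
    while record is not None:
        key, value = record
        key_user_id, order_id = unpack(key)
        # stop when we leave this user's keys or pass the end of the range
        if key_user_id != user_id or order_id > end:
            break
        # the value is probably packed somehow...
        yield value
        record = cursor.next()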

Error handling is kept to a minimum here. Among other things, user_orders_slice(42, 5000, 5050) is not guaranteed to return 51 items if you delete items from the list. A correct way to query, say, 50 items is to implement a user_orders_query(user_id, start, limit).
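Here is a rough sketch of what such a user_orders_query could look like. To keep it runnable without a real database, it uses a small in-memory SortedStore class of my own (built on bisect) in place of a bsddb cursor or a leveldb iterator; SortedStore, scan_from and the key layout are all assumptions made for the example.

```python
import bisect
import struct

KEY = struct.Struct(">QQ")  # assumed key layout: (user_id, position)


class SortedStore:
    """Tiny in-memory stand-in for a key-ordered store."""

    def __init__(self):
        self._keys = []    # kept sorted at all times
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def delete(self, key):
        if key in self._values:
            del self._values[key]
            self._keys.remove(key)

    def scan_from(self, key):
        """Yield (key, value) pairs in key order, starting at the first key >= key."""
        index = bisect.bisect_left(self._keys, key)
        for k in self._keys[index:]:
            yield k, self._values[k]


def user_orders_query(store, user_id, start, limit):
    """Return up to `limit` orders of `user_id`, starting at position `start`."""
    results = []
    # Position 0 holds the counter, so never start below 1.
    for key, value in store.scan_from(KEY.pack(user_id, max(start, 1))):
        if len(results) >= limit:
            break
        key_user_id, _position = KEY.unpack(key)
        if key_user_id != user_id:
            break  # ran past this user's keys
        results.append(value)
    return results


store = SortedStore()
for position in range(1, 11):
    store.put(KEY.pack(42, position), "order-%d" % position)
store.put(KEY.pack(43, 1), "someone-else")
store.delete(KEY.pack(42, 3))  # a gap in the list

page = user_orders_query(store, 42, 2, 3)
tail = user_orders_query(store, 42, 9, 5)
print(page, tail)
```

Counting returned items rather than comparing positions means a page stays full even when rows in the middle have been deleted, and the check on the user id keeps the scan from running into the next user's rows.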

I hope you get the idea.

09-05 06:00