python - Python: an in-memory object database that supports indexing?

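I'm doing some data munging that would be a lot simpler if I could stick a bunch of dictionaries into an in-memory database and then run simple queries against it.

For example, something like: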

people = db([
    {"name": "Joe", "age": 16},
    {"name": "Jane", "favorite_color": "red"},
])
over_16 = people.filter(age__gt=16)
with_favorite_colors = people.filter(favorite_color__exists=True)

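There are three confounding factors, though:

Some of the values will be Python objects, and serializing them is out of the question (too slow, and it breaks identity). Of course, I could work around this (for example, by storing all the items in one big list and then serializing their indexes in that list), but that could take a fair bit of fiddling.

There will be thousands of items, and I will be running lookup-heavy operations against them (such as graph traversals), so efficient (that is, indexed) queries must be possible.

As in the example, the data is unstructured, so systems that require me to predefine a schema would be tricky.

So, does such a thing exist? Or will I need to kludge something together?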

Best answer

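What about using an in-memory SQLite database via the sqlite3 standard library module, passing the special value :memory: as the connection target? If you don't want to write your own SQL statements, you can always use an ORM such as SQLAlchemy to access the in-memory SQLite database.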
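
As a rough sketch of that suggestion, the snippet below loads a few dictionaries into an in-memory SQLite table and queries it. The table name, column names, index name, and sample rows are assumptions made up for illustration. Note that it has to declare the columns up front, which is exactly the schema problem raised in the question.

```python
import sqlite3

# Hypothetical sample rows; names and columns are illustrative only.
people = [
    {"name": "Joe", "age": 16},
    {"name": "Jane", "favorite_color": "red"},
    {"name": "Ann", "age": 30},
]

conn = sqlite3.connect(":memory:")  # the database lives only in RAM
conn.execute("CREATE TABLE people (name TEXT, age INTEGER, favorite_color TEXT)")
conn.execute("CREATE INDEX idx_people_age ON people (age)")

# Missing keys become NULL so that every row fits the fixed schema.
conn.executemany(
    "INSERT INTO people VALUES (:name, :age, :favorite_color)",
    [
        {
            "name": p["name"],
            "age": p.get("age"),
            "favorite_color": p.get("favorite_color"),
        }
        for p in people
    ],
)

over_16 = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM people WHERE age > ? ORDER BY name", (16,)
    )
]
with_color = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM people WHERE favorite_color IS NOT NULL ORDER BY name"
    )
]
```

Rows whose age is NULL never match `age > ?`, so people without an age simply drop out of the first query.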

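EDIT: I noticed you said that the values may be Python objects and that you also need to avoid serialization. Storing arbitrary Python objects in a database necessarily requires serializing them.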
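
To make the identity problem concrete, and to illustrate the workaround mentioned in the question (keep the objects in a list and store only their positions), here is a minimal sketch. The `records` list, the `links` field, and the table layout are hypothetical, and the approach assumes the list is never reordered or shrunk after the positions are stored.

```python
import pickle
import sqlite3

# Hypothetical records; the second one is referenced by the first.
records = [{"label": "a", "links": []}, {"label": "b", "links": []}]
records[0]["links"].append(records[1])

# A pickle round trip produces an equal copy, not the same object.
clone = pickle.loads(pickle.dumps(records[0]))
identity_lost = clone == records[0] and clone is not records[0]

# Store only the searchable field plus the object's position in the list.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (label TEXT, pos INTEGER)")
conn.execute("CREATE INDEX idx_records_label ON records (label)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [(rec["label"], pos) for pos, rec in enumerate(records)],
)

# Query by label, then map the stored position back to the live object.
(pos,) = conn.execute(
    "SELECT pos FROM records WHERE label = ?", ("b",)
).fetchone()
found = records[pos]
```

Here `found` is the very same dictionary that `records[0]` links to, so graph-style references stay intact while SQLite handles the lookup.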

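If you must keep both requirements, may I propose a practical solution? Why not just use Python dictionaries as indices into your collection of Python dictionaries? It sounds like you will have idiosyncratic needs for each index, so figure out which values you are going to query on, then write a function that builds an index for each of them. The possible values for one key in your list of dicts become the keys of the index, and each value of the index is a list of the dictionaries that have that value. To query the index, look up the value you want as the key.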

import collections
import itertools

def make_indices(dicts):
    color_index = collections.defaultdict(list)
    age_index = collections.defaultdict(list)
    for d in dicts:
        if 'favorite_color' in d:
            color_index[d['favorite_color']].append(d)
        if 'age' in d:
            age_index[d['age']].append(d)
    return color_index, age_index


def make_data_dicts():
    ...


data_dicts = make_data_dicts()
color_index, age_index = make_indices(data_dicts)
# Everyone with a favorite color: flatten all of the index's value lists
with_color_dicts = list(
    itertools.chain.from_iterable(color_index.values()))
# Everyone over 16: keep the value lists whose key (the age) exceeds 16
over_16 = list(
    itertools.chain.from_iterable(
        v for k, v in age_index.items() if k > 16))
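
The per-field functions above can be generalized. The following is only a sketch under some assumptions: `IndexedDicts` and its methods are hypothetical names, indexed values must be hashable, and values within one field must be mutually comparable so they can be sorted. It keeps one dictionary index per field and adds a sorted key list so that range queries use `bisect` instead of scanning every key.

```python
import bisect
import collections


class IndexedDicts:
    """Hypothetical helper: one dict index per field, plus sorted keys."""

    def __init__(self, dicts, fields):
        self.dicts = list(dicts)
        self.by_value = {f: collections.defaultdict(list) for f in fields}
        for d in self.dicts:
            for f in fields:
                if f in d:
                    # Store a reference to the dict itself, never a copy.
                    self.by_value[f][d[f]].append(d)
        # Sorted keys let bisect locate the start of a range in O(log n).
        self.sorted_keys = {f: sorted(idx) for f, idx in self.by_value.items()}

    def eq(self, field, value):
        """Dicts whose field equals value."""
        return list(self.by_value[field].get(value, []))

    def gt(self, field, bound):
        """Dicts whose field is strictly greater than bound."""
        keys = self.sorted_keys[field]
        start = bisect.bisect_right(keys, bound)
        return [d for k in keys[start:] for d in self.by_value[field][k]]

    def exists(self, field):
        """Dicts that have the field at all."""
        return [d for bucket in self.by_value[field].values() for d in bucket]


people = [
    {"name": "Joe", "age": 16},
    {"name": "Jane", "favorite_color": "red"},
    {"name": "Ann", "age": 30},
]
db = IndexedDicts(people, fields=["age", "favorite_color"])
over_16 = db.gt("age", 16)
with_color = db.exists("favorite_color")
```

Because the index stores references, the results are the original dictionaries and no serialization takes place. The indices are built once, so dictionaries added or changed afterwards would need the index to be rebuilt or updated by hand.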

For more on python - Python: an in-memory object database that supports indexing?, see the similar question on Stack Overflow: https://stackoverflow.com/questions/5161164/