python - What problem of pickle is solved by using persistent IDs here?

From https://docs.python.org/3/library/pickle.html#persistence-of-external-objects

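I would appreciate it if someone could explain: what problem of pickle is solved by using persistent IDs here? In other words, what goes wrong with pickle if persistent IDs are not used?

In particular, what does "the notion of a reference to an object outside the pickled data stream" mean? Is it opposed to some other notion, such as "the notion of a reference to an object inside the pickled data stream"?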

Best answer

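A "pickled data stream" is a generic description of what pickle.dump and pickle.load operate on. A data stream is, for example, a file from which data can be read sequentially. When such a stream contains data produced or consumed by pickle, it is a pickled data stream.

A pickle stream has a notion of internal references: if the same object occurs several times in a stream, it is stored only once and referenced afterwards. However, such a reference can only point to something already stored in the stream. It cannot point to an object outside the stream, such as the original object. The content of a pickled data stream is conceptually a copy of its original data.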

import pickle

bar = (1, 2)
foo = {1: 1, 2: (1, 1), 'bar': bar}

with open('foo.pkl', 'wb') as out_stream:  # open a data stream...
    pickle.dump((bar, foo), out_stream)    # ...for pickle data

with open('foo.pkl', 'rb') as in_stream:
    bar2, foo2 = pickle.load(in_stream)

assert bar2 is foo2['bar']  # internal identity is preserved
assert bar is not bar2      # external identity is broken
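
To make the consequence of that broken external identity concrete, here is a minimal sketch using mutable state. The names `shared_log`, `job` and `job_copy` are made up for illustration and are not part of the original answer. Without a persistent ID, the round trip produces an independent copy, so changes to the copy never reach the object that the rest of the program still holds.

```python
import pickle

# Hypothetical shared, mutable state that other code also holds a reference to
shared_log = []
job = {"name": "example", "log": shared_log}

# Plain round trip through pickle, without any persistent ID
job_copy = pickle.loads(pickle.dumps(job))
job_copy["log"].append("done")

# The restored job carries a snapshot copy; the original list is untouched
assert job_copy["log"] is not shared_log
assert job_copy["log"] == ["done"]
assert shared_log == []
```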

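A persistent ID can be used to refer to something that is not contained in the stream, for example the original object, a global database handle, something in another stream, or similar. Conceptually, a persistent ID merely lets other code handle the pickling and unpickling of such objects. The definition and implementation of persistent IDs, however, depend on the problem being solved.

Defining and using persistent IDs is not difficult. It does, however, require some orchestration and bookkeeping. A very simple example looks like this: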

import io
import pickle

# some object to persist
# usually, one would have some store or bookkeeping in place
bar = (1, 2)

# the create/load implementation of the persistent id
def persistent_bar_id(obj):
    """Return a persistent id for the `bar` object only"""
    return "it's a bar" if obj is bar else None

def persistent_bar_load(pers_id):
    """Resolve the persistent id created by `persistent_bar_id`"""
    if pers_id == "it's a bar":
        return bar
    raise pickle.UnpicklingError(
        "This is just an example for one persistent object!")

stream = io.BytesIO()  # in-memory stream for demonstration
# need to extend Pickler and Unpickler with the persistent id handlers
pickler = pickle.Pickler(stream)
pickler.persistent_id = persistent_bar_id
unpickler = pickle.Unpickler(stream)
unpickler.persistent_load = persistent_bar_load

# we can now dump and load the persistent object
pickler.dump({'bar': bar})
stream.seek(0)
foo = unpickler.load()
assert foo['bar'] is bar  # persistent identity is preserved
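
The same idea can also be written by subclassing `pickle.Pickler` and `pickle.Unpickler`, which is the approach shown in the pickle documentation. The following is only a sketch under assumptions: the `REGISTRY` dictionary, the `("registry", key)` ID format and the class names are hypothetical stand-ins for whatever store or bookkeeping a real program would have.

```python
import io
import pickle

# Hypothetical bookkeeping: named objects that must be referenced, not copied
REGISTRY = {"config": {"debug": True}, "cache": []}

class RegistryPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Return an ID for registered objects (matched by identity) ...
        for key, value in REGISTRY.items():
            if obj is value:
                return ("registry", key)
        # ... and None for everything else, which is then pickled by value
        return None

class RegistryUnpickler(pickle.Unpickler):
    def persistent_load(self, pers_id):
        kind, key = pers_id
        if kind == "registry" and key in REGISTRY:
            return REGISTRY[key]
        raise pickle.UnpicklingError(f"unsupported persistent id: {pers_id!r}")

stream = io.BytesIO()
RegistryPickler(stream).dump({"settings": REGISTRY["config"], "numbers": [1, 2, 3]})
stream.seek(0)
restored = RegistryUnpickler(stream).load()

assert restored["settings"] is REGISTRY["config"]  # referenced, not copied
assert restored["numbers"] == [1, 2, 3]            # ordinary data is copied
```

Only the registered object goes through the persistent ID; the list of numbers is pickled by value as usual.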

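As a real-world example, my old cpy2py module uses pickle to exchange data between different interpreters. For regular value-like objects, this means serializing in one interpreter and deserializing in another. For some special stateful objects, this means exchanging only a persistent ID that uniquely identifies the object across all connected interpreters.

There is some bookkeeping involved, but in this case you can think of the persistent ID as a tuple (process_id, object_id, object_type). The owning interpreter can use this ID to look up the actual object, while other interpreters can create a placeholder object instead. The important point in this case is that the state is not stored and copied, but only referenced.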
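
The tuple-based scheme can be sketched roughly as follows. This is not the cpy2py implementation. The names `OWNED`, `Resource`, `Placeholder`, `RefPickler`, `RefUnpickler` and `round_trip` are hypothetical, and a second interpreter is only simulated by passing a different process ID to the unpickler.

```python
import io
import os
import pickle

OWNED = {}  # assumption: object_id -> stateful object owned by this interpreter

class Resource:
    """A stateful object that should be referenced, never copied."""
    def __init__(self, name):
        self.name = name
        OWNED[id(self)] = self

class Placeholder:
    """Stand-in used by interpreters that do not own the real object."""
    def __init__(self, pers_id):
        self.pers_id = pers_id

class RefPickler(pickle.Pickler):
    def persistent_id(self, obj):
        if isinstance(obj, Resource):
            return (os.getpid(), id(obj), type(obj).__name__)
        return None  # value-like objects are pickled as usual

class RefUnpickler(pickle.Unpickler):
    def __init__(self, file, local_process_id):
        super().__init__(file)
        self.local_process_id = local_process_id

    def persistent_load(self, pers_id):
        process_id, object_id, _object_type = pers_id
        if process_id == self.local_process_id:
            return OWNED[object_id]   # owner: look up the actual object
        return Placeholder(pers_id)   # others: create a placeholder

def round_trip(obj, local_process_id):
    stream = io.BytesIO()
    RefPickler(stream).dump(obj)
    stream.seek(0)
    return RefUnpickler(stream, local_process_id).load()

resource = Resource("database handle")
message = {"payload": [1, 2, 3], "resource": resource}

at_owner = round_trip(message, os.getpid())
at_peer = round_trip(message, -1)  # -1 simulates a different interpreter

assert at_owner["resource"] is resource
assert isinstance(at_peer["resource"], Placeholder)
```

In both cases the state of `resource` never enters the stream. Only the tuple does, and the payload list is copied like any other value.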

Original question on Stack Overflow: https://stackoverflow.com/questions/56414880/