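Problem description
I came across this question about memory management of dictionaries, which mentions the intern function. What exactly does it do, and when would it be used?
To give an example: if I have a set called seen, that contains tuples in the form (string1,string2), which I use to check for duplicates, would storing (intern(string1),intern(string2)) improve performance w.r.t. memory or speed?
From the Python 3 documentation:
sys.intern(string)
Enter string in the table of "interned" strings and return the interned string, which is string itself or a copy. Interning strings is useful to gain a little performance on dictionary lookup - if the keys in a dictionary are interned, and the lookup key is interned, the key comparisons (after hashing) can be done by a pointer compare instead of a string compare. Normally, the names used in Python programs are automatically interned, and the dictionaries used to hold module, class or instance attributes have interned keys.
Interned strings are not immortal; you must keep a reference to the return value of intern() around to benefit from it.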
Clarification:
As the documentation suggests, the sys.intern function is intended to be used for performance optimization.
The sys.intern function maintains a table of interned strings. When you attempt to intern a string, the function looks it up in the table and:
If the string does not exist (hasn't been interned yet), the function saves it in the table and returns it from the interned strings table.
>>> import sys
>>> a = sys.intern('why do pangolins dream of quiche')
>>> a
'why do pangolins dream of quiche'
In the above example, a holds the interned string. Even though it is not visible, the sys.intern function has saved the 'why do pangolins dream of quiche' string object in the interned strings table.
If the string exists (has already been interned), the function returns it from the interned strings table.
>>> b = sys.intern('why do pangolins dream of quiche')
>>> b
'why do pangolins dream of quiche'
Even though it is not immediately visible, because the string 'why do pangolins dream of quiche' has been interned before, b now holds the same string object as a.
>>> b is a
True
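The two cases above (insert on first sight, return the stored object afterwards) can be modelled with a plain dictionary. This is only a conceptual sketch: InternTable is a hypothetical class written for illustration, not how CPython implements sys.intern, and unlike the real table it keeps every string alive forever.

```python
class InternTable:
    """Toy model of an intern table (hypothetical, for illustration only)."""

    def __init__(self):
        # Maps a string value to the one canonical object for that value.
        self._table = {}

    def intern(self, string):
        # setdefault stores the string if its value is new, otherwise it
        # returns the object that was stored earlier for the same value.
        return self._table.setdefault(string, string)


table = InternTable()
words = ["why do pangolins", "dream of quiche"]

# Build the same value twice at runtime so we get two separate objects.
first = " ".join(words)
second = " ".join(words)

a = table.intern(first)   # value is new: 'first' itself becomes canonical
b = table.intern(second)  # value is known: the canonical object comes back

print(a is first, b is a, b is second)
```

The second call hands back the object stored by the first call, so the duplicate in second can be garbage collected once nothing else refers to it.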
If we create the same string without using intern, we end up with two different string objects that have the same value.
>>> c = 'why do pangolins dream of quiche'
>>> c is a
False
>>> c is b
False
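Whether two equal literals end up as the same object depends on how the code is compiled (a script may merge identical constants, for example), so the session above is easiest to reproduce with strings built at runtime. A minimal sketch, where the variable names are only illustrative:

```python
import sys

parts = ["why do pangolins ", "dream of quiche"]

# Each join builds a brand-new string object at runtime.
a = sys.intern("".join(parts))
c = "".join(parts)

same_value = (c == a)    # equal contents
same_object = (c is a)   # but a different object: c was never interned

# Interning c afterwards returns the object already in the table.
c_interned = sys.intern(c)

print(same_value, same_object, c_interned is a)
```

Note that sys.intern does not change c itself; it returns the canonical object, which has to be stored in place of c to get any benefit.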
By using sys.intern consistently, you ensure that you never hold two string objects with the same value: when you intern a string whose value matches an already interned string, you receive a reference to the pre-existing string object. This way, you save memory. Also, comparing interned strings is very efficient, because equality can be decided by comparing the memory addresses of the two string objects instead of their contents.
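Applied to the seen set from the question, a sketch could look like the following. The helper names intern_pair and unique_pairs and the sample data are made up for illustration. The main gain to expect is memory: each distinct string value is stored once, however many tuples refer to it. Any speed gain is likely small, since set lookups hash first and only then compare.

```python
import sys


def intern_pair(pair):
    """Return the pair with both strings replaced by their interned objects."""
    first, second = pair
    return (sys.intern(first), sys.intern(second))


def unique_pairs(pairs):
    """Drop duplicate pairs, keeping first occurrences in order."""
    seen = set()
    result = []
    for pair in pairs:
        key = intern_pair(pair)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


# Strings are built at runtime, so equal values start as separate objects.
raw = [("user-" + str(i % 3), "group-" + str(i % 2)) for i in range(12)]
pairs = unique_pairs(raw)

distinct_values = {s for pair in pairs for s in pair}
distinct_objects = {id(s) for pair in pairs for s in pair}

print(len(pairs), len(distinct_values), len(distinct_objects))
```

The number of distinct objects matches the number of distinct values, which shows that the tuples share string objects instead of each holding its own copy.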