Question about the uniqueness of string instances in Python

Problem description

I was trying to figure out which integers Python only instantiates once (-5 to 256, it seems), and in the process stumbled on some string behaviour I can't see the pattern in. Sometimes, equal strings created in different ways share the same id, sometimes not. This code:

from __future__ import print_function  # lets the snippet run on Python 2 and 3

A = "10000"
B = "10000"
C = "100" + "00"
D = "%i" % 10000
E = str(10000)
F = str(10000)
G = str(100) + "00"
H = "0".join(("10", "00"))

for obj in (A, B, C, D, E, F, G, H):
    print(obj, id(obj), obj is A)

prints:

10000 4959776 True
10000 4959776 True
10000 4959776 True
10000 4959776 True
10000 4959456 False
10000 4959488 False
10000 4959520 False
10000 4959680 False

I don't even see the pattern - save for the fact that the first four don't have an explicit function call - but surely that can't be it, since the "+" in C for example implies a function call to add. I especially don't understand why C and G are different, seeing as that implies that the ids of the components of the addition are more important than the outcome.

So, what is the special treatment that A-D undergo, making them come out as the same instance?

Recommended answer

In terms of language specification, any compliant Python compiler and runtime is fully allowed, for any instance of an immutable type, to make a new instance OR find an existing instance of the same type that's equal to the required value and use a new reference to that same instance. This means it's always incorrect to use is or by-id comparison among immutables, and any minor release may tweak or change strategy in this matter to enhance optimization.
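
To make the practical consequence concrete, here is a minimal sketch that rebuilds the question's eight strings in Python 3 syntax (an assumption: the question itself ran on Python 2) and compares them the only portable way, with ==. The identity results are recorded but deliberately not relied on, since they are exactly the implementation detail described above.

```python
# Python 3 rewrite of the question's eight constructions (assumption:
# any CPython 3.x; the question itself ran on Python 2).
A = "10000"
B = "10000"
C = "100" + "00"
D = "%i" % 10000
E = str(10000)
F = str(10000)
G = str(100) + "00"
H = "0".join(("10", "00"))

values = (A, B, C, D, E, F, G, H)

# Equality of value is guaranteed by the language for every construction.
all_equal = all(v == A for v in values)

# Identity is an implementation detail: record it, but never rely on it.
identity = [v is A for v in values]

print(all_equal)  # True
print(identity)   # varies across implementations and versions
```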

In terms of implementations, the tradeoffs are pretty clear: trying to reuse an existing instance may mean time spent (perhaps wasted) trying to find such an instance, but if the attempt succeeds then some memory is saved (as well as the time to allocate and later free the memory bits needed to hold a new instance).
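
The tradeoff can be sketched with a small, purely hypothetical reuse pool (`ReusePool` is an illustrative name, not part of any Python implementation). Every request pays for a dictionary lookup, which is the possibly wasted search; a hit hands back the stored instance so the caller's fresh duplicate can be discarded.

```python
class ReusePool:
    """Hypothetical pool illustrating the reuse tradeoff for str values.

    Every request pays for a lookup (the possibly wasted search). On a hit,
    the caller's freshly built duplicate can be dropped in favor of the
    stored instance; on a miss, the new instance is remembered for later.
    """

    def __init__(self):
        self._pool = {}   # maps a value to the one instance we keep for it
        self.hits = 0
        self.misses = 0

    def get(self, value):
        existing = self._pool.get(value)  # search cost, paid on every call
        if existing is not None:
            self.hits += 1
            return existing               # reuse the stored instance
        self.misses += 1
        self._pool[value] = value         # first sighting: keep this one
        return value


pool = ReusePool()
s1 = pool.get("".join(["10", "000"]))    # miss: stored
s2 = pool.get(str(10000))                # hit: returns the stored s1
print(s1 is s2, pool.hits, pool.misses)  # True 1 1
```

A real runtime would make this lookup far cheaper than a Python dict call, but the shape of the cost is the same: the search happens whether or not it finds anything.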

How to solve those implementation tradeoffs is not entirely obvious -- if you can identify heuristics that indicate that finding a suitable existing instance is likely and the search (even if it fails) will be fast, then you may want to attempt the search-and-reuse when the heuristics suggest it, but skip it otherwise.
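
A heuristic-gated variant might look like the following sketch. The predicate `worth_searching` and the `MAX_REUSE_LEN` cutoff are hypothetical, loosely modelled on CPython's practice of interning string constants made only of identifier characters (worth verifying against your CPython version); the actual search-and-reuse is delegated to the documented `sys.intern`.

```python
import sys

MAX_REUSE_LEN = 20  # hypothetical cutoff: long strings rarely repeat exactly


def worth_searching(s):
    """Hypothetical heuristic: short strings made only of identifier
    characters are likely to have an equal twin, and checking is cheap."""
    return len(s) <= MAX_REUSE_LEN and all(c.isalnum() or c == "_" for c in s)


def maybe_reuse(s):
    # Pay for the search-and-reuse (done here by sys.intern) only when the
    # heuristic suggests it will pay off; otherwise keep the new object.
    return sys.intern(s) if worth_searching(s) else s


a = maybe_reuse(str(10000))
b = maybe_reuse("0".join(("10", "00")))
print(a is b)                          # True: both went through sys.intern
print(worth_searching("hello world"))  # False: contains a space
```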

In your observations you seem to have found a particular dot-release implementation that performs a modicum of peephole optimization when that's entirely safe, fast, and simple, so the assignments A to D all boil down to exactly the same as A (but E to H don't, as they involve named functions or methods that the optimizer's authors may reasonably have considered not 100% safe to assume semantics for -- and low-ROI if that was done -- so they're not peephole-optimized).
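
One way to see this folding is to compile snippets and inspect the constants the compiler kept. The sketch below assumes a CPython 3.7+ interpreter (the question's numbers came from Python 2, where the peephole optimizer also folded D; CPython 3 does not fold `%` formatting, so only C is checked). The helper `snippet_consts` is a hypothetical name.

```python
# Assumption: a CPython 3.7+ interpreter; the question's numbers came from Python 2.
def snippet_consts(source):
    """Return the constants the compiler stored for a module-level snippet."""
    return compile(source, "<demo>", "exec").co_consts


folded = snippet_consts('C = "100" + "00"')
not_folded = snippet_consts("E = str(10000)")

# The literal sum was computed at compile time, so the result is stored.
print("10000" in folded)      # True
# str() is a name looked up at run time, so no string constant exists yet.
print("10000" in not_folded)  # False
```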

Thus, A to D reusing the same instance boils down to A and B doing so (as C and D get peephole-optimized to exactly the same construct).

That reuse, in turn, clearly suggests compiler tactics/optimizer heuristics whereby identical literal constants of an immutable type in the same function's local namespace are collapsed to references to just one instance in the function's .func_code.co_consts (to use Python 2 CPython's terminology for attributes of functions and code objects; Python 3 spells it .__code__.co_consts) -- reasonable tactics and heuristics, as reuse of the same immutable constant literal within one function is somewhat frequent, AND the price is only paid once (at compile time) while the advantage is accrued many times (every time the function runs, maybe within loops etc etc).
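
The collapsing of identical literals can be observed directly on a function's code object. This is a sketch assuming CPython 3 (`make_pair` is a hypothetical example function); it shows a CPython compiler behavior, not a language guarantee.

```python
def make_pair():
    # Two textually identical string literals in one function body.
    first = "10000"
    second = "10000"
    return first, second


# Python 3 spelling; Python 2 used make_pair.func_code.co_consts.
consts = make_pair.__code__.co_consts
first, second = make_pair()

print(consts.count("10000"))  # 1: both literals share one co_consts slot
print(first is second)        # True on CPython: one constant, two references
```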

(It so happens that these specific tactics and heuristics, given their clearly-positive tradeoffs, have been pervasive in all recent versions of CPython, and, I believe, IronPython, Jython, and PyPy as well;-).

This is a somewhat worthy and interesting area of study if you're planning to write compilers, runtime environments, peephole optimizers, etc etc, for Python itself or similar languages. I guess that deep study of the internals (ideally of many different correct implementations, of course, so as not to fixate on the quirks of a specific one -- good thing Python currently enjoys at least 4 separate production-worthy implementations, not to mention several versions of each!) can also help, indirectly, make one a better Python programmer -- but it's particularly important to focus on what's guaranteed by the language itself, which is somewhat less than what you'll find in common among separate implementations, because the parts that "just happen" to be in common right now (without being required to be so by the language specs) may perfectly well change under you at the next point release of one or another implementation and, if your production code was mistakenly relying on such details, that might cause nasty surprises;-). Plus -- it's hardly ever necessary, or even particularly helpful, to rely on such variable implementation details rather than on language-mandated behavior (unless you're coding something like an optimizer, debugger, profiler, or the like, of course;-).