Problem description
We're doing some code cleanup. The cleanup is only about formatting. (If it matters, let's even assume that line numbers don't change, though ideally I'd like to ignore line number changes as well.)
To be sure that there is no accidental code change, I'd like to find a simple and fast way to compare the two versions of the source code.
So let's assume I have file1.py and file2.py. What works is to use py_compile.compile(filename) to create .pyc files, then run uncompyle6 pycfile, strip the comments, and compare the results. But this is overkill and very slow.
Another approach I imagined is to copy file1.py to, for example, file.py, run py_compile.compile("file.py"), and save the .pyc file; then copy file2.py to file.py, run py_compile.compile("file.py") again, save that .pyc file, and finally compare the two generated .pyc files.
Would this work reliably with all (current) Python versions >= 3.6?
If I remember correctly, at least for Python 2 the .pyc files could contain timestamps or absolute paths, which could make the comparison fail (at least if the .pyc files were generated on two different machines).
Is there a clean way to compare the byte code of two .py files?
As a bonus feature (if possible), I'd like to create a hash of each file's byte code that I could store for future reference.
Recommended answer
You might try using Python's built-in compile function, which can compile from a string (read in from a file in your case). The example below compiles two equivalent programs and one almost equivalent program, compares the resulting code objects, and then, just for demo purposes (something you would not want to do), executes a couple of the code objects:
import hashlib
import marshal

def compute_hash(code):
    # Serialize the code object and hash the resulting bytes.
    code_bytes = marshal.dumps(code)
    code_hash = hashlib.sha1(code_bytes).hexdigest()
    return code_hash

# source1 and source2 differ only in formatting; source3 renames x to a.
source1 = """x = 3
y = 4
z = x * y
print(z)
"""
source2 = "x=3;y=4;z=x*y;print(z)"
source3 = "a=3;y=4;z=a*y;print(z)"

obj1 = compile(source=source1, filename='<string>', mode='exec', dont_inherit=True)
obj2 = compile(source=source2, filename='<string>', mode='exec', dont_inherit=True)
obj3 = compile(source=source3, filename='<string>', mode='exec', dont_inherit=True)

print(obj1 == obj2)
print(obj1 == obj3)

# Demo only: run the code objects to show that they behave the same.
exec(obj1)
exec(obj3)

print(compute_hash(obj1))
Prints:
True
False
12
12
48632a1b64357e9d09d19e765d3dc6863ee67ab9
This saves you from having to copy .py files, create .pyc files, compare .pyc files, and so on.
Note:
The compute_hash function is there in case you need a repeatable hash, i.e. one that returns the same value for the same code object when computed in successive program runs.
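The question also asks, ideally, to ignore line number changes. Serialized code objects carry the file name and line information, so a hash of the marshal output can differ between two files that only differ in layout, and whether == on code objects looks at line data may depend on the interpreter version. The following is a minimal sketch of one way around that, not a definitive implementation: it builds a signature from the instructions, names, and constants only. The helper names code_signature, source_fingerprint, and file_fingerprint and the "<cleanup-check>" label are hypothetical and chosen for illustration.

```python
import hashlib


def code_signature(code):
    """Describe a code object as a nested tuple, leaving out file name and line data."""
    consts = tuple(
        # Nested functions and classes are stored as code objects in co_consts,
        # so recurse into them; other constants are recorded by type and repr.
        code_signature(c) if hasattr(c, "co_code") else (type(c).__name__, repr(c))
        for c in code.co_consts
    )
    return (
        code.co_name, code.co_argcount, code.co_kwonlyargcount, code.co_flags,
        code.co_code, code.co_names, code.co_varnames,
        code.co_freevars, code.co_cellvars, consts,
    )


def source_fingerprint(source, label="<cleanup-check>"):
    """Compile source text and return a hex digest of its line-independent signature."""
    # The same label is used for every file so the file name never matters.
    code = compile(source, label, "exec", dont_inherit=True)
    return hashlib.sha256(repr(code_signature(code)).encode("utf-8")).hexdigest()


def file_fingerprint(path):
    """Hypothetical convenience wrapper: fingerprint the contents of a source file."""
    with open(path, encoding="utf-8") as handle:
        return source_fingerprint(handle.read())
```

With this sketch, file_fingerprint("file1.py") == file_fingerprint("file2.py") would be the check for a formatting-only change. The digests are only meaningful when both are computed with the same Python version, because the bytecode itself changes between versions, and constants whose repr is not stable across runs (frozenset constants, for example) could make a stored digest differ from a later one.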