Problem description
I hope the following question is not too long, but otherwise I cannot explain my problem and what I want.
Following on from "How to use importlib to import modules from arbitrary sources?" (my question from yesterday), I have written a specific loader for a new file type (.xxx). (In fact, an xxx file is an encrypted version of a pyc, to protect the code from being stolen.)
I would just like to add an import hook for the new file type "xxx" without affecting the other types (.py, .pyc, .pyd) in any way.
The loader is ModuleLoader, which inherits from importlib.machinery.SourcelessFileLoader.
Using sys.path_hooks, the loader is to be added as a hook:
myFinder = importlib.machinery.FileFinder
loader_details = (ModuleLoader, ['.xxx'])
sys.path_hooks.append(myFinder.path_hook(loader_details))
Note: This is activated once by calling modloader.activateLoader().
Upon loading a module named test (which is a file test.xxx), I get:
>>> import modloader
>>> modloader.activateLoader()
>>> import test
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'test'
>>>
However, when I delete the contents of sys.path_hooks before adding the hook:
sys.path_hooks = []
sys.path.insert(0, '.') # current directory
sys.path_hooks.append(myFinder.path_hook(loader_details))
it works:
>>> modloader.activateLoader()
>>> import test
using xxx class
in xxxLoader exec_module
in xxxLoader get_code: .\test.xxx
ANALYZING ...
GENERATE CODE OBJECT ...
2 0 LOAD_CONST 0
3 LOAD_CONST 1 ('foo2')
6 MAKE_FUNCTION 0
9 STORE_NAME 0 (foo2)
12 LOAD_CONST 2 (None)
15 RETURN_VALUE
>>>
>>> test
<module 'test' from '.\\test.xxx'>
The module is imported correctly after the file's content has been converted to a code object.
However, I cannot load the same module from a package: import pack.test
Note: __init__.py is of course present as an empty file in the pack directory.
>>> import pack.test
Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 2218, in _find_and_load_unlocked
AttributeError: 'module' object has no attribute '__path__'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'pack.test'; 'pack' is not a package
>>>
Worse, I can no longer load plain *.py modules from that package either. I get the same error as above:
>>> import pack.testpy
Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 2218, in _find_and_load_unlocked
AttributeError: 'module' object has no attribute '__path__'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'pack.testpy'; 'pack' is not a package
>>>
As I understand it, sys.path_hooks is traversed until the last entry has been tried. So why does the first variant (without clearing sys.path_hooks) not recognize the new extension "xxx", while the second variant (clearing sys.path_hooks) does? It looks as if the machinery throws an exception rather than moving on to the next entry when an entry of sys.path_hooks is not able to recognize "xxx".
And why does the second version work for py, pyc and xxx modules in the current directory, but not in the package pack? I would expect py and pyc not to work even in the current directory, because sys.path_hooks contains only a hook for "xxx"...
The short answer is that the default PathFinder in sys.meta_path isn't meant to have new file extensions and importers added in the same paths it already supports. But there's still hope!
Quick Breakdown
sys.path_hooks is consumed by the importlib._bootstrap_external.PathFinder class.
When an import happens, each entry in sys.meta_path is asked to find a matching spec for the requested module. The PathFinder in particular will take the contents of sys.path and pass each entry to the factory functions in sys.path_hooks. Each factory function has a chance to either raise an ImportError (basically the factory saying "nope, I don't support this path entry") or return a finder instance for that path. The first successfully returned finder is then cached in sys.path_importer_cache. From then on, PathFinder will only ask those cached finder instances whether they can provide the requested module.
If you look at the contents of sys.path_importer_cache, you'll see that all of the directory entries from sys.path have been mapped to FileFinder instances. Non-directory entries (zip files, etc.) will be mapped to other finders.
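To see that mapping for yourself, here is a minimal sketch that counts the cached finders by class. The function name summarize_importer_cache and the dummy module name are made up for illustration. The call to PathFinder.find_spec is only there to make sure every sys.path entry has been visited at least once.

```python
import sys
from collections import Counter
from importlib.machinery import PathFinder


def summarize_importer_cache():
    """Count the finders cached in sys.path_importer_cache by class name.

    A "NoneType" count means that no hook accepted that path entry.
    """
    # Looking up a name that should not exist walks all of sys.path,
    # which fills the cache for every entry that has not been seen yet.
    PathFinder.find_spec("a_module_name_that_should_not_exist_xyz")
    return Counter(
        type(finder).__name__ for finder in sys.path_importer_cache.values()
    )


if __name__ == "__main__":
    for class_name, count in summarize_importer_cache().items():
        print(f"{class_name}: {count}")
```

On a typical installation most entries should be FileFinder instances, but the exact counts depend on your sys.path.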
Thus, if you append a new factory created via FileFinder.path_hook to sys.path_hooks, your factory will only be invoked if the previous FileFinder hook didn't accept the path. This is unlikely, since FileFinder will work on any existing directory.
Alternatively, if you insert your new factory into sys.path_hooks ahead of the existing factories, the default hook will only be used if your new factory doesn't accept the path. And again, since FileFinder is so liberal with what it will accept, this would lead to only your loader being used, as you've already observed.
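The "first hook that accepts wins" rule can be checked without touching the real import state. The sketch below is a simplified imitation of what PathFinder does with a path entry, not its actual code. first_accepting_hook and xxx_hook are hypothetical names, and xxx_hook is built the same way as the hook in the question.

```python
import sys
import tempfile
from importlib.machinery import FileFinder, SourcelessFileLoader


def first_accepting_hook(path_entry, hooks):
    """Return (index, finder) for the first hook that accepts path_entry."""
    for index, hook in enumerate(hooks):
        try:
            return index, hook(path_entry)
        except ImportError:
            continue  # this hook does not support the entry, try the next
    return None, None


# Hypothetical hook for ".xxx" files, as in the question.
xxx_hook = FileFinder.path_hook((SourcelessFileLoader, [".xxx"]))

with tempfile.TemporaryDirectory() as directory:
    # Work on copies so the interpreter's real hook list is left alone.
    appended = list(sys.path_hooks) + [xxx_hook]
    prepended = [xxx_hook] + list(sys.path_hooks)
    appended_index, _ = first_accepting_hook(directory, appended)
    prepended_index, _ = first_accepting_hook(directory, prepended)

print("appended: accepted by hook", appended_index, "of", len(appended))
print("prepended: accepted by hook", prepended_index, "of", len(prepended))
```

With the hook appended, an earlier hook claims the directory and the ".xxx" hook is never consulted. With it prepended, the ".xxx" hook claims the directory and the default hooks are never consulted.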
Making it Work
So you can either try to adjust that existing factory to also support your file extension and importer (which is difficult as the importers and extension string tuples are held in a closure), or do what I ended up doing, which is add a new meta path finder.
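For the first option, one workaround is to leave the existing factory alone and put a second FileFinder hook in front of it, one that knows the standard loaders and the extra one. This is my own sketch, not something the import system documents as a recipe. XxxLoader is a hypothetical stand-in that reads ".xxx" files as plain Python source (the real loader in the question decrypts bytecode), and the module names are invented for the demo. Clearing sys.path_importer_cache is needed so that directories already claimed by the old hook get a new finder.

```python
import importlib
import sys
import tempfile
from importlib.machinery import (
    BYTECODE_SUFFIXES, EXTENSION_SUFFIXES, SOURCE_SUFFIXES,
    ExtensionFileLoader, FileFinder, SourceFileLoader, SourcelessFileLoader,
)
from pathlib import Path


class XxxLoader(SourceFileLoader):
    """Hypothetical stand-in loader: reads .xxx files as plain Python source."""


def make_combined_hook():
    # The three standard file loaders, plus the extra one for ".xxx".
    return FileFinder.path_hook(
        (ExtensionFileLoader, EXTENSION_SUFFIXES),
        (SourceFileLoader, SOURCE_SUFFIXES),
        (SourcelessFileLoader, BYTECODE_SUFFIXES),
        (XxxLoader, [".xxx"]),
    )


def demo():
    hook = make_combined_hook()
    with tempfile.TemporaryDirectory() as directory:
        Path(directory, "combo_demo_xxx.xxx").write_text("VALUE = 42\n")
        Path(directory, "combo_demo_py.py").write_text("VALUE = 7\n")
        sys.path.insert(0, directory)
        sys.path_hooks.insert(0, hook)
        sys.path_importer_cache.clear()  # drop finders made by the old hooks
        try:
            xxx_module = importlib.import_module("combo_demo_xxx")
            py_module = importlib.import_module("combo_demo_py")
            return xxx_module.VALUE, py_module.VALUE
        finally:
            # Undo every global change so the demo leaves no trace.
            sys.path.remove(directory)
            sys.path_hooks.remove(hook)
            sys.path_importer_cache.clear()
            for name in ("combo_demo_xxx", "combo_demo_py"):
                sys.modules.pop(name, None)


if __name__ == "__main__":
    print(demo())
```

The trade-off is that this changes how every directory on sys.path is searched, which is exactly the kind of global side effect the meta path finder approach avoids.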
Here is the meta path finder from my own project:
import sys

from importlib.abc import FileLoader
from importlib.machinery import FileFinder, PathFinder
from os import getcwd
from os.path import basename

from sibilant.module import prep_module, exec_module


SOURCE_SUFFIXES = [".lspy", ".sibilant"]

_path_importer_cache = {}
_path_hooks = []


class SibilantPathFinder(PathFinder):
    """
    An overridden PathFinder which will hunt for sibilant files in
    sys.path. Uses storage in this module to avoid conflicts with the
    original PathFinder
    """

    @classmethod
    def invalidate_caches(cls):
        for finder in _path_importer_cache.values():
            if hasattr(finder, 'invalidate_caches'):
                finder.invalidate_caches()

    @classmethod
    def _path_hooks(cls, path):
        for hook in _path_hooks:
            try:
                return hook(path)
            except ImportError:
                continue
        else:
            return None

    @classmethod
    def _path_importer_cache(cls, path):
        if path == '':
            try:
                path = getcwd()
            except FileNotFoundError:
                # Don't cache the failure as the cwd can easily change to
                # a valid directory later on.
                return None
        try:
            finder = _path_importer_cache[path]
        except KeyError:
            finder = cls._path_hooks(path)
            _path_importer_cache[path] = finder
        return finder


class SibilantSourceFileLoader(FileLoader):

    def create_module(self, spec):
        return None

    def get_source(self, fullname):
        return self.get_data(self.get_filename(fullname)).decode("utf8")

    def exec_module(self, module):
        name = module.__name__
        source = self.get_source(name)
        filename = basename(self.get_filename(name))

        prep_module(module)
        exec_module(module, source, filename=filename)


def _get_lspy_file_loader():
    return (SibilantSourceFileLoader, SOURCE_SUFFIXES)


def _get_lspy_path_hook():
    return FileFinder.path_hook(_get_lspy_file_loader())


def _install():
    done = False

    def install():
        nonlocal done
        if not done:
            _path_hooks.append(_get_lspy_path_hook())
            sys.meta_path.append(SibilantPathFinder)
            done = True

    return install


_install = _install()
_install()
The SibilantPathFinder overrides PathFinder and replaces only those methods which reference sys.path_hooks and sys.path_importer_cache, with similar implementations that instead look in a _path_hooks list and a _path_importer_cache dict which are local to this module.
During import, the existing PathFinder will try to find a matching module. If it cannot, then my injected SibilantPathFinder will re-traverse sys.path and try to find a match with one of my own file extensions.
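Applied to the ".xxx" case from the question, the same idea can be sketched with public importlib APIs only, without overriding PathFinder's private methods. This is a simplified variant and not the code above. XxxLoader is again a hypothetical stand-in that reads ".xxx" files as plain Python source, and the package and module names are invented for the demo.

```python
import importlib
import os
import sys
import tempfile
from importlib.abc import MetaPathFinder
from importlib.machinery import FileFinder, SourceFileLoader
from pathlib import Path


class XxxLoader(SourceFileLoader):
    """Hypothetical stand-in loader: reads .xxx files as plain Python source."""


class XxxPathFinder(MetaPathFinder):
    """Meta path finder that only looks for .xxx files."""

    _finders = {}  # private cache, separate from sys.path_importer_cache

    @classmethod
    def find_spec(cls, fullname, path=None, target=None):
        # Top-level imports search sys.path, submodules the parent's __path__.
        for entry in sys.path if path is None else path:
            if not isinstance(entry, str):
                continue
            entry = entry or os.getcwd()
            if not os.path.isdir(entry):
                continue
            finder = cls._finders.get(entry)
            if finder is None:
                finder = FileFinder(entry, (XxxLoader, [".xxx"]))
                cls._finders[entry] = finder
            spec = finder.find_spec(fullname)
            # A spec without a loader is only a namespace candidate: skip it.
            if spec is not None and spec.loader is not None:
                return spec
        return None

    @classmethod
    def invalidate_caches(cls):
        for finder in cls._finders.values():
            finder.invalidate_caches()


def demo():
    with tempfile.TemporaryDirectory() as directory:
        package = Path(directory, "xxx_demo_pack")
        package.mkdir()
        (package / "__init__.py").write_text("")
        (package / "hidden.xxx").write_text("VALUE = 42\n")
        (package / "plain.py").write_text("VALUE = 7\n")
        sys.path.insert(0, directory)
        sys.meta_path.append(XxxPathFinder)  # runs after the default finders
        try:
            hidden = importlib.import_module("xxx_demo_pack.hidden")
            plain = importlib.import_module("xxx_demo_pack.plain")
            return hidden.VALUE, plain.VALUE
        finally:
            # Undo every global change so the demo leaves no trace.
            sys.path.remove(directory)
            sys.meta_path.remove(XxxPathFinder)
            sys.path_importer_cache.pop(directory, None)
            XxxPathFinder._finders.clear()
            for name in [n for n in sys.modules
                         if n.split(".")[0] == "xxx_demo_pack"]:
                del sys.modules[name]


if __name__ == "__main__":
    print(demo())
```

Because the finder is appended to sys.meta_path, ordinary .py modules, including those inside the package, are still resolved by the default PathFinder first, and the extra finder is only asked when that fails.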
Figuring More Out
I ended up delving into the source for the _bootstrap_external module: https://github.com/python/cpython/blob/master/Lib/importlib/_bootstrap_external.py
The _install function and the PathFinder.find_spec method are the best starting points for seeing why things work the way they do.