Summary
I am working on a series of add-ons for Anki, an open-source flashcard program. Anki add-ons are shipped as Python packages, with the basic folder structure looking as follows:
    anki_addons/
        addon_name_1/
            __init__.py
        addon_name_2/
            __init__.py

anki_addons is appended to sys.path by the base app, which then imports each add-on with import <addon_name>.
The problem I have been trying to solve is to find a reliable way to ship packages and their dependencies with my add-ons while not polluting global state or falling back to manual edits of the vendored packages.
Specifics
Specifically, given an add-on structure like this...
    addon_name_1/
        __init__.py
        _vendor/
            __init__.py
            library1
            library2
            dependency_of_library2
            ...

...I would like to be able to import any arbitrary package that is included in the _vendor directory, e.g.:

    from ._vendor import library1

The main difficulty with relative imports like this is that they do not work for packages that also depend on other packages imported through absolute references (e.g. import dependency_of_library2 in the source code of library2).
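To make the failure concrete, here is a minimal sketch that builds a throwaway add-on in a temporary directory and tries to import a vendored library that uses an absolute import. The package name addon_demo and the helper functions are made up for illustration; only the layout mirrors the structure above.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_addon(root):
    """Create addon_demo/_vendor/{library2,dependency_of_library2} under root."""
    vendor = Path(root) / "addon_demo" / "_vendor"
    (vendor / "library2").mkdir(parents=True)
    (vendor / "dependency_of_library2").mkdir()
    (vendor.parent / "__init__.py").write_text("")
    (vendor / "__init__.py").write_text("")
    (vendor / "dependency_of_library2" / "__init__.py").write_text("VALUE = 42\n")
    # library2 uses an absolute import, as most third-party code does
    (vendor / "library2" / "__init__.py").write_text(
        "import dependency_of_library2\n"
    )


def try_vendored_import(root):
    """Return the exception raised by importing the vendored library2, or None."""
    sys.path.insert(0, str(root))  # stands in for anki_addons being on sys.path
    try:
        importlib.import_module("addon_demo._vendor.library2")
    except ImportError as exc:
        return exc
    finally:
        sys.path.remove(str(root))
    return None


with tempfile.TemporaryDirectory() as tmp:
    build_demo_addon(tmp)
    error = try_vendored_import(tmp)

print(type(error).__name__, error)
```

The import of library2 itself is found through the package path, but its own import dependency_of_library2 is resolved against sys.path, where the vendored copy is not visible, so a ModuleNotFoundError is reported.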
Solution attempts
So far I have explored the following options:
- Manually updating the third-party packages, so that their import statements point to the fully qualified module path within my python package (e.g. import addon_name_1._vendor.dependency_of_library2). But this is tedious work that is not scalable to larger dependency trees and not portable to other packages.
- Adding _vendor to sys.path via sys.path.insert(1, <path_to_vendor_dir>) in my package init file. This works, but it introduces a global change to the module look-up path which will affect other add-ons and even the base app itself. It just seems like a hack that could result in a pandora's box of issues later down the line (e.g. conflicts between different versions of the same package, etc.).
- Temporarily modifying sys.path for my imports; but this fails to work for third-party modules with method-level imports.
- Writing a PEP302-style custom importer based off an example I found in setuptools, but I just couldn't make head nor tail of that.
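As an illustration of why the third attempt breaks down, the following sketch temporarily extends sys.path with a context manager and then calls a function that performs a deferred import. The module names lazy_lib_demo and lazy_helper_demo and the temporary_sys_path helper are hypothetical stand-ins for a vendored library and its dependency.

```python
import contextlib
import importlib
import sys
import tempfile
from pathlib import Path


@contextlib.contextmanager
def temporary_sys_path(path):
    """Put path at the front of sys.path only for the duration of the block."""
    sys.path.insert(0, str(path))
    try:
        yield
    finally:
        sys.path.remove(str(path))


with tempfile.TemporaryDirectory() as tmp:
    vendor = Path(tmp)
    # hypothetical vendored modules: the library imports its helper lazily
    (vendor / "lazy_helper_demo.py").write_text("VALUE = 42\n")
    (vendor / "lazy_lib_demo.py").write_text(
        "def compute():\n"
        "    import lazy_helper_demo  # deferred, function-level import\n"
        "    return lazy_helper_demo.VALUE\n"
    )

    with temporary_sys_path(vendor):
        lazy_lib = importlib.import_module("lazy_lib_demo")  # succeeds

    try:
        lazy_lib.compute()  # runs after sys.path was restored
        outcome = "ok"
    except ImportError as exc:
        outcome = f"{type(exc).__name__}: {exc}"

print(outcome)
```

The module-level import succeeds while the path is extended, but the deferred import only runs later, when the vendor directory is no longer on sys.path. Note also that lazy_lib_demo still ends up in sys.modules under its top-level name, so this approach does not fully avoid global state either.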
I've been stuck on this for quite a few hours now and I'm beginning to think that I'm either completely missing an easy way to do this, or that there is something fundamentally wrong with my entire approach.
Is there no way I can ship a dependency tree of third-party packages with my code, without having to resort to sys.path hacks or modifying the packages in question?
Edit:
Just to clarify: I don't have any control over how add-ons are imported from the anki_addons folder. anki_addons is just the directory provided by the base app where all add-ons are installed into. It is added to the sys path, so the add-on packages therein pretty much just behave like any other python package located in Python's module look-up paths.
First of all, I'd advise against vendoring; a few major packages did use vendoring before but have switched away to avoid the pain of having to handle vendoring. One such example is the requests library. If you are relying on people using pip install to install your package, then just use dependencies and tell people about virtual environments. Don't assume you need to shoulder the burden of keeping dependencies untangled or need to stop people from installing dependencies in the global Python site-packages location.
At the same time, I appreciate that a plug-in environment of a third-party tool is something different, and if adding dependencies to the Python installation used by that tool is cumbersome or impossible, vendorizing may be a viable option. I see that Anki distributes extensions as .zip files without setuptools support, so that's certainly such an environment.
So if you choose to vendor dependencies, then use a script to manage your dependencies and update their imports. This is your option #1, but automated.
This is the path that the pip project has chosen; see their tasks subdirectory for their automation, which builds on the invoke library. See the pip project vendoring README for their policy and rationale (chief among those is that pip needs to bootstrap itself, i.e. have its dependencies available to be able to install anything).
You should not use any of the other options; you already enumerated the issues with #2 and #3.
The issue with option #4, using a custom importer, is that you still need to rewrite imports. Put differently, the custom importer hook used by setuptools doesn't solve the vendorized namespace problem at all; it instead makes it possible to dynamically import top-level packages if the vendorized packages are missing (a problem that pip solves with a manual debundling process). setuptools actually uses option #1, where they rewrite the source code for vendorized packages. See for example the imports in the packaging project in the setuptools vendored subpackage; the setuptools.extern namespace is handled by the custom import hook, which then redirects either to setuptools._vendor or, if importing from the vendorized package fails, to the top-level name.
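A rough sketch of that redirecting idea follows. It is loosely modeled on the behaviour described above and is not setuptools' actual code; the class name ExternImporter, the package demo_pkg and the module bar_demo are invented for the example, and only top-level names under the extern namespace are handled.

```python
import importlib
import importlib.abc
import importlib.util
import sys
import tempfile
from pathlib import Path


class ExternImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Resolve <root>.extern.<name> to <root>._vendor.<name>, else to <name>."""

    def __init__(self, root_name, vendored_names):
        self.prefix = root_name + ".extern."
        self.vendor_prefix = root_name + "._vendor."
        self.vendored_names = frozenset(vendored_names)

    def find_spec(self, fullname, path=None, target=None):
        if not fullname.startswith(self.prefix):
            return None
        name = fullname[len(self.prefix):]
        if name not in self.vendored_names:
            return None
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        name = spec.name[len(self.prefix):]
        # prefer the bundled copy, fall back to a top-level install
        for candidate in (self.vendor_prefix + name, name):
            try:
                return importlib.import_module(candidate)
            except ImportError:
                continue
        raise ImportError(f"{name!r} is neither vendored nor installed")

    def exec_module(self, module):
        pass  # the real module was already executed in create_module


with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "demo_pkg"
    (pkg / "extern").mkdir(parents=True)
    (pkg / "_vendor").mkdir()
    for directory in (pkg, pkg / "extern", pkg / "_vendor"):
        (directory / "__init__.py").write_text("")
    (pkg / "_vendor" / "bar_demo.py").write_text("ORIGIN = 'vendored'\n")

    # json is listed but not bundled, standing in for a debundled dependency
    importer = ExternImporter("demo_pkg", ["bar_demo", "json"])
    sys.meta_path.append(importer)
    sys.path.insert(0, tmp)
    try:
        bar = importlib.import_module("demo_pkg.extern.bar_demo")
        fallback = importlib.import_module("demo_pkg.extern.json")
    finally:
        sys.path.remove(tmp)
        sys.meta_path.remove(importer)

print(bar.__name__)
print(fallback.__name__)
```

The hook only decides where demo_pkg.extern.<name> comes from. Any absolute import inside the vendored code is untouched by it, which is why the imports still have to be rewritten.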
The pip automation to update vendored packages takes the following steps:
- Delete everything in the _vendor/ subdirectory except the documentation, the __init__.py file and the requirements text file.
- Use pip to install all vendored dependencies into that directory, using a dedicated requirements file named vendor.txt, avoiding compilation of .pyc bytecode cache files and ignoring transitive dependencies (these are assumed to be listed in vendor.txt already); the command used is pip install -t pip/_vendor -r pip/_vendor/vendor.txt --no-compile --no-deps.
- Delete everything that was installed by pip but not needed in a vendored environment, i.e. *.dist-info, *.egg-info, the bin directory, and a few things from installed dependencies that pip would never use.
- Collect all installed directories and added files sans .py extension (so anything not in the whitelist); this is the vendored_libs list.
- Rewrite imports; this is simply a series of regexes, where every name in vendored_libs is used to replace import <name> occurrences with from pip._vendor import <name> and every from <name>(.*) import occurrence with from pip._vendor.<name>(.*) import.
- Apply a few patches to mop up the remaining changes needed; from a vendoring perspective, only the pip patch for requests is interesting here in that it updates the requests library backwards compatibility layer for the vendored packages that the requests library had removed; this patch is quite meta!
So in essence, the most important part of the pip approach, the rewriting of vendored package imports, is quite simple; paraphrased to simplify the logic and with the pip-specific parts removed, it is simply the following process:

    import re
    import shutil
    import subprocess
    from functools import partial
    from itertools import chain
    from pathlib import Path

    WHITELIST = {'README.txt', '__init__.py', 'vendor.txt'}


    def delete_all(*paths, whitelist=frozenset()):
        """Remove the given directories, and files not named in the whitelist"""
        for item in paths:
            if item.is_dir():
                shutil.rmtree(item, ignore_errors=True)
            elif item.is_file() and item.name not in whitelist:
                item.unlink()


    def iter_subtree(path):
        """Recursively yield all files in a subtree, depth-first"""
        if not path.is_dir():
            if path.is_file():
                yield path
            return
        for item in path.iterdir():
            if item.is_dir():
                yield from iter_subtree(item)
            elif item.is_file():
                yield item


    def patch_vendor_imports(file, replacements):
        """Apply every replacement callable to the file's text, in place"""
        text = file.read_text('utf8')
        for replacement in replacements:
            text = replacement(text)
        file.write_text(text, 'utf8')


    def find_vendored_libs(vendor_dir, whitelist):
        """Return the top-level vendored names and their paths"""
        vendored_libs = []
        paths = []
        for item in vendor_dir.iterdir():
            if item.is_dir():
                vendored_libs.append(item.name)
            elif item.is_file() and item.name not in whitelist:
                vendored_libs.append(item.stem)  # without extension
            else:  # not a dir or a file not in the whitelist
                continue
            paths.append(item)
        return vendored_libs, paths


    def vendor(vendor_dir):
        # target package is <parent>.<vendor_dir>; foo/_vendor -> foo._vendor
        pkgname = f'{vendor_dir.parent.name}.{vendor_dir.name}'

        # remove everything
        delete_all(*vendor_dir.iterdir(), whitelist=WHITELIST)

        # install with pip; check=True stops the run if the install fails
        subprocess.run([
            'pip', 'install', '-t', str(vendor_dir),
            '-r', str(vendor_dir / 'vendor.txt'),
            '--no-compile', '--no-deps'
        ], check=True)

        # delete stuff that's not needed
        delete_all(
            *vendor_dir.glob('*.dist-info'),
            *vendor_dir.glob('*.egg-info'),
            vendor_dir / 'bin')

        vendored_libs, paths = find_vendored_libs(vendor_dir, WHITELIST)

        replacements = []
        for lib in vendored_libs:
            replacements += (
                partial(  # import bar -> from foo._vendor import bar
                    re.compile(r'(^\s*)import {}\n'.format(lib), flags=re.M).sub,
                    r'\1from {} import {}\n'.format(pkgname, lib)
                ),
                partial(  # from bar -> from foo._vendor.bar
                    re.compile(r'(^\s*)from {}(\.|\s+)'.format(lib), flags=re.M).sub,
                    r'\1from {}.{}\2'.format(pkgname, lib)
                ),
            )

        for file in chain.from_iterable(map(iter_subtree, paths)):
            patch_vendor_imports(file, replacements)


    if __name__ == '__main__':
        # this assumes the script sits next to the foo package directory
        here = Path(__file__).resolve().parent
        vendor_dir = here / 'foo' / '_vendor'
        assert (vendor_dir / 'vendor.txt').exists(), '_vendor/vendor.txt file not found'
        assert (vendor_dir / '__init__.py').exists(), '_vendor/__init__.py file not found'
        vendor(vendor_dir)
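
To see what the two regexes do and do not catch, here is a small standalone check that applies them to a sample source string. The names foo._vendor and bar are placeholders, and the sample text is invented for the demonstration.

```python
import re

pkgname = "foo._vendor"  # hypothetical target namespace
lib = "bar"              # hypothetical vendored library name

# the same two patterns used in the vendoring script
import_re = re.compile(r'(^\s*)import {}\n'.format(lib), flags=re.M)
from_re = re.compile(r'(^\s*)from {}(\.|\s+)'.format(lib), flags=re.M)

sample = (
    "import bar\n"
    "from bar import thing\n"
    "from bar.sub import other\n"
    "def f():\n"
    "    import bar\n"
    "import bar.sub\n"
    "import barbell\n"
)

rewritten = import_re.sub(r'\1from {} import {}\n'.format(pkgname, lib), sample)
rewritten = from_re.sub(r'\1from {}.{}\2'.format(pkgname, lib), rewritten)
print(rewritten)
# from foo._vendor import bar
# from foo._vendor.bar import thing
# from foo._vendor.bar.sub import other
# def f():
#     from foo._vendor import bar
# import bar.sub
# import barbell
```

Plain and function-level imports are rewritten, and the unrelated barbell module is left alone. Forms such as import bar.sub (and likewise import bar as b) are not matched by these patterns, which is the kind of leftover that the patch step is there to mop up.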