Problem Description
In WeasyPrint's public API I accept filenames (among other types) for the HTML inputs. Any filename that works with the built-in open() should work, but I need to convert it to a URL in the file:// scheme that will later be passed to urllib.urlopen().
(Everything is in URL form internally. I need a "base URL" for each document in order to resolve relative URL references with urlparse.urljoin().)
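To illustrate why the base URL matters, here is a minimal sketch using Python 3's urllib.parse.urljoin (the Python 3 location of urlparse.urljoin). The base URL and the relative references are made-up examples, not anything WeasyPrint itself uses.

```python
from urllib.parse import urljoin

# Hypothetical base URL of an HTML document loaded from a local file.
base_url = "file:///home/user/docs/index.html"

# Relative references found in the document resolve against the base URL.
stylesheet = urljoin(base_url, "style.css")
logo = urljoin(base_url, "../img/logo.png")

print(stylesheet)  # file:///home/user/docs/style.css
print(logo)        # file:///home/user/img/logo.png
```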
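urllib.pathname2url is a start, but it only converts a local path to the form used in the path component of a URL; it does not produce a complete URL.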
I do need a complete URL, though. So far this seems to work:
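```python
# Python 2
import os
import urllib

def path2url(path):
    """Return file:// URL from a filename."""
    path = os.path.abspath(path)
    if isinstance(path, unicode):
        # Encode text paths to bytes before percent-quoting them.
        path = path.encode('utf8')
    # pathname2url lives in urllib, not urlparse.
    return 'file:' + urllib.pathname2url(path)
```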
UTF-8 seems to be recommended by RFC 3987 (IRI). But in this case (the URL is meant for urllib, eventually) maybe I should use sys.getfilesystemencoding()?
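The choice only matters for non-ASCII filenames. A minimal Python 3 sketch of the two options, using urllib.parse.quote on a hypothetical filename: percent-encoding the text as UTF-8 (what RFC 3987 suggests), versus percent-encoding the raw bytes from os.fsencode(), which uses the encoding reported by sys.getfilesystemencoding(). When that encoding is UTF-8, both give the same result.

```python
import os
import sys
from urllib.parse import quote

path = "/tmp/café.html"  # hypothetical non-ASCII filename

# Option 1: percent-encode the text as UTF-8 (quote()'s default encoding).
utf8_path = quote(path)
print(utf8_path)  # /tmp/caf%C3%A9.html

# Option 2: percent-encode the bytes the OS uses for this name.
print(sys.getfilesystemencoding())
fs_path = quote(os.fsencode(path))
print(fs_path)  # same as option 1 when the filesystem encoding is UTF-8
```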
However, based on the literature, I should prepend not just file: but file://, except when I should not: on Windows, the results from nturl2path.pathname2url() already start with three slashes.
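One way to sidestep the slash counting, sketched here in Python 3 and assuming urllib.request.pathname2url behaves like the Python 2 function (a leading "/" on POSIX, "///C:/..." on Windows): join the quoted path onto the bare "file:" scheme with urljoin(), which normalizes either form to exactly three slashes. The helper name path2url is reused from the question; only the POSIX case is exercised here, and the Windows behavior is an untested assumption.

```python
import os
from urllib.parse import urljoin
from urllib.request import pathname2url


def path2url(path):
    """Return a file: URL for a local filename (sketch)."""
    # On POSIX pathname2url() gives e.g. '/tmp/report.html'; on Windows it is
    # assumed to give '///C:/...'. urljoin() reads both as an absolute path
    # with an empty host, so the result starts with 'file:///' either way.
    return urljoin("file:", pathname2url(os.path.abspath(path)))


print(path2url("/tmp/report.html"))  # file:///tmp/report.html (on POSIX)
```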
So the question is: is there a better way to do this and make it cross-platform?
Recommended Answer
For completeness, in Python 3.4+, you should do:
```python
import pathlib

# as_uri() only accepts absolute paths; it raises ValueError for relative ones.
pathlib.Path(absolute_path_string).as_uri()
```
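A short usage sketch of the pathlib approach. Because as_uri() rejects relative paths, a relative filename (the hypothetical "report.html" below) is made absolute first with resolve(). Non-ASCII characters in the path are percent-encoded.

```python
import pathlib

# as_uri() rejects relative paths with a ValueError.
try:
    pathlib.Path("report.html").as_uri()
except ValueError as exc:
    print("relative path rejected:", exc)

# Resolving first turns a relative filename into an absolute one.
url = pathlib.Path("report.html").resolve().as_uri()
print(url)  # file:///<current directory>/report.html

# Non-ASCII characters are percent-encoded.
print(pathlib.Path("/tmp/café.html").as_uri())  # file:///tmp/caf%C3%A9.html on a UTF-8 filesystem
```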