python - Python 3: `netloc` value in `urllib.parse` is empty if the URL doesn't have `//`

I noticed that netloc is empty if the URL doesn't contain //.
Without //, netloc is empty:

>>> from urllib.parse import urlparse
>>> urlparse('google.com')
ParseResult(scheme='', netloc='', path='google.com', params='', query='', fragment='')
>>>
>>> urlparse('www.google.com')
ParseResult(scheme='', netloc='', path='www.google.com', params='', query='', fragment='')
>>>
>>> urlparse('google.com/search?q=python')
ParseResult(scheme='', netloc='', path='google.com/search', params='', query='q=python', fragment='')
>>>

With //, netloc is correctly identified:

>>> urlparse('http://google.com')
ParseResult(scheme='http', netloc='google.com', path='', params='', query='', fragment='')
>>>
>>> urlparse('//google.com')
ParseResult(scheme='', netloc='google.com', path='', params='', query='', fragment='')
>>>
>>> urlparse('http://google.com/search?q=python')
ParseResult(scheme='http', netloc='google.com', path='/search', params='', query='q=python', fragment='')
>>>

Is it possible to correctly identify the netloc even if // is not provided in the URL?

Best answer

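Is it possible to correctly identify the netloc even if // is not provided in the URL?
Not with urlparse. This is explained explicitly in the documentation:
Following the syntax specifications in RFC 1808, urlparse recognizes a netloc only if it is properly introduced by '//'. Otherwise the input is presumed to be a relative URL and thus to start with a path component.
If you don't want to rewrite urlparse's logic (which I would not recommend), make sure the url starts with //: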

if not url.startswith('//'):
    url = '//' + url
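
To illustrate what this check does, here is a minimal sketch that wraps it in a function (the name add_slashes is hypothetical, not part of urllib) and parses two inputs. The second input already has a scheme, and the result shows why prefixing blindly can go wrong:

```python
from urllib.parse import urlparse

def add_slashes(url):
    # Hypothetical helper: the same check as above, wrapped for reuse.
    if not url.startswith('//'):
        url = '//' + url
    return url

# A bare host with a path and query now gets a netloc.
result = urlparse(add_slashes('google.com/search?q=python'))
print(result.netloc)  # prints: google.com
print(result.path)    # prints: /search

# A URL that already has a scheme is mangled: the scheme lands in netloc.
broken = urlparse(add_slashes('http://google.com'))
print(broken.netloc)  # prints: http:
print(broken.path)    # prints: //google.com
```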

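EDIT
As alexis pointed out, the above is actually a poor solution, since it also prepends // to URLs that already start with a scheme such as http://. Perhaps: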

# str.startswith accepts a tuple of prefixes to test.
if not url.startswith(('//', 'http://', 'https://')):
    url = '//' + url
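
As a rough sketch, the check can be combined with urlparse in a single helper. The name parse_with_netloc is hypothetical, and the check is written with a tuple argument to str.startswith, which behaves the same as the chained or conditions:

```python
from urllib.parse import urlparse

def parse_with_netloc(url):
    """Hypothetical helper: prepend '//' unless the URL already starts
    with '//' or with an http/https scheme, then parse it."""
    if not url.startswith(('//', 'http://', 'https://')):
        url = '//' + url
    return urlparse(url)

print(parse_with_netloc('google.com/search?q=python').netloc)  # prints: google.com
print(parse_with_netloc('http://google.com').netloc)           # prints: google.com
print(parse_with_netloc('//google.com').netloc)                # prints: google.com
```

Note that this only knows about http and https. Any other scheme, for example ftp://, would still be prefixed with // and parsed incorrectly.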

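But your mileage may vary with this solution as well. If you have to support a wide variety of inconsistent formats, you may have to resort to a regex.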
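
One possible shape for such a regex, offered only as a sketch under stated assumptions: it treats the input as already having a netloc marker if it starts with // or with a scheme followed by // (the scheme pattern follows the RFC 3986 syntax of a letter followed by letters, digits, +, - or .). The names HAS_AUTHORITY and parse_loose are hypothetical.

```python
import re
from urllib.parse import urlparse

# Assumption: an optional scheme followed by '//' means the URL already
# introduces its netloc properly.
HAS_AUTHORITY = re.compile(r'^(?:[A-Za-z][A-Za-z0-9+.-]*:)?//')

def parse_loose(url):
    """Hypothetical helper: prepend '//' only when no '//' marker is found."""
    if not HAS_AUTHORITY.match(url):
        url = '//' + url
    return urlparse(url)

print(parse_loose('google.com').netloc)              # prints: google.com
print(parse_loose('ftp://example.com/file').netloc)  # prints: example.com
print(parse_loose('localhost:8080/path').netloc)     # prints: localhost:8080
```

This still guesses. A genuinely relative path such as images/logo.png, or a URL without // such as mailto:user@example.com, would be treated as if it began with a host name, so the pattern would need adjusting to the inputs you actually expect.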