问题描述
我正在通过遵循自动化无聊的东西来学习 Python.这个程序应该去http://xkcd.com/ 并下载所有图像以供离线查看.
I'm learning Python by following Automate the Boring Stuff. This program is supposed to go to http://xkcd.com/ and download all the images for offline viewing.
我使用的是 2.7 版和 Mac.
I'm on version 2.7 and Mac.
出于某种原因,我收到诸如未提供架构"之类的错误以及使用 request.get() 本身的错误.
For some reason, I'm getting errors like "No schema supplied" and errors with using request.get() itself.
这是我的代码:
# Saves the XKCD comic page for offline read
import requests, os, bs4, shutil
url = 'http://xkcd.com/'
if os.path.isdir('xkcd') == True: # If xkcd folder already exists
shutil.rmtree('xkcd') # delete it
else: # otherwise
os.makedirs('xkcd') # Creates xkcd foulder.
while not url.endswith('#'): # If there are no more posts, it url will endswith #, exist while loop
# Download the page
print 'Downloading %s page...' % url
res = requests.get(url) # Get the page
res.raise_for_status() # Check for errors
soup = bs4.BeautifulSoup(res.text) # Dowload the page
# Find the URL of the comic image
comicElem = soup.select('#comic img') # Any #comic img it finds will be saved as a list in comicElem
if comicElem == []: # if the list is empty
print 'Couldn\'t find the image!'
else:
comicUrl = comicElem[0].get('src') # Get the first index in comicElem (the image) and save to
# comicUrl
# Download the image
print 'Downloading the %s image...' % (comicUrl)
res = requests.get(comicUrl) # Get the image. Getting something will always use requests.get()
res.raise_for_status() # Check for errors
# Save image to ./xkcd
imageFile = open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb')
for chunk in res.iter_content(10000):
imageFile.write(chunk)
imageFile.close()
# Get the Prev btn's URL
prevLink = soup.select('a[rel="prev"]')[0]
# The Previous button is first <a rel="prev" href="/1535/" accesskey="p">< Prev</a>
url = 'http://xkcd.com/' + prevLink.get('href')
# adds /1535/ to http://xkcd.com/
print 'Done!'
以下是错误:
Traceback (most recent call last):
File "/Users/XKCD.py", line 30, in <module>
res = requests.get(comicUrl) # Get the image. Getting something will always use requests.get()
File "/Library/Python/2.7/site-packages/requests/api.py", line 69, in get
return request('get', url, params=params, **kwargs)
File "/Library/Python/2.7/site-packages/requests/api.py", line 50, in request
response = session.request(method=method, url=url, **kwargs)
File "/Library/Python/2.7/site-packages/requests/sessions.py", line 451, in request
prep = self.prepare_request(req)
File "/Library/Python/2.7/site-packages/requests/sessions.py", line 382, in prepare_request
hooks=merge_hooks(request.hooks, self.hooks),
File "/Library/Python/2.7/site-packages/requests/models.py", line 304, in prepare
self.prepare_url(url, params)
File "/Library/Python/2.7/site-packages/requests/models.py", line 362, in prepare_url
to_native_string(url, 'utf8')))
requests.exceptions.MissingSchema: Invalid URL '//imgs.xkcd.com/comics/the_martian.png': No schema supplied. Perhaps you meant http:////imgs.xkcd.com/comics/the_martian.png?
问题是我已经多次阅读书中有关该程序的部分,阅读请求文档,以及查看此处的其他问题.我的语法看起来正确.
The thing is I've been reading the section in the book about the program multiple times, reading the Requests doc, as well as looking at other questions on here. My syntax looks right.
感谢您的帮助!
这不起作用:
comicUrl = ("http:"+comicElem[0].get('src'))
我认为添加 http: 之前会消除没有提供架构的错误.
I thought adding the http: before would get rid of the no schema supplied error.
推荐答案
将您的 comicUrl
更改为此
comicUrl = comicElem[0].get('src').strip("http://")
comicUrl="http://"+comicUrl
if 'xkcd' not in comicUrl:
comicUrl=comicUrl[:7]+'xkcd.com/'+comicUrl[7:]
print "comic url",comicUrl
这篇关于使用 requests.get() 时未提供架构和其他错误的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!