为什么 url 在浏览器中有效但不使用 requests get 方法

本文介绍了为什么 url 在浏览器中有效但不使用 requests get 方法的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

在测试时，我刚刚发现，这个

while testing, I just discovered, that this

url = ' http://wi312.rockdizfile.com/d/uclf2kr7fp4r2ge47pcuihdpky2chcsjur5nrds2hx53f26qgxnrktew/Kimbra%20-%20Love%20in%20High%20Places.mp3'

在浏览器中工作并开始下载文件，但如果我尝试使用

works in browser and file download begins but if i try to fetch this file using

requests.get(url)

它给出了巨大的错误......

it gives massive error ...

任何线索为什么会发生这种情况?是否需要对其进行解码以使其正常工作?

any clue why is this happening ? do in need to decode this to make it working?

更新这是我不断收到的错误:

Updatethis is the error I keep getting:

Exception in thread Thread-5:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "python/file_download.py", line 98, in _downloadChunk
    stream=True)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests-2.1.0-py2.7.egg/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests-2.1.0-py2.7.egg/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests-2.1.0-py2.7.egg/requests/sessions.py", line 382, in request
    resp = self.send(prep, **send_kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests-2.1.0-py2.7.egg/requests/sessions.py", line 485, in send
    r = adapter.send(request, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests-2.1.0-py2.7.egg/requests/adapters.py", line 381, in send
    raise Timeout(e)
Timeout: (<requests.packages.urllib3.connectionpool.HTTPConnectionPool object at 0x10258de90>, 'Connection to wi312.rockdizfile.com timed out. (connect timeout=0.001)')

我发布时没有空格，它只是换行，因为我发布了内联代码嵌入.

there was no space when I posted, it was just in newline because I posted inline code embed.

这是发出请求的代码:(也尝试新的 URL:http://archive.org/download/LucyIsabelleMarsh/LucyIsabelleMarsh-ItalianStreetSong.mp3)

Here is the code that makes requests:(also try new URL: http://archive.org/download/LucyIsabelleMarsh/LucyIsabelleMarsh-ItalianStreetSong.mp3)

import requests
import signal
import sys
import time
import threading
import utils as _fdUtils
from socket import error as SocketError, timeout as SocketTimeout

def _downloadChunk(url, idx, irange, fileName, sizeInBytes):
    _log.debug("Downloading %s for first chunk %s " % (irange, idx+1))
    pulledSize = irange[-1]
    try:
        resp = requests.get(url, allow_redirects=False,  timeout=0.001,
                            headers={'Range': 'bytes=%s-%s' % (str(irange[0]), str(irange[-1]))},
                            stream=True)
    except (SocketTimeout, requests.exceptions), e:
        _log.error(e)
        return


    chunk_size = str(irange[-1])
    for chunk in resp.iter_content(chunk_size):
        status = r"%10d  [%3.2f%%]" % (pulledSize, pulledSize * 100. / int(chunk_size))
        status = status + chr(8)*(len(status)+1)
        sys.stdout.write('%s\r' % status)
        sys.stdout.flush()
        pulledSize += len(chunk)
        dataDict[idx] = chunk
        time.sleep(.03)
        if pulledSize == sizeInBytes:
            _log.info("%s downloaded %3.0f%%", fileName, pulledSize * 100. / sizeInBytes)

class ThreadedFetch(threading.Thread):
    """ docstring for ThreadedFetch
    """
    def __init__(self, saveTo, queue):
        super(ThreadedFetch, self).__init__()
        self.queue = queue
        self.__saveTo = saveTo

    def run(self):
        threadLimiter.acquire()
        try:
            items = self.queue.get()
            url = items[0]
            split = items[-1]
            fileName = _fdUtils.getFileName(url)

            # grab split chunks in separate thread.
            if split > 1:
                maxSplits.acquire()
                try:

                    sizeInBytes = _fdUtils.getUrlSizeInBytes(url)
                    if sizeInBytes:
                        byteRanges = _fdUtils.getRangeSegements(sizeInBytes, split)
                    else:
                        byteRanges = ['0-']
                    filePath = os.path.join(self.__saveTo, fileName)

                    downloaders = [
                        threading.Thread(
                            target=_downloadChunk,
                            args=(url, idx, irange, fileName, sizeInBytes),
                        )
                        for idx, irange in enumerate(byteRanges)
                        ]

                    # start threads, let run in parallel, wait for all to finish
                    for th in downloaders:
                        th.start()

                    # this makes the wait for all thread to finish
                    # which confirms the dataDict is up-to-date
                    for th in downloaders:
                        th.join()
                    downloadedSize = 0
                    with open(filePath, 'wb') as fh:
                        for _idx, chunk in sorted(dataDict.iteritems()):
                            downloadedSize += len(chunk)
                            status = r"%10d  [%3.2f%%]" % (downloadedSize, downloadedSize * 100. / sizeInBytes)
                            status = status + chr(8)*(len(status)+1)
                            fh.write(chunk)
                            sys.stdout.write('%s\r' % status)
                            time.sleep(.04)
                            sys.stdout.flush()
                            if downloadedSize == sizeInBytes:
                                _log.info("%s, saved to %s", fileName, self.__saveTo)
                    self.queue.task_done()
                finally:
                    maxSplits.release()

timeout

为什么 url 在浏览器中有效但不使用 requests get 方法

问题描述

推荐答案