Problem description
I am looking for a simple way to save a csv file originating from a Google Docs page. The page is published, so it is accessible through a direct link (modified on purpose in the example below).
All my browsers will prompt me to save the csv file as soon as I launch the link.
Neither:
import urllib.request

DOC_URL = 'https://docs.google.com/spreadsheet/ccc?key=0AoOWveO-dNo5dFNrWThhYmdYW9UT1lQQkE&output=csv'

f = urllib.request.urlopen(DOC_URL)
cont = f.read(SIZE)
f.close()
cont = str(cont, 'utf-8')
print(cont)

nor:
req = urllib.request.Request(DOC_URL)
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.13 (KHTML, like Gecko) Chrome/24.0.1284.0 Safari/537.13')
f = urllib.request.urlopen(req)
print(f.read().decode('utf-8'))

print anything but HTML content.
(Tried the 2nd version after reading this other post: Download google docs public spreadsheet to csv with python .)
Any idea what I am doing wrong? I am logged out of my Google account, if that's worth anything, but this works from any browser I have tried. As far as I understand, the Google Docs API has not yet been ported to Python 3, and given the "toy" magnitude of my little project for personal use, it would not even make much sense to use it from the get-go if I can circumvent it.
In the 2nd attempt, I added the 'User-Agent' header, thinking that requests which appear to come from scripts (because no identification info is present) might be ignored, but it didn't make a difference.
Solution

Google responds to the initial request with a series of cookie-setting 302 redirects. If you don't store and resubmit the cookies between requests, it redirects you to the login page.
So, the problem is not with the User-Agent header; it's that by default urllib.request.urlopen doesn't store cookies, even though it does follow the HTTP 302 redirects.
The following code works just fine on a public spreadsheet available at the location specified by DOC_URL:
>>> from http.cookiejar import CookieJar
>>> from urllib.request import build_opener, HTTPCookieProcessor
>>> opener = build_opener(HTTPCookieProcessor(CookieJar()))
>>> resp = opener.open(DOC_URL)
>>> # should really parse resp.getheader('content-type') for encoding.
>>> csv_content = resp.read().decode('utf-8')

Having shown you how to do it in vanilla Python, I'll now say that the Right Way™ to go about this is to use the most-excellent requests library. It is extremely well documented and makes these sorts of tasks incredibly pleasant to complete.
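The comment above about parsing resp.getheader('content-type') for the encoding can be sketched with the standard library alone. The helper name below is my own illustration, not part of the original answer:

```python
# Hypothetical helper: pull the charset out of a Content-Type header value
# instead of hard-coding 'utf-8'. Uses only the standard library.
from email.message import Message

def charset_from_content_type(content_type, default='utf-8'):
    # email.message.Message parses MIME header parameters for us,
    # handling quoting and case-insensitivity correctly.
    msg = Message()
    msg['content-type'] = content_type
    return msg.get_content_charset() or default
```

With the opener above, you would decode with charset_from_content_type(resp.getheader('content-type')) rather than assuming UTF-8.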
For instance, getting the same csv_content as above using the requests library is as simple as:
>>> import requests
>>> csv_content = requests.get(DOC_URL).text

That single line expresses your intent more clearly. It's easier to write and easier to read. Do yourself - and anyone else who shares your codebase - a favor and just use requests.
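If the script ever grows past a one-liner, a small wrapper with basic error handling is worth having. This is only a sketch under assumptions: the fetch_csv name, the timeout value, and the injectable session parameter are mine, not from the original answer.

```python
import requests

def fetch_csv(url, timeout=30, session=None):
    # A requests.Session stores cookies across the 302 redirects automatically,
    # which is exactly what the urllib version needed CookieJar for.
    sess = session or requests.Session()
    resp = sess.get(url, timeout=timeout)
    resp.raise_for_status()  # fail loudly on 4xx/5xx instead of saving an error page
    return resp.text
```

The session parameter also makes the function easy to test or to reuse with cookies already established.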