Problem description
I uploaded a file to Google Spreadsheets to make a publicly accessible example IPython Notebook with data. In its native form, the file could be read into a pandas DataFrame. Now I use the following code to read the spreadsheet. It works, but the data comes in as a string, and I'm not having any luck getting it back into a DataFrame (you can get the data):
import requests
r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')
data = r.content
The data ends up looking like this (the first row is the headers):
',City,region,Res_Comm,mkt_type,Quradate,National_exp,Alabama_exp,Sales_exp,Inventory_exp,Price_exp,Credit_exp\n0,Dothan,South_Central-Montgomery-Auburn-Wiregrass-Dothan,Residential,Rural,1/15/2010,2,2,3,2,3,3\n10,Foley,South_Mobile-Baldwin,Residential,Suburban_Urban,1/15/2010,4,4,4,4,4,3\n12,Birmingham,North_Central-Birmingham-Tuscaloosa-Anniston,Commercial,Suburban_Urban,1/15/2010,2,2,3,2,2,3\n
The native pandas code that reads the file from disk looks like:
df = pd.io.parsers.read_csv('/home/tom/Dropbox/Projects/annonallanswerswithmaster1012013.csv',index_col=0,parse_dates=['Quradate'])
A "clean" solution would help many people by providing an easy way to share datasets for pandas use! I tried a bunch of alternatives with no success, and I'm pretty sure I'm missing something obvious again.
Update: the new Google Sheets has a different URL pattern. Use it in place of the URL in the example above and/or in the answer below and you should be fine. Here is an example:
https://docs.google.com/spreadsheets/d/177_dFZ0i-duGxLiyg6tnwNDKruAYE-_Dd8vAQziipJQ/export?format=csv&id
See the solution below from @Max Ghenis, which just uses pd.read_csv, with no need for StringIO or requests.
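As a rough sketch of that approach: `pd.read_csv` accepts a URL directly, so the only work is building the export URL. The helper names `sheet_csv_url` and `read_public_sheet` below are hypothetical, the URL template follows the new-style pattern quoted above (without its trailing `&id` fragment), and the sheet is assumed to be shared publicly.

```python
def sheet_csv_url(sheet_id):
    """Build a CSV export URL following the new-style pattern quoted above."""
    return "https://docs.google.com/spreadsheets/d/{}/export?format=csv".format(sheet_id)


def read_public_sheet(sheet_id, **read_csv_kwargs):
    """Load a publicly shared sheet into a DataFrame (needs network access)."""
    import pandas as pd  # imported lazily so the URL helper works without pandas

    # pd.read_csv fetches the URL itself; extra keyword arguments pass through.
    return pd.read_csv(sheet_csv_url(sheet_id), **read_csv_kwargs)


url = sheet_csv_url("177_dFZ0i-duGxLiyg6tnwNDKruAYE-_Dd8vAQziipJQ")
print(url)
```

For a sheet with the same columns as the data in the question, a call such as `read_public_sheet(sheet_id, index_col=0, parse_dates=['Quradate'])` should mirror the disk-based `read_csv` call.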
Recommended answer
You can use read_csv() on a StringIO object:
from io import StringIO  # Python 3; in Python 2 use: from StringIO import StringIO
import pandas as pd
import requests
r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')
data = r.text  # decoded text; r.content is bytes in Python 3 and would need io.BytesIO
In [10]: df = pd.read_csv(StringIO(data), index_col=0, parse_dates=['Quradate'])
In [11]: df.head()
Out[11]:
          City                                            region     Res_Comm  \
0       Dothan  South_Central-Montgomery-Auburn-Wiregrass-Dothan  Residential
10       Foley                              South_Mobile-Baldwin  Residential
12  Birmingham      North_Central-Birmingham-Tuscaloosa-Anniston   Commercial
38       Brent      North_Central-Birmingham-Tuscaloosa-Anniston  Residential
44      Athens                 North_Huntsville-Decatur-Florence  Residential

          mkt_type             Quradate  National_exp  Alabama_exp  Sales_exp  \
0            Rural  2010-01-15 00:00:00             2            2          3
10  Suburban_Urban  2010-01-15 00:00:00             4            4          4
12  Suburban_Urban  2010-01-15 00:00:00             2            2          3
38           Rural  2010-01-15 00:00:00             3            3          3
44  Suburban_Urban  2010-01-15 00:00:00             4            5          4

    Inventory_exp  Price_exp  Credit_exp
0               2          3           3
10              4          4           3
12              2          2           3
38              3          3           2
44              4          4           4
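On Python 3, `r.content` is `bytes` rather than `str`, so it needs `io.BytesIO` instead of `StringIO` (the alternative is `r.text` with `StringIO`). A minimal sketch of the bytes route, using the first rows shown in the question as a stand-in for `r.content` so that no network call is needed:

```python
from io import BytesIO

import pandas as pd

# Stand-in for r.content: the first rows shown in the question, as bytes.
sample = (
    b",City,region,Res_Comm,mkt_type,Quradate,National_exp,Alabama_exp,"
    b"Sales_exp,Inventory_exp,Price_exp,Credit_exp\n"
    b"0,Dothan,South_Central-Montgomery-Auburn-Wiregrass-Dothan,Residential,"
    b"Rural,1/15/2010,2,2,3,2,3,3\n"
    b"10,Foley,South_Mobile-Baldwin,Residential,Suburban_Urban,"
    b"1/15/2010,4,4,4,4,4,3\n"
    b"12,Birmingham,North_Central-Birmingham-Tuscaloosa-Anniston,Commercial,"
    b"Suburban_Urban,1/15/2010,2,2,3,2,2,3\n"
)

# BytesIO wraps the raw bytes in a file-like object that read_csv can consume.
df = pd.read_csv(BytesIO(sample), index_col=0, parse_dates=["Quradate"])
print(df.dtypes)
```

With a live request, `BytesIO(r.content)` would take the place of `BytesIO(sample)`.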