Problem description
I am trying to write a DataFrame to a gzipped CSV in Python pandas, using the following:
import pandas as pd
import datetime
import csv
import gzip
# Get data (with previous connection and script variables)
df = pd.read_sql_query(script, conn)
# Create today's date, to append to file
todaysdatestring = datetime.datetime.today().strftime('%Y%m%d')
print(todaysdatestring)
# Create csv with gzip compression
df.to_csv('foo-%s.csv.gz' % todaysdatestring,
          sep='|',
          header=True,
          index=False,
          quoting=csv.QUOTE_ALL,
          compression='gzip',
          quotechar='"',
          doublequote=True,
          # renamed to lineterminator in pandas 1.5
          line_terminator='\n')
This just creates a plain-text CSV called 'foo-YYYYMMDD.csv.gz', not an actual gzip archive.
I've also tried adding this:
# Turn to_csv statement into a variable
d = df.to_csv('foo-%s.csv.gz' % todaysdatestring,
              sep='|',
              header=True,
              index=False,
              quoting=csv.QUOTE_ALL,
              compression='gzip',
              quotechar='"',
              doublequote=True,
              line_terminator='\n')
# Write above variable to gzip
with gzip.open('foo-%s.csv.gz' % todaysdatestring, 'wb') as output:
    output.write(d)
This fails as well. Any ideas?
Recommended answer
Using df.to_csv() with the keyword argument compression='gzip' should produce a gzip archive. I tested it using the same keyword arguments as you, and it worked.
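One way to confirm whether the output really is a gzip archive is to inspect the first two bytes of the file, which are always 0x1f 0x8b for gzip. The following is a minimal sketch, assuming a small made-up DataFrame in place of the SQL query result and a temporary file path chosen for illustration:

```python
import csv
import gzip
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for the SQL query result.
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# Hypothetical output path in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "foo-example.csv.gz")
df.to_csv(path, sep="|", header=True, index=False,
          quoting=csv.QUOTE_ALL, compression="gzip")

# A gzip file always starts with the two magic bytes 0x1f 0x8b.
with open(path, "rb") as fh:
    is_gzip = fh.read(2) == b"\x1f\x8b"

# Decompress and read the header row to confirm the payload is the CSV.
with gzip.open(path, "rt") as fh:
    first_line = fh.readline().rstrip("\n")

print(is_gzip, first_line)
```

If is_gzip comes back False, the file is plain text with a misleading extension, which matches the behaviour described in the question.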
You may need to upgrade pandas, as gzip compression was not implemented until version 0.17.1. Trying to use it on earlier versions does not raise an error; it just produces a regular CSV. You can determine your current version of pandas by looking at the output of pd.__version__.
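If upgrading is not an option, the gzip module can do the compression itself. Note that the second attempt in the question fails because to_csv() returns None when it is given a path; it only returns the CSV text when no path is passed. The sketch below is one possible workaround, not an official pandas recipe: version_tuple and to_gzipped_csv are hypothetical helper names, the force_manual flag exists only to demonstrate the fallback on a current pandas, and UTF-8 encoding is an assumption.

```python
import gzip
import os
import re
import tempfile

import pandas as pd


def version_tuple(version):
    """Turn a string such as '0.17.1' or '2.1.0rc0' into a tuple of ints."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def to_gzipped_csv(df, path, force_manual=False, **csv_kwargs):
    """Write df to a gzip-compressed CSV at path (hypothetical helper)."""
    if not force_manual and version_tuple(pd.__version__) >= (0, 17, 1):
        # pandas handles the compression itself from 0.17.1 onwards.
        df.to_csv(path, compression="gzip", **csv_kwargs)
        return
    # With no path argument, to_csv() returns the CSV text as a string.
    text = df.to_csv(**csv_kwargs)
    with gzip.open(path, "wb") as out:
        out.write(text.encode("utf-8"))  # assumes UTF-8 output is wanted


# Usage with made-up data, forcing the manual path for demonstration.
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
path = os.path.join(tempfile.mkdtemp(), "foo-manual.csv.gz")
to_gzipped_csv(df, path, force_manual=True, sep="|", index=False)

with gzip.open(path, "rt") as fh:
    round_trip = fh.read()
print(round_trip)
```

Comparing version tuples instead of raw strings avoids mistakes such as '0.9.0' sorting after '0.17.1'.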