Problem description
I'm going to need to import 30k rows of data from a CSV file into a Vertica database. The code I've tried with is taking more than an hour to do so. I'm wondering if there's a faster way to do it? I've tried to import using csv and also by looping through a dataframe to insert, but it just isn't fast enough. In fact, it's way too slow. Could you please help me?
rownum = df.shape[0]
for x in range(rownum):
    a = df['AccountName'].values[x]
    b = df['ID'].values[x]
    ss = "INSERT INTO Table (AccountName, ID) VALUES (%s, %s)"
    val = (a, b)
    cur.execute(ss, val)
    connection.commit()
Recommended answer
You want to use the COPY command.
COPY Table FROM '/path/to/csv/file.csv' DELIMITER ',';
This is much faster than inserting one row at a time.
Since you are using Python, I would recommend the vertica_python module, as it has a very convenient copy method on its cursor object (see the vertica-python GitHub page).
The syntax for using COPY with vertica-python is as follows:
with open('file.csv', 'r') as file:
    csv_file = file.read()

copy_cmd = "COPY Table FROM STDIN DELIMITER ','"
cur.copy(copy_cmd, csv_file)
connection.commit()
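If your data is already in a pandas DataFrame (as in the question), you don't even need to write it out to disk first: DataFrame.to_csv can render the CSV text in memory, and that string can be handed straight to cur.copy. A minimal sketch, where the table and column names are taken from the question and the cursor/connection objects are assumed to already exist (the actual copy call is shown as a comment because it needs a live Vertica connection):

```python
import io

import pandas as pd

# Sample frame standing in for the 30k-row DataFrame in the question.
df = pd.DataFrame({'AccountName': ['Acme', 'Globex'], 'ID': [1, 2]})

# Render the frame as CSV text in memory: no header row and no index,
# matching the DELIMITER ',' format the COPY command expects.
buf = io.StringIO()
df.to_csv(buf, header=False, index=False)
csv_payload = buf.getvalue()

# With a vertica_python cursor, the whole payload loads in one shot:
# cur.copy("COPY Table (AccountName, ID) FROM STDIN DELIMITER ','", csv_payload)
# connection.commit()
```

This avoids the temporary file entirely and keeps the load as a single round trip to the database.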
Another thing you can do to speed up the process is to compress the CSV file. Vertica can read gzip, bzip, and lzo compressed files.
# Open in binary mode ('rb'): the file contains compressed bytes, not text.
with open('file.csv.gz', 'rb') as file:
    gzipped_csv_file = file.read()

copy_cmd = "COPY Table FROM STDIN GZIP DELIMITER ','"
cur.copy(copy_cmd, gzipped_csv_file)
connection.commit()
Copying compressed files reduces network time, so you have to determine whether the extra time spent compressing the CSV file is recovered in the time saved transferring it. In most cases I've dealt with, it is worth compressing the file.
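The compression step itself can be done from Python right before the load, using only the standard library. A minimal sketch, with illustrative file names (the sample CSV just stands in for your real 30k-row file):

```python
import csv
import gzip
import shutil

# Write a tiny sample CSV standing in for the real 30k-row file.
with open('file.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows([['Acme', 1], ['Globex', 2]])

# Compress it; Vertica's COPY ... GZIP can load the result directly.
with open('file.csv', 'rb') as src, gzip.open('file.csv.gz', 'wb') as dst:
    shutil.copyfileobj(src, dst)

# Sanity check: decompressing yields the original bytes.
with gzip.open('file.csv.gz', 'rb') as gz:
    restored = gz.read()
```

Because shutil.copyfileobj streams in chunks, this works for files far larger than memory, and the resulting file.csv.gz can be sent with the GZIP copy command shown above.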