Question
I'm saving a really large data.frame (30 million rows) to a PostgreSQL database from R, and it kills my PC. As this is the result of calculations produced by dplyr, I'd like to use some built-in functionality of this package, but copy_to doesn't work for such huge tables. Any suggestions?
Recommended Answer
Can you copy the data.frame to a CSV or tab-delimited text file, then load that into PostgreSQL with the COPY FROM command [1]? That implements a bulk-load approach, which may perform faster.
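For example, the export step could look like the following minimal sketch, assuming the dplyr result is in a data.frame called df and a target table big_table already exists with matching columns (both names are placeholders):

# Write df as tab-delimited text; \N marks NULLs, which is what COPY's text format expects.
write.table(df, "/tmp/big_table.tsv", sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE, na = "\\N")

The file can then be loaded from psql with the client-side \copy meta-command, which does not require superuser rights or a server-visible path:

\copy big_table FROM '/tmp/big_table.tsv' WITH (FORMAT text)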
In some cases, it may be possible to use an RScript to emit the data as a stream and pipe it directly into psql:
<RScript output tab-delimited rows> | psql -c "COPY <tablename> (columnlist, ...) FROM STDIN WITH (FORMAT text)"
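The R side of that pipeline could be as simple as this sketch (it assumes the data.frame df holding the dplyr result is already available inside the script; the file name emit_rows.R is hypothetical):

#!/usr/bin/env Rscript
# emit_rows.R: stream df to stdout as tab-delimited rows in the layout
# that COPY ... FROM STDIN WITH (FORMAT text) expects (\N for NULLs).
write.table(df, file = stdout(), sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE, na = "\\N")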
In some long-running cases, I put | pv | in the middle to track progress (http://www.ivarch.com/programs/pv.shtml).
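Put together, such a pipeline might look like the line below; the script, database, table, and column names are all placeholders:

Rscript emit_rows.R | pv | psql -d mydb -c "COPY big_table (id, value) FROM STDIN WITH (FORMAT text)"

With no arguments, pv simply passes its input through while printing throughput and a running byte count, which gives a rough sense of how far the load has progressed.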
[1] PostgreSQL documentation for the COPY command: https://www.postgresql.org/docs/current/sql-copy.html