Problem description
I have a table (df) of about 50,000 rows and 12 columns to write to SAP HANA. I use the RJDBC library and write row by row as follows:
# Returns the SQL statement to insert one row
build_insert_string <- function(db_output, row) {
  row_string <- paste(row, collapse = "','")
  statement <- paste('INSERT INTO "', db_output$SCHEMA, '"."', db_output$table_name,
                     '" (', db_output$string_of_columns, ') VALUES (\'', row_string, '\');',
                     sep = '')
  return(statement)
}
# Insert row by row
for (i in 1:nrow(df)) {
  tryCatch({
    dbGetQuery(jdbcConnection, build_insert_string(db_output, df[i, ]))
  }, error = function(e) { handle_db_errors(e) })
}
where db_output is a list containing the output constants (schema, table and columns).
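For illustration, such a list might look like the following sketch; the member names match the snippet above, but the values are hypothetical and depend on your target table:

# Hypothetical contents of db_output; the actual schema, table,
# and column names depend on your database
db_output <- list(
  SCHEMA            = "MY_SCHEMA",
  table_name        = "MY_TABLE",
  string_of_columns = "EXAMPLE_ID, NAME, VALUE, OTHER_VALUE"
)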
Currently, it takes almost half a day to write the table. It seems that HANA does not support batch inserts such as:
INSERT INTO example
(example_id, name, value, other_value)
VALUES
(100, 'Name 1', 'Value 1', 'Other 1'),
(101, 'Name 2', 'Value 2', 'Other 2'),
(102, 'Name 3', 'Value 3', 'Other 3'),
(103, 'Name 4', 'Value 4', 'Other 4');
Did anyone encounter this challenge, and if so, did you find a way to circumvent it and improve write performance?
Recommended answer
I will leave this here for posterity:
While dbGetQuery is the clean solution for large tables (it executes the query and then clears the result set after each insertion), it is also slow.
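As a side note beyond the original answer: RJDBC also provides dbSendUpdate() for statements that return no result set, which avoids creating and clearing a result after every INSERT. A minimal sketch of the same row-by-row loop using it (whether it is meaningfully faster in this workload is untested here):

# Same loop as above, but with RJDBC's dbSendUpdate(), which does not
# produce a result set for each INSERT statement
for (i in 1:nrow(df)) {
  tryCatch({
    dbSendUpdate(jdbcConnection, build_insert_string(db_output, df[i, ]))
  }, error = function(e) { handle_db_errors(e) })
}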
Apparently, multiple INSERTs into SAP HANA are successful when sent from the SQL editor but not when sent from R.
A (really) fast solution would be provided by:
dbWriteTable(
  conn = jdbcConnection,
  name = paste0(db_output$SCHEMA, ".", db_output$table_name),
  value = df,
  row.names = FALSE,
  field.types = db_output$string_of_columns,
  append = TRUE
)
However, dbWriteTable() is not meant for large tables (it will throw a memory limit error). This limitation can be circumvented by increasing the memory allocation pool via the Xmx Java option, e.g. options(java.parameters="-Xmx5000m"). Use it at your own peril, especially if you aim to automate the writing of increasingly big tables.
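One caveat worth spelling out: java.parameters only takes effect if it is set before the JVM is initialized, i.e. before rJava/RJDBC is loaded. A minimal sketch, where the driver path and connection details are placeholders:

# Raise the JVM heap BEFORE loading rJava/RJDBC; setting
# java.parameters after the JVM has started has no effect
options(java.parameters = "-Xmx5000m")
library(RJDBC)

# Hypothetical driver path and connection details for SAP HANA
drv <- JDBC("com.sap.db.jdbc.Driver", "/path/to/ngdbc.jar")
jdbcConnection <- dbConnect(drv, "jdbc:sap://hana-host:30015/", "USER", "PASSWORD")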
Another potential solution we explored was to export the R output as .csv (or multiple .csv files in the case of more than 1 million rows), and then send a query to import the .csv files into SAP HANA. Large csv files get imported very fast into SAP HANA, but this solution entails an extra step (an intermediary .csv output) and is more prone to incorrect data importation.
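For completeness, a minimal sketch of that route, assuming HANA's IMPORT FROM CSV FILE statement is available to your user and the export path is readable by the database server; the chunk size, file path, and delimiter options are assumptions:

# Split df into ~1M-row chunks, write each as a CSV, then ask HANA to
# import it. The path is a placeholder and must be visible to the server.
chunk_size <- 1e6
chunks <- split(df, ceiling(seq_len(nrow(df)) / chunk_size))
for (k in seq_along(chunks)) {
  csv_path <- sprintf("/shared/export/df_chunk_%02d.csv", k)  # hypothetical path
  write.csv(chunks[[k]], csv_path, row.names = FALSE)
  import_sql <- paste0(
    "IMPORT FROM CSV FILE '", csv_path, "' INTO \"",
    db_output$SCHEMA, "\".\"", db_output$table_name, "\"",
    " WITH RECORD DELIMITED BY '\\n'",
    " FIELD DELIMITED BY ',' OPTIONALLY ENCLOSED BY '\"'",
    " SKIP FIRST 1 ROW"  # skip the header row written by write.csv
  )
  dbSendUpdate(jdbcConnection, import_sql)
}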