This article looks at optimizing bulk inserts into SQLite. The question and answer below should be a useful reference for anyone facing the same problem.

Problem Description

I am experimenting with different buffer sizes for inserts into a local SQLite DB and have found that it takes nearly 8 minutes to insert 10,000,000 rows of data when the buffer size is 10,000. In other words, it takes 1,000 writes to store everything.

8 minutes to store 10,000,000 rows seems a bit too long (or is it?)

Can any of the below be optimized to increase the speed? Please note that the data being inserted is a random collection of characters.

public int flush() throws SQLException {
    String sql = "insert into datastore values(?,?,?,?);";

    PreparedStatement prep = con.prepareStatement(sql);

    for (DatastoreElement e : content) { // content is 10,000 elements long
        _KVPair kvp = e.getKvp();

        prep.setInt(1, e.getMetaHash());
        prep.setInt(2, kvp.hashCode());
        prep.setString(3, kvp.getKey());
        prep.setString(4, kvp.getValue());

        prep.addBatch();
    }

    int[] updateCounts = prep.executeBatch();

    con.commit();

    return errorsWhileInserting(updateCounts);
}

The table is created via:

    statement.executeUpdate("create table datastore " +
            "(meta_hash INTEGER," +
            "kv_hash   INTEGER," +
            "key TEXT," +
            "value TEXT);");

Can any of the above be further optimized?

Answer

I'm a bit hazy on the Java API, but I think you should start a transaction first, otherwise calling commit() is pointless. Do it with conn.setAutoCommit(false). Otherwise SQLite will journal each individual insert/update, which requires syncing the file and contributes to the slowness.
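
For illustration, here is a minimal sketch of that pattern, assuming the sqlite-jdbc driver and the datastore table from the question; the class name, connection URL, helper method and row representation are my own, not part of the original post.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

public class TransactionalInsertSketch {

    // Inserts all rows inside one explicit transaction. With auto-commit left on,
    // the driver commits (and SQLite syncs its journal) after every single row.
    static void insertBatch(Connection con, String[][] rows) throws SQLException {
        con.setAutoCommit(false);                         // start an explicit transaction
        String sql = "insert into datastore values(?,?,?,?)";
        try (PreparedStatement prep = con.prepareStatement(sql)) {
            for (String[] r : rows) {
                prep.setInt(1, Integer.parseInt(r[0]));   // meta_hash
                prep.setInt(2, Integer.parseInt(r[1]));   // kv_hash
                prep.setString(3, r[2]);                  // key
                prep.setString(4, r[3]);                  // value
                prep.addBatch();
            }
            prep.executeBatch();
            con.commit();                                 // one sync for the whole batch
        } catch (SQLException ex) {
            con.rollback();
            throw ex;
        }
    }

    public static void main(String[] args) throws SQLException {
        try (Connection con = DriverManager.getConnection("jdbc:sqlite:datastore.db")) {
            try (Statement st = con.createStatement()) {
                st.executeUpdate("create table if not exists datastore " +
                        "(meta_hash INTEGER, kv_hash INTEGER, key TEXT, value TEXT)");
            }
            insertBatch(con, new String[][] {
                    { "1", "101", "alpha", "beta" },
                    { "2", "102", "gamma", "delta" }
            });
        }
    }
}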

Edit: the questioner updated to say that this is already set. In that case:

That is a lot of data, and that length of time doesn't sound out of this world. The best you can do is to run tests with different buffer sizes: there is a balance between buffer jitter when they are too small and virtual memory kicking in at large sizes. For this reason, you shouldn't try to put it all into one buffer at once; split the inserts up into your own batches.
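
As a sketch of what that batching might look like (the helper insertInChunks and the batchSize parameter are hypothetical names, not from the original answer), each chunk is executed and committed on its own, so the chunk size is the single knob to experiment with:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class ChunkedInsertSketch {

    // Flushes rows in chunks of batchSize, committing after each chunk.
    // Too small a chunk and per-commit overhead dominates; too large and
    // memory pressure / paging starts to hurt.
    static void insertInChunks(Connection con, List<String[]> rows, int batchSize)
            throws SQLException {
        con.setAutoCommit(false);
        String sql = "insert into datastore values(?,?,?,?)";
        try (PreparedStatement prep = con.prepareStatement(sql)) {
            int pending = 0;
            for (String[] r : rows) {
                prep.setInt(1, Integer.parseInt(r[0]));
                prep.setInt(2, Integer.parseInt(r[1]));
                prep.setString(3, r[2]);
                prep.setString(4, r[3]);
                prep.addBatch();
                if (++pending == batchSize) {   // a full chunk: execute and commit it
                    prep.executeBatch();        // JDBC resets the batch after execution
                    con.commit();
                    pending = 0;
                }
            }
            if (pending > 0) {                  // flush whatever is left over
                prep.executeBatch();
                con.commit();
            }
        }
    }
}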

That concludes this article on optimizing bulk inserts into SQLite; hopefully the answer above is a helpful reference.
