Problem Description
In my application I need to massively improve insert performance. Example: a file with about 21K records takes over 100 min to insert. There are reasons it can take some time, like 20 min or so, but over 100 min is just too long.
Data is inserted into 3 tables (many-to-many). IDs are generated from a sequence, but I have already googled and set hibernate.id.new_generator_mappings = true, and set allocationSize + the sequence increment to 1000.
Also, the amount of data is not anything extraordinary at all; the file is 90 MB.
I have verified with VisualVM that most of the time is spent in the JDBC driver (PostgreSQL) and Hibernate. I think the issue is related to a unique constraint in the child table. The service layer makes a manual check (= SELECT) before inserting; if the record already exists, it reuses it instead of waiting for a constraint exception.
So to sum it up: for this specific file there will be 1 insert per table per record (it could be different, but not for this file, which is the ideal (fastest) case). That means roughly 60k inserts + 20k selects in total. Still, over 100 min seems very long (yes, hardware counts, and it is a simple PC with a 7200 rpm drive, no SSD or RAID). However, this is an improved version of a previous application (plain JDBC) in which the same insert on this hardware took about 15 min. Considering that in both cases about 4-5 min is spent on "pre-processing", the increase is massive.
Any tips on how this could be improved? Is there any batch-loading functionality?
Recommended Answer
Add entityManager.flush() and entityManager.clear() after every n-th call to the save() method. If you use Hibernate, also add hibernate.jdbc.batch_size=100, which seems like a reasonable choice.
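A minimal sketch of that pattern (the EntityManager calls are shown as comments because they need a real persistence unit; only the flush/clear cadence is executable here):

```java
// Persist entities one by one, but flush and clear the persistence context
// every BATCH_SIZE saves so Hibernate can send them as one JDBC batch and
// the session does not accumulate tens of thousands of managed entities.
public class BatchInsertSketch {
    static final int BATCH_SIZE = 100; // should match hibernate.jdbc.batch_size

    // Counts how many flush/clear cycles a run of n saves triggers,
    // including the final partial batch. The commented-out calls show
    // where the real EntityManager work would go.
    static int flushCount(int n) {
        int flushes = 0;
        for (int i = 1; i <= n; i++) {
            // em.persist(entities.get(i - 1));   // save the i-th record
            if (i % BATCH_SIZE == 0 || i == n) {
                // em.flush();   // push the pending inserts as a JDBC batch
                // em.clear();   // detach managed entities, freeing memory
                flushes++;
            }
        }
        return flushes;
    }

    public static void main(String[] args) {
        // e.g. the ~21K-record file from the question
        System.out.println(flushCount(21000)); // prints 210
    }
}
```

Without the periodic clear(), the first-level cache grows with every persisted entity and dirty-checking gets slower as the import proceeds, which is a common cause of the kind of slowdown described in the question.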
The performance increase was > 10x, probably close to 100x.