Question
There may be no simple answer to this question, but I'm asking in case someone has, if not a simple answer, at least some insight.
On a number of occasions I've written a loop that goes through many records in a database table and performs some update, where I could legitimately either do one big commit at the end or commit each record as I process it. That is, committing one record at a time would not create any data integrity issues.
Is there a clear case for which is better?
What brings this to mind is a program that I recently switched from a single big commit to many small commits. It was a fairly long-running program (about 80 minutes), and it failed halfway through on bad data. I fixed the problem and re-ran it, but it had to start over from the beginning, when it could have processed just the previously unprocessed records.
When I made this change, I noticed that the run time was about the same either way.
Recommended answer
Assuming you don't need the ability to roll back the entire operation (if you do, there is only one answer: commit outside the loop), committing inside the loop keeps the transaction log smaller but requires more round trips to the DB. Committing outside the loop is the exact opposite. Which is faster depends on the average operation count and the amount of data to be committed overall. For a routine that persists about 10-20 records, commit outside the loop. For 1-2 million records, I'd commit in batches.
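To make "commit in batches" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `records` table, its `amount` and `processed` columns, the doubling update, and the batch size of 1000 are all made up for illustration; they are not from the question. The `processed` flag is one possible way to get the restart behaviour the question asks about, since a re-run after a failure only picks up rows that were never committed.

```python
import sqlite3


def update_in_batches(conn, batch_size=1000):
    """Update every unprocessed row, committing once per batch.

    Returns the number of commits issued. Table and column names
    are hypothetical, chosen only for this example.
    """
    cur = conn.cursor()
    commits = 0
    while True:
        # Fetch the next batch of rows that have not been handled yet.
        rows = cur.execute(
            "SELECT id, amount FROM records "
            "WHERE processed = 0 ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            break
        # Apply the update (here: double the amount) and mark the rows done.
        cur.executemany(
            "UPDATE records SET amount = ?, processed = 1 WHERE id = ?",
            [(amount * 2, rec_id) for rec_id, amount in rows],
        )
        # One commit per batch: fewer round trips than per-row commits,
        # and a smaller open transaction than one commit at the very end.
        conn.commit()
        commits += 1
    return commits


# Demo setup: an in-memory database with 2500 rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records ("
    "id INTEGER PRIMARY KEY, amount INTEGER, processed INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO records (amount) VALUES (?)", [(i,) for i in range(2500)]
)
conn.commit()

commits = update_in_batches(conn, batch_size=1000)
```

If the run dies partway through, everything up to the last completed batch is already committed, and calling `update_in_batches` again continues from the first row still marked `processed = 0`. The right batch size depends on the database and the workload, so treat 1000 as a placeholder to tune.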