Problem description

I am trying to extract git logs from a few repositories like this:
git log --pretty=format:%H\t%ae\t%an\t%at\t%s --numstat
For larger repositories (like rails/rails) it takes a solid 35+ seconds to generate the log.

Is there a way to improve this performance?
Recommended answer
You are correct: it does take somewhere between 20 and 35 seconds to generate the report on 56,000 commits, producing 224,000 lines (15 MiB) of output. I actually think that's pretty decent performance, but you don't; okay.
Because you are generating a report in a constant format from an unchanging database, you only have to do it once. Afterwards, you can use the cached result of git log and skip the time-consuming generation. For example:
git log --pretty=format:%H\t%ae\t%an\t%at\t%s --numstat > log-pretty.txt
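The cache is only valid while the repository is unchanged, so in practice you would want to regenerate it when new commits arrive. A minimal sketch of that idea, wrapped in a throwaway repository so it is self-contained; the file names `log-pretty.txt` and `log-pretty.head` are illustrative, not anything git itself knows about:

```shell
# Build a throwaway repo so the sketch runs anywhere.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'first commit'

# Regenerate the cached report only when HEAD has moved since the last run.
# (%x09 emits a literal tab in git's pretty-format syntax.)
head=$(git rev-parse HEAD)
if [ ! -f log-pretty.head ] || [ "$(cat log-pretty.head)" != "$head" ]; then
    git log --pretty=format:'%H%x09%ae%x09%an%x09%at%x09%s' --numstat > log-pretty.txt
    printf '%s\n' "$head" > log-pretty.head
fi

# Every later query hits the cached file, not git.
grep -c 'first commit' log-pretty.txt
```

On repeat runs the HEAD check short-circuits, so the expensive `git log` only runs after new commits land.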
You might wonder how long it takes to search that entire report for data of interest. That's a worthy question:
$ tail -1 log-pretty.txt
30 0 railties/test/webrick_dispatcher_test.rb
$ time grep railties/test/webrick_dispatcher_test.rb log-pretty.txt
…
30 0 railties/test/webrick_dispatcher_test.rb
real 0m0.012s
…
Not bad: introducing a "cache" has cut the time needed from 35+ seconds to a dozen milliseconds. That's almost 3,000 times as fast.
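Beyond grep, the cached report can feed ordinary text tools directly. A sketch of counting commits per author email; the sample file below stands in for the real log-pretty.txt, using the same tab-separated header layout as the report format above (hash, email, name, timestamp, subject):

```shell
# Hypothetical sample standing in for the cached report (tab-separated).
printf 'aaa\tjoe@example.com\tJoe\t1\tfix bug\nbbb\tjoe@example.com\tJoe\t2\tadd test\nccc\tamy@example.com\tAmy\t3\trefactor\n' > sample-log.txt

# Count commits per author email (field 2), busiest first.
awk -F'\t' '{count[$2]++} END {for (a in count) print count[a], a}' sample-log.txt | sort -rn
# → 2 joe@example.com
#   1 amy@example.com
```

Since this reads only the flat file, it stays in the millisecond range no matter how slow the original `git log` was.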