Problem description
I am trying to extract (source code line, author label) pairs from git repositories. The easiest way to do that is to use git blame. The problem is that git blame reports the last committer as the author, whether that committer merely re-indented the code or really changed it. Do you know a better way to do this?
Or maybe, before trying to solve the problem, I should first check how many source lines are associated with multiple authors. If the percentage is small, there is no need to worry about it. But I find that even counting them is difficult. For a commit with a single parent, how can we tell that the commit changed a line rather than deleted one line and added another? For a commit with two parents (like a merge), how should I combine the diff results from the two branches?
Thanks
Overview
This is a fundamental misunderstanding of how Git works. Git does not commit patches or diffs; it commits trees and blobs, although packfiles do use delta compression for storage. Most of what you see as commit history is calculated at run-time with some flavor of diff.
In other words, if your diff tools can do what you want, so can Git.
git-blame
The git-blame command won't do what you want, because the man page describes it as annotating each line in the given file with information from the revision which last modified the line.
In other words, it's strictly line-oriented.
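That said, line-oriented does not have to mean whitespace-sensitive. As a side note that goes beyond the answer above: git blame accepts -w, which ignores whitespace when comparing a commit with its parent, and --line-porcelain, which repeats the full commit header for every line. A minimal sketch of turning that output into (author, line) pairs follows; the function name blame_pairs is made up for illustration, and whether -w is enough for your data is something you would have to check.

```shell
#!/bin/sh
# Hypothetical helper: reads `git blame --line-porcelain` output on stdin
# and prints "author<TAB>source line" for every line of the file.
blame_pairs() {
  awk '
    # Header line "author NAME": remember the name (starts at column 8).
    /^author / { author = substr($0, 8); next }
    # In porcelain output the source line itself is prefixed with one TAB.
    substr($0, 1, 1) == "\t" { print author "\t" substr($0, 2) }
  '
}

# Usage (inside a repository; path/to/file is a placeholder):
#   git blame -w --line-porcelain path/to/file | blame_pairs
```

This only discounts whitespace-only edits; any other change to a line still moves the attribution to the commit that made it.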
git-log
You can get close to what you want with git-log. For example:
# Show diffs with indifference to whitespace changes (e.g. indenting).
git log --patch --ignore-space-change
# Just ignore whitespace altogether.
git log --patch --ignore-all-space
# Show deletions with [- -] and additions with {+ +}.
git log --patch --word-diff=plain
# Custom diff format where ~ denotes newlines.
git log --patch --word-diff=porcelain
The porcelain format is intended for text processing, but it's very non-intuitive from a visual point of view. However, it is well-documented in man 1 git-diff for your programming pleasure.
The downside is that you will have to get your author information from the author or committer name recorded in each commit (the values that GIT_AUTHOR_NAME and GIT_COMMITTER_NAME supply at commit time), rather than having Git decorate each line for you.
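One way to do that decoration yourself, as a rough sketch rather than a definitive recipe: have git log print a marker line carrying the author name (%an) in front of each patch, then let awk attach that name to every added line. The AUTHOR marker and the function name tag_added_lines are assumptions made up for this example; %x09 is simply a tab.

```shell
#!/bin/sh
# Hypothetical helper: reads `git log --patch` output on stdin, where each
# commit header has been replaced by a marker line "AUTHOR<TAB>name",
# and prints "name<TAB>added line" for every line a commit added.
tag_added_lines() {
  awk '
    # Marker line: remember the author (name starts at column 8).
    /^AUTHOR\t/    { author = substr($0, 8); in_hunk = 0; next }
    # A new file diff starts: its "---" and "+++" headers are not content.
    /^diff --git / { in_hunk = 0; next }
    # Hunk header: lines after this are context, removals or additions.
    /^@@/          { in_hunk = 1; next }
    # Added line inside a hunk: drop the leading "+" and tag it.
    in_hunk && substr($0, 1, 1) == "+" { print author "\t" substr($0, 2) }
  '
}

# Usage (inside a repository):
#   git log --patch --ignore-all-space --format='AUTHOR%x09%an' | tag_added_lines
```

Because the diff is produced with --ignore-all-space, a commit that only re-indents a line should not show that line as added. Lines deleted by later commits still appear in the output, so this gives you every author who ever added a line, not the current state of the file.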