Problem description
I'm trying to remove sensitive data such as passwords from my Git history. Instead of deleting whole files, I just want to substitute the passwords with removedSensitiveInfo. This is what I came up with after browsing through numerous StackOverflow topics and other sites:
git filter-branch --tree-filter "find . -type f -exec sed -Ei '' -e 's/(aSecretPassword1|aSecretPassword2|aSecretPassword3)/removedSensitiveInfo/g' {} \;"
When I run this command, it appears to rewrite the history (it shows the commits it's rewriting and takes a few minutes). However, when I check whether all the sensitive data has actually been removed, it turns out it's still there.
For reference, this is how I do the check:
git grep aSecretPassword1 $(git rev-list --all)
This shows me all of the hundreds of commits that match the search query. Nothing has been substituted.
Any idea what's going on here?
I double-checked the regular expression I'm using, and it seems to be correct. I'm not sure what else to check or how to debug this properly, as my Git knowledge is quite rudimentary. For example, I don't know how to test whether 1) my regular expression isn't matching anything, 2) sed isn't being run on all files, 3) the file changes are not being saved, or 4) something else is wrong.
Any help is much appreciated.
P.S. I'm aware of several StackOverflow threads about this topic. However, I couldn't find one about substituting words (rather than deleting files) in all (ASCII) files (rather than specifying a particular file or file type). I'm not sure whether that should make a difference, but none of the suggested solutions has worked for me.
Recommended answer
git-filter-branch is a powerful but difficult-to-use tool. There are several obscure things you need to know to use it correctly for your task, and each one is a possible cause of the problems you're seeing. So rather than immediately trying to debug them, let's take a step back and look at the original problem:
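- Replace given strings (i.e. passwords) in all text files (without specifying a particular file or file type)
- Ensure that the updated Git history does not contain the old password text
- Do the above as simply as possible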
There is a tailor-made solution to this problem:
The BFG Repo-Cleaner is a simpler alternative to git-filter-branch, specifically designed for removing passwords and other unwanted data from Git repository history.
Ways in which the BFG helps you in this situation:
- The BFG is 10-720x faster.
- It automatically runs on all tags and references, unlike git-filter-branch, which only does that if you add the extraordinary --tag-name-filter cat -- --all command-line option. (Note that the example command you gave in the question does NOT have this, which is a possible cause of your problems.)
- The BFG doesn't generate any refs/original/ refs, so there is no need for you to perform an extra step to remove them.
- You can express your passwords as simple literal strings, without having to worry about getting regex escaping right. The BFG can handle regex too, if you really need it.
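If you would rather stay with git-filter-branch, the two pitfalls named in the list above (the missing `--tag-name-filter cat -- --all` and the leftover `refs/original/` refs) could be addressed roughly as follows. This is a minimal sketch under assumptions, not a verified fix for your repository: the function name `rewrite_all_refs` is made up for illustration, the sed expression is copied from the question, and `-i ''` is the BSD/macOS spelling (GNU sed takes `-i` without a separate argument).

```shell
# Hypothetical helper: run it from inside the repository you want to rewrite.
rewrite_all_refs() {
    # Rewrite every branch and tag, not just the current branch.
    # "--tag-name-filter cat" keeps tag names and moves them to the rewritten commits.
    # -f lets the run proceed even if backup refs from an earlier attempt exist.
    git filter-branch -f \
        --tree-filter "find . -type f -exec sed -Ei '' -e 's/(aSecretPassword1|aSecretPassword2|aSecretPassword3)/removedSensitiveInfo/g' {} \;" \
        --tag-name-filter cat -- --all || return 1

    # filter-branch keeps the old commits reachable under refs/original/,
    # and "git rev-list --all" still lists them, so drop these backups.
    git for-each-ref --format='%(refname)' refs/original/ |
        xargs -n 1 git update-ref -d

    # Expire reflog entries and prune the now-unreachable old objects.
    git reflog expire --expire=now --all &&
        git gc --prune=now
}
```

If these two points were the cause, the check from the question, `git grep aSecretPassword1 $(git rev-list --all)`, should no longer report the old commits after `rewrite_all_refs` has finished. Try it on a throwaway clone first, since the rewrite cannot easily be undone once the backup refs are gone.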
Carefully follow the usage steps. The core bit is just this command:
$ java -jar bfg.jar --replace-text replacements.txt my-repo.git
The replacements.txt file should contain all the substitutions you want to do, in a format like this (one entry per line; note that the comments shouldn't be included):
PASSWORD1 # Replace literal string 'PASSWORD1' with '***REMOVED***' (default)
PASSWORD2==>examplePass # replace with 'examplePass' instead
PASSWORD3==> # replace with the empty string
regex:password=\w+==>password= # Replace, using a regex
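For the case in the question, a replacements file could be generated as shown below. This is only a sketch: it assumes the three placeholder passwords from the question and the marker text removedSensitiveInfo, and it writes the file into the current directory.

```shell
# Assumed rules: the three placeholder passwords from the question,
# each mapped to the marker text the asker wanted.
# The quoted 'EOF' keeps the shell from interpreting anything in the rules.
cat > replacements.txt <<'EOF'
aSecretPassword1==>removedSensitiveInfo
aSecretPassword2==>removedSensitiveInfo
aSecretPassword3==>removedSensitiveInfo
EOF

# Sanity check: print the number of rules (one per line).
wc -l < replacements.txt
```

Because each entry is a literal string, no regex alternation or escaping is needed here, unlike in the sed expression.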
Your entire repository history will be scanned, and all text files (under 1MB in size) will have the substitutions performed: any matching string (that isn't in your latest commit) will be replaced.
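The usage steps referred to above can be sketched as a small shell function. It assumes that bfg.jar sits in the current directory and that Java and Git are installed; the function name `scrub_passwords` and the mirror directory name `my-repo.git` are chosen for illustration only. Since matches in the latest commit are left alone, the passwords should already be removed from your current files and committed before you run it.

```shell
# Hypothetical wrapper around the BFG workflow.
scrub_passwords() {
    repo_url=$1   # URL of the repository to clean (supplied by you)
    rules=$2      # path to replacements.txt

    # Work on a fresh bare mirror so that every branch and tag is included.
    git clone --mirror "$repo_url" my-repo.git || return 1

    # Rewrite history using the rules from the replacements file.
    java -jar bfg.jar --replace-text "$rules" my-repo.git || return 1

    # The rewrite leaves the old objects in place; expire the reflog
    # and garbage-collect so that they are physically removed.
    (
        cd my-repo.git &&
        git reflog expire --expire=now --all &&
        git gc --prune=now --aggressive
    )
}
```

A call would look like `scrub_passwords <your-repo-url> replacements.txt`. Afterwards you can rerun the `git grep` check from the question inside my-repo.git, and run `git push` from there only once you are satisfied with the result.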
Full disclosure: I'm the author of the BFG Repo-Cleaner.