问题描述
给定两个不同的字符串 S1 和 S2 (S1 != S2) 是否有可能:
Given two different strings S1 and S2 (S1 != S2) is it possible that:
SHA1(S1) == SHA1(S2)
是真的吗?
- 如果是 - 概率是多少?
- 如果没有 - 为什么不呢?
- 输入字符串的长度是否有上限,重复的概率为 0?或者 SHA1 的计算(因此重复的概率)与字符串的长度无关?
我试图实现的目标是散列一些敏感的 ID 字符串(可能与父 ID 等其他一些字段连接在一起),以便我可以使用散列值作为 ID(例如在数据库中).
The goal I am trying to achieve is to hash some sensitive ID string (possibly joined together with some other fields like parent ID), so that I can use the hash value as an ID instead (for example in the database).
示例:
Resource ID: X123
Parent ID: P123
我不想暴露我的资源标识的性质以允许客户端看到X123-P123".
I don't want to expose the nature of my resource identifies to allow client to see "X123-P123".
相反,我想创建一个新的列哈希(X123-P123"),假设它是 AAAZZZ.然后客户端可以请求 id 为 AAAZZZ 的资源而不知道我的内部 id 等.
Instead I want to create a new column hash("X123-P123"), let's say it's AAAZZZ. Then the client can request resource with id AAAZZZ and not know about my internal id's etc.
推荐答案
您描述的情况称为冲突.冲突必然存在,因为 SHA-1 接受更多不同的消息作为输入,它可以产生不同的输出(SHA-1 可以吃任何高达 2^64 位的位串,但只输出 160 位;因此,至少一个输出value 必须弹出几次).这个观察对于任何输出小于输入的函数都是有效的,无论该函数是否是一个好"的散列函数.
What you describe is called a collision. Collisions necessarily exist, since SHA-1 accepts many more distinct messages as input that it can produce distinct outputs (SHA-1 may eat any string of bits up to 2^64 bits, but outputs only 160 bits; thus, at least one output value must pop up several times). This observation is valid for any function with an output smaller than its input, regardless of whether the function is a "good" hash function or not.
假设 SHA-1 的行为就像一个随机预言机"(一个基本上返回随机值的概念对象,唯一的限制是一旦它在输入 mv/em>,此后它必须始终在输入 m 上返回 v),那么对于任何两个不同的字符串 S1 和 S2,碰撞的概率应该是 2^(-160).仍然在 SHA-1 行为像随机预言机的假设下,如果您收集了许多输入字符串,那么您将在收集了大约 2^80 个这样的字符串后开始观察冲突.
Assuming that SHA-1 behaves like a "random oracle" (a conceptual object which basically returns random values, with the sole restriction that once it has returned output v on input m, it must always thereafter return v on input m), then the probability of collision, for any two distinct strings S1 and S2, should be 2^(-160). Still under the assumption of SHA-1 behaving like a random oracle, if you collect many input strings, then you shall begin to observe collisions after having collected about 2^80 such strings.
(那是 2^80 而不是 2^160,因为使用 2^80 个字符串可以制作大约 2^159 对字符串.这通常被称为生日悖论",因为大多数人在应用于生日碰撞.有关该主题,请参阅维基百科页面.)
(That's 2^80 and not 2^160 because, with 2^80 strings you can make about 2^159 pairs of strings. This is often called the "birthday paradox" because it comes as a surprise to most people when applied to collisions on birthdays. See the Wikipedia page on the subject.)
现在我们强烈怀疑 SHA-1 确实不真正表现得像一个随机预言机,因为生日悖论方法是随机预言机的最佳碰撞搜索算法.然而,有一个公开的攻击应该在大约 2^63 步中找到碰撞,因此比生日悖论算法快 2^17 = 131072 倍.这种攻击不应该在真正的随机预言机上可行.请注意,这种攻击还没有真正完成,它仍然是理论上的(有些人尝试过但显然无法找到足够的 CPU 能力)(更新:截至 2017 年初,有人确实计算了一个 SHA-1 冲突 与上述方法,它完全按预期工作).然而,这个理论看起来很合理,而且 SHA-1 似乎真的不是一个随机预言.相应地,至于碰撞的概率,嗯,所有的赌注都关闭了.
Now we strongly suspect that SHA-1 does not really behave like a random oracle, because the birthday-paradox approach is the optimal collision searching algorithm for a random oracle. Yet there is a published attack which should find a collision in about 2^63 steps, hence 2^17 = 131072 times faster than the birthday-paradox algorithm. Such an attack should not be doable on a true random oracle. Mind you, this attack has not been actually completed, it remains theoretical (some people tried but apparently could not find enough CPU power)(Update: as of early 2017, somebody did compute a SHA-1 collision with the above-mentioned method, and it worked exactly as predicted). Yet, the theory looks sound and it really seems that SHA-1 is not a random oracle. Correspondingly, as for the probability of collision, well, all bets are off.
至于您的第三个问题:对于具有 n 位输出的函数,如果您可以输入超过 2^n 条不同的消息,则必然会发生冲突,即如果最大输入消息长度大于n.如果m 低于n,答案就没有那么简单了.如果函数表现得像一个随机预言机,那么存在碰撞的概率会随着 m 而降低,并且不是线性的,而是在 m=n/2.这与生日悖论的分析相同.使用 SHA-1,这意味着如果 m 那么很可能没有碰撞,而 m > 80 使得至少存在一次碰撞的可能性很大(m > 160 这变成了一个确定性).
As for your third question: for a function with a n-bit output, then there necessarily are collisions if you can input more than 2^n distinct messages, i.e. if the maximum input message length is greater than n. With a bound m lower than n, the answer is not as easy. If the function behaves as a random oracle, then the probability of the existence of a collision lowers with m, and not linearly, rather with a steep cutoff around m=n/2. This is the same analysis than the birthday paradox. With SHA-1, this means that if m < 80 then chances are that there is no collision, while m > 80 makes the existence of at least one collision very probable (with m > 160 this becomes a certainty).
请注意,存在碰撞"和您发现碰撞"是有区别的.即使碰撞必须存在,每次尝试时仍然有 2^(-160) 的概率.上一段的意思是,如果您不能(从概念上)尝试 2^160 对字符串,例如,这样的概率是毫无意义的.因为你限制自己使用少于 80 位的字符串.
Note that there is a difference between "there exists a collision" and "you find a collision". Even when a collision must exist, you still have your 2^(-160) probability every time you try. What the previous paragraph means is that such a probability is rather meaningless if you cannot (conceptually) try 2^160 pairs of strings, e.g. because you restrict yourself to strings of less than 80 bits.
这篇关于是否有可能获得相同的 SHA1 哈希?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!