问题描述
我想知道如何在给定输入字符串的情况下生成GUID,以使相同的输入字符串产生相同的GUID(有点像MD5哈希).MD5散列的问题在于它们仅保证低碰撞率,而不是唯一性.相反,我想要这样的东西:
I am wondering how to generate a GUID given an input string, such that the same input string results in the same GUID (sort of like an MD5 hash). The problem with MD5 hashes is they just guarantee low collision rate, rather than uniqueness. Instead I would like something like this:
guid('v1.0.0') == 1231231231123123123112312312311231231231
guid('v1.0.1') == 6154716581615471658161547165816154716581
guid('v1.0.2') == 1883939319188393931918839393191883939319
您将如何实现这种事情(理想情况下是在JavaScript中)?有可能做吗?我不确定从哪里开始.诸如 uuid模块之类的内容不会包含种子字符串,并且它们也不允许您使用自定义格式/字母
How would you go about implementing this sort of thing (ideally in JavaScript)? Is it even possible to do? I am not sure where to start. Things like the uuid module don't take a seed string, and they don't let you use a custom format/alphabet.
我不是在寻找规范的UUID格式,而是在寻找GUID,最好是一个仅由整数组成的.
I am not looking for the canonical UUID format, but rather a GUID, ideally one made up of just integers.
推荐答案
您需要的是定义文本字符串(例如"v1.0.0&")到40位长字符串的一对一映射(例如"123123 ...").这也称为 bijection ,尽管在您的情况下, injection (从输入到输出(不一定是到输入的简单一对一映射))就足够了.如您所述,哈希函数不一定能确保此映射,但是还有其他可能性,例如全周期 线性同余生成器 (如果它们带有种子,您可以将它们一对一映射到输入字符串值)或其他可逆函数.
What you would need is define a one-to-one mapping of text strings (such as "v1.0.0") onto 40 digit long strings (such as "123123..."). This is also known as a bijection, although in your case an injection (a simple one-to-one mapping from inputs to outputs, not necessarily onto) may be enough. As you note, hash functions don't necessarily ensure this mapping, but there are other possibilities, such as full-period linear congruential generators (if they take a seed that you can map one-to-one onto input string values), or other reversible functions.
但是,如果可能的输入字符串集大于可能的输出字符串集,则由于 鸽子洞原理 .
However, if the set of possible input strings is larger than the set of possible output strings, then you can't map all input strings one-to-one with all output strings (without creating duplicates), due to the pigeonhole principle.
例如,除非您以某种方式限制了120个字符的字符串的格式,否则通常无法将所有120个字符的字符串与所有40位数字的字符串一一对应.但是,如果您可以接受将输入字符串限制为不超过10 值(大约132位)的方法,或者您可以通过其他方式利用冗余,则可以解决创建40位输出字符串的问题.输入字符串,以确保将它们无损压缩到40个十进制数字(约132位)或更少,这是可能的或不可能的.另请参见此问题.
For example, you can't generally map all 120-character strings one-to-one with all 40-digit strings unless you restrict the format of the 120-character strings in some way. However, your problem of creating 40-digit output strings can be solved if you can accept limiting input strings to no more than 10 values (about 132 bits), or if you can otherwise exploit redundancy in the input strings so that they are guaranteed to compress losslessly to 40 decimal digits (about 132 bits) or less, which may or may not be possible. See also this question.
该算法包括两个步骤:
- 首先,通过构建字符串的
charCodeAt()
值,将字符串转换为BigInt
,类似于在另一个答案中给出的stringToInt
方法.如果任何charCodeAt()
为0x80或更大,或者生成的BigInt
等于或大于BigInt(alphabet_length)** BigInt(output_length,则抛出错误)
. - 然后,通过获取
BigInt
的mod和输出字母的大小并将每个剩余部分替换为输出字母中的相应字符,将整数转换为另一个字符串,直到BigInt
达到0.
- First, transform the string to a
BigInt
by building up the string'scharCodeAt()
values similarly to thestringToInt
method given in another answer. Throw an error if anycharCodeAt()
is 0x80 or greater, or if the resultingBigInt
is equal to or greater thanBigInt(alphabet_length)**BigInt(output_length)
. - Then, transform the integer to another string by taking the mod of the
BigInt
and the output alphabet's size and replacing each remainder with the corresponding character in the output alphabet, until theBigInt
reaches 0.
这篇关于如何使用自定义字母生成GUID,其行为类似于MD5哈希(在JavaScript中)?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!