Problem description
I've been looking for a simple Java algorithm to generate a pseudo-random alphanumeric string. In my situation it would be used as a unique session/key identifier that would "likely" be unique over 500K+ generations (my needs don't really require anything much more sophisticated).
Ideally, I would be able to specify a length depending on my uniqueness needs. For example, a generated string of length 12 might look something like "AEYGF7K0DM1X".
Recommended answer
Algorithm
To generate a random string, concatenate characters drawn randomly from the set of acceptable symbols until the string reaches the desired length.
Here's some fairly simple and very flexible code for generating random identifiers. Read the information that follows for important application notes.
import java.security.SecureRandom;
import java.util.Locale;
import java.util.Objects;
import java.util.Random;

public class RandomString {

    /**
     * Generate a random string.
     * Not thread-safe: the internal buffer is reused on every call.
     */
    public String nextString() {
        for (int idx = 0; idx < buf.length; ++idx)
            buf[idx] = symbols[random.nextInt(symbols.length)];
        return new String(buf);
    }

    public static final String upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    public static final String lower = upper.toLowerCase(Locale.ROOT);
    public static final String digits = "0123456789";
    public static final String alphanum = upper + lower + digits;

    private final Random random;   // source of randomness (secure or not)
    private final char[] symbols;  // alphabet to draw characters from
    private final char[] buf;      // reusable output buffer of the requested length

    public RandomString(int length, Random random, String symbols) {
        if (length < 1) throw new IllegalArgumentException();
        if (symbols.length() < 2) throw new IllegalArgumentException();
        this.random = Objects.requireNonNull(random);
        this.symbols = symbols.toCharArray();
        this.buf = new char[length];
    }

    /**
     * Create an alphanumeric string generator.
     */
    public RandomString(int length, Random random) {
        this(length, random, alphanum);
    }

    /**
     * Create alphanumeric strings from a secure generator.
     */
    public RandomString(int length) {
        this(length, new SecureRandom());
    }

    /**
     * Create session identifiers.
     */
    public RandomString() {
        this(21);
    }
}
Usage examples
Create an insecure generator for 8-character identifiers:
RandomString gen = new RandomString(8, ThreadLocalRandom.current());
Create a secure generator for session identifiers:
RandomString session = new RandomString();
Create a generator with easy-to-read codes for printing. The strings are longer than full alphanumeric strings to compensate for using fewer symbols:
String easy = RandomString.digits + "ACEFGHJKLMNPQRUVWXYabcdefhijkprstuvwx";
RandomString tickets = new RandomString(23, new SecureRandom(), easy);
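As a rough check on why the easy-to-read codes above use 23 characters, the sketch below compares the entropy of a 21-character alphanumeric string with what the reduced alphabet can supply. The class and method names (CodeLength, entropyBits, minLength) are made up for illustration and are not part of the answer's code; it assumes every symbol is drawn uniformly at random.

```java
// Illustrative sketch only: CodeLength and its methods are hypothetical helpers.
final class CodeLength {

    /** Bits of entropy in `length` symbols drawn uniformly from `alphabetSize` symbols. */
    static double entropyBits(int alphabetSize, int length) {
        return length * (Math.log(alphabetSize) / Math.log(2));
    }

    /** Smallest length whose entropy is at least `targetBits`. */
    static int minLength(int alphabetSize, double targetBits) {
        double bitsPerSymbol = Math.log(alphabetSize) / Math.log(2);
        return (int) Math.ceil(targetBits / bitsPerSymbol);
    }

    public static void main(String[] args) {
        // Same reduced alphabet as the "easy" example: 10 digits + 37 letters = 47 symbols.
        String easy = "0123456789" + "ACEFGHJKLMNPQRUVWXYabcdefhijkprstuvwx";
        double target = entropyBits(62, 21); // 21 characters from the full 62-symbol set
        System.out.println("alphabet size: " + easy.length());
        System.out.println("length needed: " + minLength(easy.length(), target));
    }
}
```

With 47 symbols, 22 characters fall slightly short of the 21-character alphanumeric string, so the required length comes out as 23.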
Use as session identifiers
Generating session identifiers that are merely likely to be unique is not good enough; if that were all you needed, you could just use a simple counter. Attackers hijack sessions when predictable identifiers are used.
There is tension between length and security. Shorter identifiers are easier to guess, because there are fewer possibilities. But longer identifiers consume more storage and bandwidth. A larger set of symbols helps, but might cause encoding problems if identifiers are included in URLs or re-entered by hand.
The underlying source of randomness, or entropy, for session identifiers should come from a random number generator designed for cryptography. However, initializing these generators can sometimes be computationally expensive or slow, so effort should be made to reuse them when possible.
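One way to reuse a cryptographic generator is to create it once and share it, rather than constructing a new SecureRandom for every identifier. The following is a minimal sketch under that assumption; SessionIds and next are hypothetical names, not part of the RandomString class above.

```java
import java.security.SecureRandom;

// Illustrative sketch only: SessionIds is a hypothetical helper class.
final class SessionIds {
    private static final String ALPHANUM =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ" + "abcdefghijklmnopqrstuvwxyz" + "0123456789";

    // Created once for the whole application and reused for every identifier.
    // SecureRandom instances are safe to share between threads.
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Returns a new random alphanumeric identifier of the given length. */
    static String next(int length) {
        char[] buf = new char[length]; // fresh buffer per call, so callers never share state
        for (int i = 0; i < buf.length; i++) {
            buf[i] = ALPHANUM.charAt(RANDOM.nextInt(ALPHANUM.length()));
        }
        return new String(buf);
    }

    public static void main(String[] args) {
        System.out.println(next(21));
    }
}
```

Allocating a new buffer on each call trades a little garbage for thread safety; the RandomString class above makes the opposite choice and reuses its buffer.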
Not every application requires security. Random assignment can be an efficient way for multiple entities to generate identifiers in a shared space without any coordination or partitioning. Coordination can be slow, especially in a clustered or distributed environment, and splitting up a space causes problems when entities end up with shares that are too small or too big.
Identifiers generated without taking measures to make them unpredictable should be protected by other means if an attacker might be able to view and manipulate them, as happens in most web applications. There should be a separate authorization system that protects objects whose identifier can be guessed by an attacker without access permission.
Care must also be taken to use identifiers that are long enough to make collisions unlikely given the anticipated total number of identifiers. This is referred to as "the birthday paradox." The probability of a collision, $p$, is approximately $n^2/(2q^x)$, where $n$ is the number of identifiers actually generated, $q$ is the number of distinct symbols in the alphabet, and $x$ is the length of the identifiers. This should be a very small number, like $2^{-50}$ or less.
Working this out shows that the chance of collision among 500k 15-character identifiers is about $2^{-52}$, which is probably less likely than undetected errors from cosmic rays, etc.
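The estimate can be reproduced with a few lines of arithmetic. The sketch below uses the birthday approximation p ≈ n²/(2·qˣ), with n identifiers, q symbols, and length x, and works in base-2 logarithms so the very small result does not underflow. CollisionEstimate and log2Collision are names invented for this illustration.

```java
// Illustrative sketch only: CollisionEstimate is a hypothetical helper class.
final class CollisionEstimate {

    /**
     * Base-2 logarithm of the approximate collision probability
     * p = n^2 / (2 * q^x), i.e. log2(p) = 2*log2(n) - 1 - x*log2(q).
     */
    static double log2Collision(long n, int q, int x) {
        double log2n = Math.log(n) / Math.log(2);
        double log2q = Math.log(q) / Math.log(2);
        return 2 * log2n - 1 - x * log2q;
    }

    public static void main(String[] args) {
        // 500k identifiers, 62 alphanumeric symbols, 15 characters each.
        double exponent = log2Collision(500_000L, 62, 15);
        System.out.println("p is roughly 2^" + Math.round(exponent));
    }
}
```

For these inputs the exponent comes out near -52.4, consistent with the figure quoted above. The approximation only holds while p is small.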
According to their specification, UUIDs are not designed to be unpredictable, and should not be used as session identifiers.
UUIDs in their standard format take a lot of space: 36 characters for only 122 bits of entropy. (Not all bits of a "random" UUID are selected randomly.) A randomly chosen alphanumeric string packs more entropy in just 21 characters.
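The figures for UUIDs can be checked against the JDK's own java.util.UUID. This is a small sketch, with UuidFacts as a hypothetical class name; it assumes the "random" UUIDs meant here are version 4, where 4 version bits and 2 variant bits out of 128 are fixed.

```java
import java.util.UUID;

// Illustrative sketch only: UuidFacts is a hypothetical helper class.
final class UuidFacts {

    /** Number of randomly chosen bits in a version 4 UUID: 128 minus version and variant bits. */
    static int randomBits() {
        return 128 - 4 - 2;
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        System.out.println(id);
        System.out.println("characters: " + id.toString().length()); // 32 hex digits + 4 hyphens
        System.out.println("version:    " + id.version());           // fixed, not random
        System.out.println("random bits: " + randomBits());
    }
}
```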
UUIDs are not flexible; they have a standardized structure and layout. This is their chief virtue as well as their main weakness. When collaborating with an outside party, the standardization offered by UUIDs may be helpful. For purely internal use, they can be inefficient.