这里要注意的另一件事:math/rand的软件包文档状态:因此,默认源比rand.NewSource()可能获得的Source慢,因为默认源必须在并发访问/使用下提供安全性,而rand.NewSource()不提供此安全性(因此Source返回的速度可能更快). 7.使用strings.Builder所有先前的解决方案均返回string,其内容首先构建在切片中( Genesis 中的[]rune,随后的解决方案中的[]byte),然后转换为string.最后的转换必须复制切片的内容,因为string值是不可变的,并且如果转换不能复制,则不能保证字符串的内容不会通过其原始切片进行修改.有关详细信息,请参阅如何将utf8字符串转换为[] byte? 和 golang:[] byte(string)vs [] byte(* string ). Go 1.10已引入strings.Builder. strings.Builder 一种新类型,可用于构建string的内容,类似于 bytes.Buffer .它在内部使用[]byte进行操作,完成后,我们可以使用其string值. rel ="noreferrer"> Builder.String() 方法.但是,最酷的是它可以执行此操作而无需执行我们上面刚刚谈到的复制操作.之所以敢于这样做,是因为未暴露用于构建字符串内容的字节片,因此可以确保没有人可以无意或恶意地修改它来更改生成的不可变"字符串.所以我们的下一个想法是不要在切片中构建随机字符串,而是借助strings.Builder,因此一旦完成,我们就可以获取并返回结果,而无需对其进行复制.这可能会在速度方面有所帮助,并且绝对会在内存使用和分配方面有所帮助.func RandStringBytesMaskImprSrcSB(n int) string { sb := strings.Builder{} sb.Grow(n) // A src.Int63() generates 63 random bits, enough for letterIdxMax characters! for i, cache, remain := n-1, src.Int63(), letterIdxMax; i >= 0; { if remain == 0 { cache, remain = src.Int63(), letterIdxMax } if idx := int(cache & letterIdxMask); idx < len(letterBytes) { sb.WriteByte(letterBytes[idx]) i-- } cache >>= letterIdxBits remain-- } return sb.String()}请注意,在创建新的strings.Buidler之后,我们将其称为 Builder.Grow() 方法,确保它分配了足够大的内部切片(以避免在我们添加随机字母时重新分配). 8.带有软件包unsafe 的模仿" strings.Builder strings.Builder与内部一样,在内部[]byte中构建字符串.因此,基本上通过strings.Builder进行操作会产生一些开销,我们切换到strings.Builder的唯一目的是避免对切片进行最终复制. strings.Builder通过使用软件包 unsafe :避免了最终副本// String returns the accumulated string.func (b *Builder) String() string { return *(*string)(unsafe.Pointer(&b.buf))}问题是,我们也可以自己执行此操作.因此,这里的想法是切换回在[]byte中构建随机字符串,但完成后,请勿将其转换为string进行返回,而是进行不安全的转换:获取string指向我们的字节片作为字符串数据.这是可以完成的方法:func RandStringBytesMaskImprSrcUnsafe(n int) string { b := make([]byte, n) // A src.Int63() generates 63 random bits, enough for letterIdxMax characters! for i, cache, remain := n-1, src.Int63(), letterIdxMax; i >= 0; { if remain == 0 { cache, remain = src.Int63(), letterIdxMax } if idx := int(cache & letterIdxMask); idx < len(letterBytes) { b[i] = letterBytes[idx] i-- } cache >>= letterIdxBits remain-- } return *(*string)(unsafe.Pointer(&b))}(9.使用rand.Read()) 添加了1.7版 a rand.Read() 函数和 Rand.Read() 方法.为了获得更好的性能,我们应该尝试使用它们一步读取所需的字节数.有一个小问题":我们需要多少个字节?我们可以说:与输出字母的数量一样多.我们认为这是一个较高的估计,因为字母索引使用的少于8位(1个字节).但是在这一点上,我们已经变得越来越糟(因为获得随机位是困难的部分"),而且我们得到的超出了需要.还请注意,要保持所有字母索引的均等分布,可能会有一些我们将无法使用的垃圾"随机数据,因此我们最终将跳过一些数据,因此当我们处理完这些数据时,结果会很短遍历所有字节片.我们将需要进一步递归"获得更多随机字节.现在我们甚至失去了对rand软件包的单次调用"的优势...我们可以某种程度上"优化从math.Rand()获取的随机数据的使用.我们可以估计我们需要多少个字节(位). 1个字母需要letterIdxBits位,而我们需要n个字母,因此我们需要n * letterIdxBits / 8.0个字节四舍五入.我们可以计算出随机索引不可用的可能性(请参见上文),因此我们可以要求更多的可能"足够(如果事实并非如此,则重复此过程).例如,我们可以将字节切片作为位流"进行处理,为此,我们拥有一个不错的第三方库: (披露:我是作者).但是基准代码仍然表明我们没有赢.为什么会这样?最后一个问题的答案是,因为rand.Read()使用循环并继续调用Source.Int63(),直到它填充了传递的切片为止.完全是RandStringBytesMaskImprSrc()解决方案的功能,没有中间缓冲区,并且没有增加复杂性.这就是RandStringBytesMaskImprSrc()仍然在王位上的原因.是的,与rand.Read()不同,RandStringBytesMaskImprSrc()使​​用的是未同步的rand.Source.但是推理仍然适用.如果我们使用Rand.Read()代替rand.Read()(前者也是不同步的),则证明了这一点. II.基准好的,现在是对不同解决方案进行基准测试的时候了.关键时刻:BenchmarkRunes-4 2000000 723 ns/op 96 B/op 2 allocs/opBenchmarkBytes-4 3000000 550 ns/op 32 B/op 2 allocs/opBenchmarkBytesRmndr-4 3000000 438 ns/op 32 B/op 2 allocs/opBenchmarkBytesMask-4 3000000 534 ns/op 32 B/op 2 allocs/opBenchmarkBytesMaskImpr-4 10000000 176 ns/op 32 B/op 2 allocs/opBenchmarkBytesMaskImprSrc-4 10000000 139 ns/op 32 B/op 2 allocs/opBenchmarkBytesMaskImprSrcSB-4 10000000 134 ns/op 16 B/op 1 allocs/opBenchmarkBytesMaskImprSrcUnsafe-4 10000000 115 ns/op 16 B/op 1 allocs/op只需从符文转换为字节,我们就能立即获得 24%的性能提升,并且内存需求降至三分之一.摆脱rand.Intn()并使用rand.Int63()可以提高 20%.蒙版(在出现大索引时重复)会稍微减慢(由于重复调用): -22% ...但是,当我们使用63个随机位的全部(或大部分)(一个rand.Int63()调用中有10个索引)时:加快了速度: 3倍.如果我们使用(非默认值,新的)rand.Source代替rand.Rand,我们将再次获得 21%.如果使用strings.Builder,则速度的提高仅为 3.5%,但是内存使用量也降低了 50%和分配!很好!最后,如果我们敢使用包unsafe而不是strings.Builder,我们将再次获得不错的 14%.将最终解决方案与初始解决方案进行比较:RandStringBytesMaskImprSrcUnsafe()是RandStringRunes()的 6.3倍,使用的是六分之一的内存,而分配的数量少一半.任务完成了.I want a random string of characters only (uppercase or lowercase), no numbers, in Go. What is the fastest and simplest way to do this? 解决方案 Paul's solution provides a simple, general solution.The question asks for the "the fastest and simplest way". Let's address the fastest part too. We'll arrive at our final, fastest code in an iterative manner. Benchmarking each iteration can be found at the end of the answer.All the solutions and the benchmarking code can be found on the Go Playground. The code on the Playground is a test file, not an executable. You have to save it into a file named XX_test.go and run it withgo test -bench . -benchmemForeword:Having said that, even if you don't need the fastest solution, reading through this answer might be adventurous and educational.I. Improvements1. Genesis (Runes)As a reminder, the original, general solution we're improving is this:func init() { rand.Seed(time.Now().UnixNano())}var letterRunes = []rune("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")func RandStringRunes(n int) string { b := make([]rune, n) for i := range b { b[i] = letterRunes[rand.Intn(len(letterRunes))] } return string(b)}2. BytesIf the characters to choose from and assemble the random string contains only the uppercase and lowercase letters of the English alphabet, we can work with bytes only because the English alphabet letters map to bytes 1-to-1 in the UTF-8 encoding (which is how Go stores strings).So instead of:var letters = []rune("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")we can use:var letters = []bytes("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")Or even better:const letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"Now this is already a big improvement: we could achieve it to be a const (there are string constants but there are no slice constants). As an extra gain, the expression len(letters) will also be a const! (The expression len(s) is constant if s is a string constant.)And at what cost? Nothing at all. strings can be indexed which indexes its bytes, perfect, exactly what we want.Our next destination looks like this:const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"func RandStringBytes(n int) string { b := make([]byte, n) for i := range b { b[i] = letterBytes[rand.Intn(len(letterBytes))] } return string(b)}3. RemainderPrevious solutions get a random number to designate a random letter by calling rand.Intn() which delegates to Rand.Intn() which delegates to Rand.Int31n().This is much slower compared to rand.Int63() which produces a random number with 63 random bits.So we could simply call rand.Int63() and use the remainder after dividing by len(letterBytes):func RandStringBytesRmndr(n int) string { b := make([]byte, n) for i := range b { b[i] = letterBytes[rand.Int63() % int64(len(letterBytes))] } return string(b)}This works and is significantly faster, the disadvantage is that the probability of all the letters will not be exactly the same (assuming rand.Int63() produces all 63-bit numbers with equal probability). Although the distortion is extremely small as the number of letters 52 is much-much smaller than 1<<63 - 1, so in practice this is perfectly fine.4. MaskingBuilding on the previous solution, we can maintain the equal distribution of letters by using only as many of the lowest bits of the random number as many is required to represent the number of letters. So for example if we have 52 letters, it requires 6 bits to represent it: 52 = 110100b. So we will only use the lowest 6 bits of the number returned by rand.Int63(). And to maintain equal distribution of letters, we only "accept" the number if it falls in the range 0..len(letterBytes)-1. If the lowest bits are greater, we discard it and query a new random number.Note that the chance of the lowest bits to be greater than or equal to len(letterBytes) is less than 0.5 in general (0.25 on average), which means that even if this would be the case, repeating this "rare" case decreases the chance of not finding a good number. After n repetition, the chance that we sill don't have a good index is much less than pow(0.5, n), and this is just an upper estimation. In case of 52 letters the chance that the 6 lowest bits are not good is only (64-52)/64 = 0.19; which means for example that chances to not have a good number after 10 repetition is 1e-8.So here is the solution:const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"const ( letterIdxBits = 6 // 6 bits to represent a letter index letterIdxMask = 1<<letterIdxBits - 1 // All 1-bits, as many as letterIdxBits)func RandStringBytesMask(n int) string { b := make([]byte, n) for i := 0; i < n; { if idx := int(rand.Int63() & letterIdxMask); idx < len(letterBytes) { b[i] = letterBytes[idx] i++ } } return string(b)}5. Masking ImprovedThe previous solution only uses the lowest 6 bits of the 63 random bits returned by rand.Int63(). This is a waste as getting the random bits is the slowest part of our algorithm.If we have 52 letters, that means 6 bits code a letter index. So 63 random bits can designate 63/6 = 10 different letter indices. Let's use all those 10:const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"const ( letterIdxBits = 6 // 6 bits to represent a letter index letterIdxMask = 1<<letterIdxBits - 1 // All 1-bits, as many as letterIdxBits letterIdxMax = 63 / letterIdxBits // # of letter indices fitting in 63 bits)func RandStringBytesMaskImpr(n int) string { b := make([]byte, n) // A rand.Int63() generates 63 random bits, enough for letterIdxMax letters! for i, cache, remain := n-1, rand.Int63(), letterIdxMax; i >= 0; { if remain == 0 { cache, remain = rand.Int63(), letterIdxMax } if idx := int(cache & letterIdxMask); idx < len(letterBytes) { b[i] = letterBytes[idx] i-- } cache >>= letterIdxBits remain-- } return string(b)}6. SourceThe Masking Improved is pretty good, not much we can improve on it. We could, but not worth the complexity.Now let's find something else to improve. The source of random numbers.There is a crypto/rand package which provides a Read(b []byte) function, so we could use that to get as many bytes with a single call as many we need. This wouldn't help in terms of performance as crypto/rand implements a cryptographically secure pseudorandom number generator so it's much slower.So let's stick to the math/rand package. The rand.Rand uses a rand.Source as the source of random bits. rand.Source is an interface which specifies a Int63() int64 method: exactly and the only thing we needed and used in our latest solution.So we don't really need a rand.Rand (either explicit or the global, shared one of the rand package), a rand.Source is perfectly enough for us:var src = rand.NewSource(time.Now().UnixNano())func RandStringBytesMaskImprSrc(n int) string { b := make([]byte, n) // A src.Int63() generates 63 random bits, enough for letterIdxMax characters! for i, cache, remain := n-1, src.Int63(), letterIdxMax; i >= 0; { if remain == 0 { cache, remain = src.Int63(), letterIdxMax } if idx := int(cache & letterIdxMask); idx < len(letterBytes) { b[i] = letterBytes[idx] i-- } cache >>= letterIdxBits remain-- } return string(b)}Also note that this last solution doesn't require you to initialize (seed) the global Rand of the math/rand package as that is not used (and our rand.Source is properly initialized / seeded).One more thing to note here: package doc of math/rand states:So the default source is slower than a Source that may be obtained by rand.NewSource(), because the default source has to provide safety under concurrent access / use, while rand.NewSource() does not offer this (and thus the Source returned by it is more likely to be faster).7. Utilizing strings.BuilderAll previous solutions return a string whose content is first built in a slice ([]rune in Genesis, and []byte in subsequent solutions), and then converted to string. This final conversion has to make a copy of the slice's content, because string values are immutable, and if the conversion would not make a copy, it could not be guaranteed that the string's content is not modified via its original slice. For details, see How to convert utf8 string to []byte? and golang: []byte(string) vs []byte(*string).Go 1.10 introduced strings.Builder. strings.Builder a new type we can use to build contents of a string similar to bytes.Buffer. It does it internally using a []byte, and when we're done, we can obtain the final string value using its Builder.String() method. But what's cool in it is that it does this without performing the copy we just talked about above. It dares to do so because the byte slice used to build the string's content is not exposed, so it is guaranteed that no one can modify it unintentionally or maliciously to alter the produced "immutable" string.So our next idea is to not build the random string in a slice, but with the help of a strings.Builder, so once we're done, we can obtain and return the result without having to make a copy of it. This may help in terms of speed, and it will definitely help in terms of memory usage and allocations.func RandStringBytesMaskImprSrcSB(n int) string { sb := strings.Builder{} sb.Grow(n) // A src.Int63() generates 63 random bits, enough for letterIdxMax characters! for i, cache, remain := n-1, src.Int63(), letterIdxMax; i >= 0; { if remain == 0 { cache, remain = src.Int63(), letterIdxMax } if idx := int(cache & letterIdxMask); idx < len(letterBytes) { sb.WriteByte(letterBytes[idx]) i-- } cache >>= letterIdxBits remain-- } return sb.String()}Do note that after creating a new strings.Buidler, we called its Builder.Grow() method, making sure it allocates a big-enough internal slice (to avoid reallocations as we add the random letters).8. "Mimicing" strings.Builder with package unsafestrings.Builder builds the string in an internal []byte, the same as we did ourselves. So basically doing it via a strings.Builder has some overhead, the only thing we switched to strings.Builder for is to avoid the final copying of the slice.strings.Builder avoids the final copy by using package unsafe:// String returns the accumulated string.func (b *Builder) String() string { return *(*string)(unsafe.Pointer(&b.buf))}The thing is, we can also do this ourselves, too. So the idea here is to switch back to building the random string in a []byte, but when we're done, don't convert it to string to return, but do an unsafe conversion: obtain a string which points to our byte slice as the string data.This is how it can be done:func RandStringBytesMaskImprSrcUnsafe(n int) string { b := make([]byte, n) // A src.Int63() generates 63 random bits, enough for letterIdxMax characters! for i, cache, remain := n-1, src.Int63(), letterIdxMax; i >= 0; { if remain == 0 { cache, remain = src.Int63(), letterIdxMax } if idx := int(cache & letterIdxMask); idx < len(letterBytes) { b[i] = letterBytes[idx] i-- } cache >>= letterIdxBits remain-- } return *(*string)(unsafe.Pointer(&b))}(9. Using rand.Read())Go 1.7 added a rand.Read() function and a Rand.Read() method. We should be tempted to use these to read as many bytes as we need in one step, in order to achieve better performance.There is one small "problem" with this: how many bytes do we need? We could say: as many as the number of output letters. We would think this is an upper estimation, as a letter index uses less than 8 bits (1 byte). But at this point we are already doing worse (as getting the random bits is the "hard part"), and we're getting more than needed.Also note that to maintain equal distribution of all letter indices, there might be some "garbage" random data that we won't be able to use, so we would end up skipping some data, and thus end up short when we go through all the byte slice. We would need to further get more random bytes, "recursively". And now we're even losing the "single call to rand package" advantage...We could "somewhat" optimize the usage of the random data we acquire from math.Rand(). We may estimate how many bytes (bits) we'll need. 1 letter requires letterIdxBits bits, and we need n letters, so we need n * letterIdxBits / 8.0 bytes rounding up. We can calculate the probability of a random index not being usable (see above), so we could request more that will "more likely" be enough (if it turns out it's not, we repeat the process). We can process the byte slice as a "bit stream" for example, for which we have a nice 3rd party lib: github.com/icza/bitio (disclosure: I'm the author).But Benchmark code still shows we're not winning. Why is it so?The answer to the last question is because rand.Read() uses a loop and keeps calling Source.Int63() until it fills the passed slice. Exactly what the RandStringBytesMaskImprSrc() solution does, without the intermediate buffer, and without the added complexity. That's why RandStringBytesMaskImprSrc() remains on the throne. Yes, RandStringBytesMaskImprSrc() uses an unsynchronized rand.Source unlike rand.Read(). But the reasoning still applies; and which is proven if we use Rand.Read() instead of rand.Read() (the former is also unsynchronzed).II. BenchmarkAll right, it's time for benchmarking the different solutions.Moment of truth:BenchmarkRunes-4 2000000 723 ns/op 96 B/op 2 allocs/opBenchmarkBytes-4 3000000 550 ns/op 32 B/op 2 allocs/opBenchmarkBytesRmndr-4 3000000 438 ns/op 32 B/op 2 allocs/opBenchmarkBytesMask-4 3000000 534 ns/op 32 B/op 2 allocs/opBenchmarkBytesMaskImpr-4 10000000 176 ns/op 32 B/op 2 allocs/opBenchmarkBytesMaskImprSrc-4 10000000 139 ns/op 32 B/op 2 allocs/opBenchmarkBytesMaskImprSrcSB-4 10000000 134 ns/op 16 B/op 1 allocs/opBenchmarkBytesMaskImprSrcUnsafe-4 10000000 115 ns/op 16 B/op 1 allocs/opJust by switching from runes to bytes, we immediately have 24% performance gain, and memory requirement drops to one third.Getting rid of rand.Intn() and using rand.Int63() instead gives another 20% boost.Masking (and repeating in case of big indices) slows down a little (due to repetition calls): -22%...But when we make use of all (or most) of the 63 random bits (10 indices from one rand.Int63() call): that speeds up big time: 3 times.If we settle with a (non-default, new) rand.Source instead of rand.Rand, we again gain 21%.If we utilize strings.Builder, we gain a tiny 3.5% in speed, but we also achieved 50% reduction in memory usage and allocations! That's nice!Finally if we dare to use package unsafe instead of strings.Builder, we again gain a nice 14%.Comparing the final to the initial solution: RandStringBytesMaskImprSrcUnsafe() is 6.3 times faster than RandStringRunes(), uses one sixth memory and half as few allocations. Mission accomplished. 这篇关于如何在Go中生成固定长度的随机字符串?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持! 上岸,阿里云!
08-23 14:48