Why does adding concurrency slow down this golang code?

Problem description

I've got a bit of Go code that I've been tinkering with to answer a little curiosity of mine related to a video game my brother-in-law plays.
Essentially, the code below simulates interactions with monsters in the game and how often he can expect them to drop items upon their defeat. The problem I'm having is that I would expect a piece of code like this to be perfect for parallelization, but when I add in concurrency, the time it takes to do all of the simulations tends to slow down by 4-6 times compared to the original without concurrency.

To give you a better understanding of how the code works, I have three main functions:

The interaction function, which is a simple interaction between the player and a monster. It returns 1 if the monster drops an item, and 0 otherwise.

The simulation function, which runs several interactions and returns a slice of interaction results (i.e., 1's and 0's representing successful/unsuccessful interactions).

Finally, the test function, which runs a set of simulations and returns a slice of simulation results, each being the total number of interactions that resulted in a dropped item. It's this last function that I am trying to run in parallel.

Now, I could understand why the code would slow down if I created a goroutine for each test that I want to run. Assuming I'm running 100 tests, the context switching between each of the goroutines across the 4 CPUs my MacBook Air has would kill the performance, but I'm only creating as many goroutines as I have processors and dividing the number of tests between the goroutines. I would expect this to actually speed up the code's performance since I am running each of my tests in parallel, but, of course, I'm getting a major slowdown instead.
I'd love to figure out why this is happening, so any help would be greatly appreciated.

Below is the regular code without the goroutines:

    package main

    import (
        "fmt"
        "math/rand"
        "time"
    )

    const (
        NUMBER_OF_SIMULATIONS  = 1000
        NUMBER_OF_INTERACTIONS = 1000000
        DROP_RATE              = 0.0003
    )

    /**
     * Simulates a single interaction with a monster
     *
     * Returns 1 if the monster dropped an item and 0 otherwise
     */
    func interaction() int {
        if rand.Float64() <= DROP_RATE {
            return 1
        }
        return 0
    }

    /**
     * Runs several interactions and returns a slice representing the results
     */
    func simulation(n int) []int {
        interactions := make([]int, n)
        for i := range interactions {
            interactions[i] = interaction()
        }
        return interactions
    }

    /**
     * Runs several simulations and returns the results
     */
    func test(n int) []int {
        simulations := make([]int, n)
        for i := range simulations {
            successes := 0
            for _, v := range simulation(NUMBER_OF_INTERACTIONS) {
                successes += v
            }
            simulations[i] = successes
        }
        return simulations
    }

    func main() {
        rand.Seed(time.Now().UnixNano())
        fmt.Println("Successful interactions: ", test(NUMBER_OF_SIMULATIONS))
    }

And here is the concurrent code with the goroutines:

    package main

    import (
        "fmt"
        "math/rand"
        "time"
        "runtime"
    )

    const (
        NUMBER_OF_SIMULATIONS  = 1000
        NUMBER_OF_INTERACTIONS = 1000000
        DROP_RATE              = 0.0003
    )

    /**
     * Simulates a single interaction with a monster
     *
     * Returns 1 if the monster dropped an item and 0 otherwise
     */
    func interaction() int {
        if rand.Float64() <= DROP_RATE {
            return 1
        }
        return 0
    }

    /**
     * Runs several interactions and returns a slice representing the results
     */
    func simulation(n int) []int {
        interactions := make([]int, n)
        for i := range interactions {
            interactions[i] = interaction()
        }
        return interactions
    }

    /**
     * Runs several simulations and returns the results
     */
    func test(n int, c chan []int) {
        simulations := make([]int, n)
        for i := range simulations {
            for _, v := range simulation(NUMBER_OF_INTERACTIONS) {
                simulations[i] += v
            }
        }
        c <- simulations
    }

    func main() {
        rand.Seed(time.Now().UnixNano())

        nCPU := runtime.NumCPU()
        runtime.GOMAXPROCS(nCPU)
        fmt.Println("Number of CPUs: ", nCPU)

        tests := make([]chan []int, nCPU)
        for i := range tests {
            c := make(chan []int)
            go test(NUMBER_OF_SIMULATIONS/nCPU, c)
            tests[i] = c
        }

        // Concatenate the test results
        results := make([]int, NUMBER_OF_SIMULATIONS)
        for i, c := range tests {
            start := (NUMBER_OF_SIMULATIONS/nCPU) * i
            stop := (NUMBER_OF_SIMULATIONS/nCPU) * (i+1)
            copy(results[start:stop], <-c)
        }

        fmt.Println("Successful interactions: ", results)
    }

UPDATE (01/12/13 18:05)

I've added a new version of the concurrent code below that creates a new Rand instance for each goroutine, per "the system"'s suggestion below. I'm now seeing a very slight speedup compared to the serial version of the code (around a 15-20% reduction in overall time taken). I'd love to know why I don't see something closer to a 75% reduction in time, since I'm spreading the workload over my MBA's 4 cores. Does anyone have any further suggestions that could help out?

    package main

    import (
        "fmt"
        "math/rand"
        "time"
        "runtime"
    )

    const (
        NUMBER_OF_SIMULATIONS  = 1000
        NUMBER_OF_INTERACTIONS = 1000000
        DROP_RATE              = 0.0003
    )

    /**
     * Simulates a single interaction with a monster
     *
     * Returns 1 if the monster dropped an item and 0 otherwise
     */
    func interaction(generator *rand.Rand) int {
        if generator.Float64() <= DROP_RATE {
            return 1
        }
        return 0
    }

    /**
     * Runs several interactions and returns a slice representing the results
     */
    func simulation(n int, generator *rand.Rand) []int {
        interactions := make([]int, n)
        for i := range interactions {
            interactions[i] = interaction(generator)
        }
        return interactions
    }

    /**
     * Runs several simulations and returns the results
     */
    func test(n int, c chan []int) {
        source := rand.NewSource(time.Now().UnixNano())
        generator := rand.New(source)
        simulations := make([]int, n)
        for i := range simulations {
            for _, v := range simulation(NUMBER_OF_INTERACTIONS, generator) {
                simulations[i] += v
            }
        }
        c <- simulations
    }

    func main() {
        rand.Seed(time.Now().UnixNano())

        nCPU := runtime.NumCPU()
        runtime.GOMAXPROCS(nCPU)
        fmt.Println("Number of CPUs: ", nCPU)

        tests := make([]chan []int, nCPU)
        for i := range tests {
            c := make(chan []int)
            go test(NUMBER_OF_SIMULATIONS/nCPU, c)
            tests[i] = c
        }

        // Concatenate the test results
        results := make([]int, NUMBER_OF_SIMULATIONS)
        for i, c := range tests {
            start := (NUMBER_OF_SIMULATIONS/nCPU) * i
            stop := (NUMBER_OF_SIMULATIONS/nCPU) * (i+1)
            copy(results[start:stop], <-c)
        }

        fmt.Println("Successful interactions: ", results)
    }

UPDATE (01/13/13 17:58)

Thanks everyone for the help in figuring out my problem. I did finally get the answer I was looking for, so I thought I would summarize it here for anyone who has the same problem.

Essentially I had two main issues. First, even though my code was embarrassingly parallel, it was running slower when I split it up amongst the available processors. Second, the solution opened up another issue: my serial code was running twice as slow as the concurrent code running on a single processor, when you would expect the two to take roughly the same time. In both cases the issue was the random number generator function rand.Float64. Basically, this is a convenience function provided by the rand package. In that package, a global instance of the Rand struct is created and used by each of the convenience functions. This global Rand instance has a mutex lock associated with it. Since I was using this convenience function, I wasn't truly able to parallelize my code, since each of the goroutines would have to line up for access to the global Rand instance. The solution (as "the system" suggests below) is to create a separate instance of the Rand struct for each goroutine. This solved the first problem but created the second one.

The second problem was that my non-parallel concurrent code (i.e., my concurrent code running with only a single processor) was running twice as fast as the sequential code.
The reason for this was that, even though I was only running with a single processor and a single goroutine, that goroutine had its own instance of the Rand struct that I had created, and I had created it without the mutex lock. The sequential code was still using the rand.Float64 convenience function, which made use of the global mutex-protected Rand instance. The cost of acquiring that lock was causing the sequential code to run twice as slow.

So, the moral of the story is: whenever performance matters, make sure you create an instance of the Rand struct and call the function you need off of it rather than using the convenience functions provided by the package.

Solution

The issue seems to come from your use of rand.Float64(), which uses a shared global object with a Mutex lock on it.

Instead, if for each CPU you create a separate generator with rand.New(), pass it through to interaction(), and call Float64() on it, there's a massive improvement.

Update to show the changes to the new example code in the question, which now uses rand.New()

The test() function was modified to either use a given channel, or return the result.

    func test(n int, c chan []int) []int {
        source := rand.NewSource(time.Now().UnixNano())
        generator := rand.New(source)
        simulations := make([]int, n)
        for i := range simulations {
            for _, v := range simulation(NUMBER_OF_INTERACTIONS, generator) {
                simulations[i] += v
            }
        }
        if c == nil {
            return simulations
        }
        c <- simulations
        return nil
    }

The main() function was updated to run both tests, and output the timed result.

    func main() {
        rand.Seed(time.Now().UnixNano())

        nCPU := runtime.NumCPU()
        runtime.GOMAXPROCS(nCPU)
        fmt.Println("Number of CPUs: ", nCPU)

        start := time.Now()
        fmt.Println("Successful interactions: ", len(test(NUMBER_OF_SIMULATIONS, nil)))
        fmt.Println(time.Since(start))

        start = time.Now()
        tests := make([]chan []int, nCPU)
        for i := range tests {
            c := make(chan []int)
            go test(NUMBER_OF_SIMULATIONS/nCPU, c)
            tests[i] = c
        }

        // Concatenate the test results
        results := make([]int, NUMBER_OF_SIMULATIONS)
        for i, c := range tests {
            start := (NUMBER_OF_SIMULATIONS/nCPU) * i
            stop := (NUMBER_OF_SIMULATIONS/nCPU) * (i+1)
            copy(results[start:stop], <-c)
        }
        fmt.Println("Successful interactions: ", len(results))
        fmt.Println(time.Since(start))
    }

The output I received:

> Number of CPUs: 2
>
> Successful interactions: 1000
> 1m20.39959s
>
> Successful interactions: 1000
> 41.392299s