本文介绍了在Haskell中优化数组的性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧! 问题描述 29岁程序员,3月因学历无情被辞! 我正在研究类似MineCraft的世界的地形生成算法。目前,我正在使用单纯的噪音,基于在这篇文章中的实现单纯噪音揭秘[PDF] ,因为单纯的噪声应该比Perlin噪声更快,并且产生更少的伪像。这看起来相当不错(见图片),但到目前为止它也很慢。 运行噪声函数10次(我需要不同波长的噪声,例如地形高度,温度,树的位置等等),一个块(16×16×128块)中的每个块具有3个八度的噪声,或者总共大约一百万个对噪声功能的调用,大约需要700-800毫秒。尽管事实上在算法中没有明显的昂贵的操作(至少对我来说),但对于产生具有任何体面的速度的地形而言,这至少是一个数量级太慢的速度。只是楼,模,一些数组查找和基本的算术。下面列出了算法(用Haskell编写)。 SCC的意见是为了分析。我已经省略了二维噪声函数,因为它们的工作方式是一样的。 g3 ::(Floating a,RealFrac a) => a g3 = 1/6 { - #INLINE int# - } int ::(积分a,数字b)=> a - > b int = fromIntegral grad3 ::(Floating a,RealFrac a)=> V.Vector(a,a,a) grad3 = V.fromList $ [(1,1,0),( - 1,1,0),(1,-1,0),( - 1 ,-1,0),(1,0,1),( - 1,0,0),(1,0,-1),( - 1,0,-1),(0,1,1),(0,-1,1),(0,1,-1),(0,-1,-1)] { - #INLINE dot3 # - } dot3 :: Num a => (a,a,a) - > a - > a - > a - > a dot3(a,b,c)xyz = a * x + b * y + c * z { - #INLINE fastFloor# - } fastFloor :: RealFrac a => a - > Int fastFloor x = truncate(如果x> 0则x else x - 1) - 产生噪声函数中使用的随机置换 perm :: Int - >排列 perm seed = V.fromList。 concat。复制2。 shuffle'[0..255] 256 $ mkStdGen种子 - 产生-0.5到0.5之间的3D噪音 simplex3D ::(Floating a,RealFrac a)=>置换 - > a - > a - > a - > simplex3D pxyz = { - #SCCout# - } 16 *(n gi0(x0,y0,z0)+ n gi1 xyz1 + n gi2 xyz2 + n gi3 xyz3) (x,y,z)/ 3)(x0,y0,z0) )= { - #SCCx0-z0# - }(x - int i + t,y - int j + t,z - int k + t)其中t = int(i + j + k)* g3 $如果x0> = y0 ,那么如果y0> = z0,则b(b1,b1,j1,k1,i2,j2,k2)= { - #SCCi1-k2 0,0,1,1,0)否则如果x0> = z0那么(1,0,0,1,0,1)else(0,0,1,1,0,1) else if y0< z0 then(0,0,1,0,1,1)else if x0 xyz1 = { - #SCCxyz1# - }(x0 - int i1 + g3,y0-int j1 + g3,z0-int k1 + g3) xyz2 = { - #SCCxyz2# - }(x0 - int i2 + 2 * g3,y0 - int j2 + 2 * g3,z0-int k2 + 2 * g3) xyz3 = { - #SCCxyz3# - }(x0-1 + 3 * g3,y0-1 + 3 * g3,z0-1 + 3 * g3)(ii,jj,kk)= { - #SCCiijjkk# - }(i。& 255,j。& 255,k。& 255) gi0 = { - #SCCgi0# - } mod(pV!!(ii + pV!!(jj + pV!!kk)))12 gi1 = { - #SCCgi1 ()()()()()()()()()()() - } mod(p V!!(ii + i2 + p V!!(jj + j2 + p V!!(kk + k2))))12 gi3 = { - #SCCgi3# - (p + 1 + pV!!(kk + 1))))12 { - #INLINE n# - } n gi(x',y',z')= { - #SCCn# - }(\ a - >> if a a * a * a * a * dot3 grad3 V.!gi)x'y'z')$ 0.6 - x'* x' - y'* y' - z * Z 谐波::(民一个,分数A)=> Int - > (a - > a) - > a 谐波八度噪声= f八度/(2-1 / int(2 ^(octave-1)))其中f 0 = 0 fo = let r = int $ 2 ^(o - 1)in noise r / r + f(o - 1) - 介于-0.5和0.5 之间的泛音三维噪声harmonicNoise3D ::(RealFrac a,Floating a)=> ;置换 - > Int - > a - > a - > a - > a - > a harmonicNoise3D p octaves lxyz =谐波八度音(\\ * - f - > simplex3D p(x * f / l)(y * f / l)(z * f / l)) 为了分析,我使用了下面的代码: main = do start< - getCurrentTime print $ q() end< - getCurrentTime print $ diffUTCTime end start 产生以下信息: $ p $ 成本中心模块%时间%分配 simplex3D主要18.8 21.0 $ b $ $主要18.0 19.6 主要10.1 9.2 harmonicNoise3D主要9.8 4.5 主要6.4 5.8 int Main 4.0 2.9 gi3主4.0 3.0 xyz2主3.5 5.9 gi1主要3.4 3.4 gi0主要3.4 2.7 快速主要3.2 0.6 xyz1主要2.9 5.9 ijk主要2.7 3.5 gi2主要2.7 3.3 xyz3主要2.6 4.1 iijjkk Main 1.6 2.5 dot3 Main 1.6 0.7 也将该算法移植到C#。那里的表现快了大概3到4倍,所以我想我一定是做错了。但即使如此,它还不如我想要的那么快。所以我的问题是:任何人都可以告诉我是否有任何方法来加速我的实现和/或一般算法,或者有人知道有一个不同的噪声算法,有更好的性能特点,但有相似的外观? 更新: 按照以下提供的一些建议,代码现在看起来如下: 模块Noise(Permutation,Perm ,noise3D,simplex3D )其中 导入Data.Bits 导入限定的Data.Vector.Unboxed作为UV 导入System.Random 导入System.Random.Shuffle 类型Permutation = UV .Vector Int g3 :: Double g3 = 1/6 { - #INLINE int# - } int :: Int - > ; Double int = fromIntegral grad3 :: UV.Vector(Double,Double,Double) grad3 = UV.fromList $ [(1,1,0),( - (1,0,1),(-1,1,0),( - 1,-1,0),(1,0,1),( - 1,0,1),(1, 0,-1),( - 1,0,-1),(0,1,1),(0,-1,1),(0,1,-1),(0, - 1,-1)] { - #INLINE dot3# - } dot3 ::(Double,Double,Double) - >双 - >双 - >双 - > Double dot3(a,b,c)xyz = a * x + b * y + c * z { - #INLINE fastFloor# - } fastFloor :: Double - > Int fastFloor x = truncate(如果x> 0则x else x - 1) - 产生噪声函数中使用的随机置换 perm :: Int - > Permutation perm seed = UV.fromList。 concat。复制2。 shuffle'[0..255] 256 $ mkStdGen种子 - 产生-0.5到0.5之间的3D噪声 noise3D ::排列 - >双 - >双 - >双 - > ($,$,$)$($,$,$)$($,$,$)$($,$) ,sz)其中sa = fastFloor(a +(x + y + z)/ 3)(x0,y0,z0)=(x - int i + t,y - int j + t,z - int其中t = int(i + j + k)* g3 (i1,j1,k1,i2,j2,k2)= if x0> = y0 then if y0> = z0则(1,0,0,1,1,0)else 如果x0> = z0则(1,0,0,1,0,1)else(0,0,1,1 ,0,1) else if y0< z0 then(0,0,1,0,1,1)else if x0 xyz1 =(x0 - int i1 + g3,y0 - int j1 + g3, z0-int k1 + g3) xyz2 =(x0-int i2 + 2 * g3,y0 -int j2 + 2 * g3,z0 -int k2 + 2 * g3) xyz3 =(x0- 1 + 3 * g3,y0-1 + 3 * g3,z0-1 + 3 * g3)(ii,jj,kk)=(i& 255,j。& 255,k (& 255) gi0 = rem(UV.unsafeIndex p(ii + UV.unsafeIndex p(jj + UV.unsafeIndex pkk)))12 gi1 = rem(UV.unsafeIndex p ii + i1 + UV.unsafeIndex p(jj + j1 + UV.unsafeIndex p(kk + k1))))12 gi2 = rem(UV.unsafeIndex p(ii + i2 + UV.unsafeIndex p(jj + (UV + UV.unsafeIndex p(kk + k2))))12 gi3 = rem(UV.unsafeIndex p(ii + 1 + UV.unsafeIndex p(jj + 1 + UV.unsafeIndex p(kk + 1) ))12 { - #INLINE n# - } n gi(x',y',z')=(\ a - >如果a a * a * a * a * dot3(UV.unsafeIndex grad3 gi)x'y'z')$ 0.6 - x'* x' - y'* y' - z'* z' harmo nic :: Int - > (Double - > Double) - >双谐波八度噪声= f八度/(2-1 / int(2 ^(octave-1)))其中f 0 = 0 fo = let r = 2 ^^(o - 1)in noise r / r + f(o-1) --3D simplex noise --syntax:simplex3D permutation number_of_octaves wavelength xyz simplex3D :: Permutation - > ; Int - >双 - >双 - >双 - >双 - > Double simplex3D p octaves lxyz =和音八度音(\\ * f / l)noise3D p(x * f / l)(y * f / l)(z * f / l)) 加上我的块大小减少到8x8x128,现在产生新的地形块的速度大约为10-20 fps,这意味着到处走动现在已经不像以前那样成问题了。当然,任何其他的性能改进仍然是值得欢迎的。 解决方案最初突出的是你的代码是高度多态。你应该把你的浮点类型统一到 Double ,所以GHC(和LLVM)有机会应用更积极的优化。 $ b $注意,对于那些试图复制的代码,这个代码会导入: 将限定的Data.Vector导入为V import Data.Bits import Data.Time.Clock import System.Random import System.Random.Shuffle type Permutation = V.Vector Int 好的。有很多东西可以用来改善这些代码。 改进 数据表示 专用于具体的浮点类型,而不是多态的浮点函数b $ b 将元组(a,a,a)替换为unboxed triple T!Double!Double!Double 从 Data.Array 切换为 Data.Array.Unboxed 置换 用 repa 中的多维unboxed数组替换三元组的盒装数组>包 编译器标记 编译 -O2 -fvia-C -optc-O3 -fexcess-precision -optc-march =本地(或等同于 -fllvm ) 增加spec constr阈值 - -fspec-constr-count = 16 更高效的库函数 使用mersenne-random来代替StdGen来生成随机数。 用 mod c> rem 用$ unchecked indexing VU替换 V。!索引。 unsafeIndex (移至 Data.Vector.Unboxed 运行时设置 增加默认分配区域: -A20M code>或 -H 另外,请检查您的算法是否与C#相同,并且您使用的是相同的数据结构。 I'm working on a terrain generation algorithm for a MineCraft-like world. Currently, I'm using simplex noise based on the implementation in the paper 'Simplex Noise Demystified' [PDF], since simplex noise is supposed to be faster and to have fewer artifacts than Perlin noise. This looks fairly decent (see image), but so far it's also pretty slow.Running the noise function 10 times (I need noise with different wavelengths for things like terrain height, temperature, tree location, etc.) with 3 octaves of noise for each block in a chunk (16x16x128 blocks), or about 1 million calls to the noise function in total, takes about 700-800 ms. This is at least an order of magnitude too slow for the purposes of generating terrain with any decent kind of speed, despite the fact that there are no obvious expensive operations in the algorithm (at least to me). Just floor, modulo, some array lookups and basic arithmetic. The algorithm (written in Haskell) is listed below. The SCC comments are for profiling. I've omitted the 2D noise functions, since they work the same way.g3 :: (Floating a, RealFrac a) => ag3 = 1/6{-# INLINE int #-}int :: (Integral a, Num b) => a -> bint = fromIntegralgrad3 :: (Floating a, RealFrac a) => V.Vector (a,a,a)grad3 = V.fromList $ [(1,1,0),(-1, 1,0),(1,-1, 0),(-1,-1, 0), (1,0,1),(-1, 0,1),(1, 0,-1),(-1, 0,-1), (0,1,1),( 0,-1,1),(0, 1,-1),( 0,-1,-1)]{-# INLINE dot3 #-}dot3 :: Num a => (a, a, a) -> a -> a -> a -> adot3 (a,b,c) x y z = a * x + b * y + c * z{-# INLINE fastFloor #-}fastFloor :: RealFrac a => a -> IntfastFloor x = truncate (if x > 0 then x else x - 1)--Generate a random permutation for use in the noise functionsperm :: Int -> Permutationperm seed = V.fromList . concat . replicate 2 . shuffle' [0..255] 256 $ mkStdGen seed--Generate 3D noise between -0.5 and 0.5simplex3D :: (Floating a, RealFrac a) => Permutation -> a -> a -> a -> asimplex3D p x y z = {-# SCC "out" #-} 16 * (n gi0 (x0,y0,z0) + n gi1 xyz1 + n gi2 xyz2 + n gi3 xyz3) where (i,j,k) = {-# SCC "ijk" #-} (s x, s y, s z) where s a = fastFloor (a + (x + y + z) / 3) (x0,y0,z0) = {-# SCC "x0-z0" #-} (x - int i + t, y - int j + t, z - int k + t) where t = int (i + j + k) * g3 (i1,j1,k1,i2,j2,k2) = {-# SCC "i1-k2" #-} if x0 >= y0 then if y0 >= z0 then (1,0,0,1,1,0) else if x0 >= z0 then (1,0,0,1,0,1) else (0,0,1,1,0,1) else if y0 < z0 then (0,0,1,0,1,1) else if x0 < z0 then (0,1,0,0,1,1) else (0,1,0,1,1,0) xyz1 = {-# SCC "xyz1" #-} (x0 - int i1 + g3, y0 - int j1 + g3, z0 - int k1 + g3) xyz2 = {-# SCC "xyz2" #-} (x0 - int i2 + 2*g3, y0 - int j2 + 2*g3, z0 - int k2 + 2*g3) xyz3 = {-# SCC "xyz3" #-} (x0 - 1 + 3*g3, y0 - 1 + 3*g3, z0 - 1 + 3*g3) (ii,jj,kk) = {-# SCC "iijjkk" #-} (i .&. 255, j .&. 255, k .&. 255) gi0 = {-# SCC "gi0" #-} mod (p V.! (ii + p V.! (jj + p V.! kk ))) 12 gi1 = {-# SCC "gi1" #-} mod (p V.! (ii + i1 + p V.! (jj + j1 + p V.! (kk + k1)))) 12 gi2 = {-# SCC "gi2" #-} mod (p V.! (ii + i2 + p V.! (jj + j2 + p V.! (kk + k2)))) 12 gi3 = {-# SCC "gi3" #-} mod (p V.! (ii + 1 + p V.! (jj + 1 + p V.! (kk + 1 )))) 12 {-# INLINE n #-} n gi (x',y',z') = {-# SCC "n" #-} (\a -> if a < 0 then 0 else a*a*a*a*dot3 (grad3 V.! gi) x' y' z') $ 0.6 - x'*x' - y'*y' - z'*z'harmonic :: (Num a, Fractional a) => Int -> (a -> a) -> aharmonic octaves noise = f octaves / (2 - 1 / int (2 ^ (octaves - 1))) where f 0 = 0 f o = let r = int $ 2 ^ (o - 1) in noise r / r + f (o - 1)--Generate harmonic 3D noise between -0.5 and 0.5harmonicNoise3D :: (RealFrac a, Floating a) => Permutation -> Int -> a -> a -> a -> a -> aharmonicNoise3D p octaves l x y z = harmonic octaves (\f -> simplex3D p (x * f / l) (y * f / l) (z * f / l))For profiling, I used the following code,q _ = let p = perm 0 in sum [harmonicNoise3D p 3 l x y z :: Float | l <- [1..10], y <- [0..127], x <- [0..15], z <- [0..15]]main = do start <- getCurrentTime print $ q () end <- getCurrentTime print $ diffUTCTime end startwhich produces the following information:COST CENTRE MODULE %time %allocsimplex3D Main 18.8 21.0n Main 18.0 19.6out Main 10.1 9.2harmonicNoise3D Main 9.8 4.5harmonic Main 6.4 5.8int Main 4.0 2.9gi3 Main 4.0 3.0xyz2 Main 3.5 5.9gi1 Main 3.4 3.4gi0 Main 3.4 2.7fastFloor Main 3.2 0.6xyz1 Main 2.9 5.9ijk Main 2.7 3.5gi2 Main 2.7 3.3xyz3 Main 2.6 4.1iijjkk Main 1.6 2.5dot3 Main 1.6 0.7To compare, I also ported the algorithm to C#. Performance there was about 3 to 4 times faster, so I imagine I must be doing something wrong. But even then it's not nearly as fast as I would like. So my question is this: can anyone tell me if there are any ways to speed up my implementation and/or the algorithm in general or does anyone know of a different noise algorithm that has better performance characteristics but a similar look?Update:After following some of the suggestions offered below, the code now looks as follows:module Noise ( Permutation, perm , noise3D, simplex3D ) whereimport Data.Bitsimport qualified Data.Vector.Unboxed as UVimport System.Randomimport System.Random.Shuffletype Permutation = UV.Vector Intg3 :: Doubleg3 = 1/6{-# INLINE int #-}int :: Int -> Doubleint = fromIntegralgrad3 :: UV.Vector (Double, Double, Double)grad3 = UV.fromList $ [(1,1,0),(-1, 1,0),(1,-1, 0),(-1,-1, 0), (1,0,1),(-1, 0,1),(1, 0,-1),(-1, 0,-1), (0,1,1),( 0,-1,1),(0, 1,-1),( 0,-1,-1)]{-# INLINE dot3 #-}dot3 :: (Double, Double, Double) -> Double -> Double -> Double -> Doubledot3 (a,b,c) x y z = a * x + b * y + c * z{-# INLINE fastFloor #-}fastFloor :: Double -> IntfastFloor x = truncate (if x > 0 then x else x - 1)--Generate a random permutation for use in the noise functionsperm :: Int -> Permutationperm seed = UV.fromList . concat . replicate 2 . shuffle' [0..255] 256 $ mkStdGen seed--Generate 3D noise between -0.5 and 0.5noise3D :: Permutation -> Double -> Double -> Double -> Doublenoise3D p x y z = 16 * (n gi0 (x0,y0,z0) + n gi1 xyz1 + n gi2 xyz2 + n gi3 xyz3) where (i,j,k) = (s x, s y, s z) where s a = fastFloor (a + (x + y + z) / 3) (x0,y0,z0) = (x - int i + t, y - int j + t, z - int k + t) where t = int (i + j + k) * g3 (i1,j1,k1,i2,j2,k2) = if x0 >= y0 then if y0 >= z0 then (1,0,0,1,1,0) else if x0 >= z0 then (1,0,0,1,0,1) else (0,0,1,1,0,1) else if y0 < z0 then (0,0,1,0,1,1) else if x0 < z0 then (0,1,0,0,1,1) else (0,1,0,1,1,0) xyz1 = (x0 - int i1 + g3, y0 - int j1 + g3, z0 - int k1 + g3) xyz2 = (x0 - int i2 + 2*g3, y0 - int j2 + 2*g3, z0 - int k2 + 2*g3) xyz3 = (x0 - 1 + 3*g3, y0 - 1 + 3*g3, z0 - 1 + 3*g3) (ii,jj,kk) = (i .&. 255, j .&. 255, k .&. 255) gi0 = rem (UV.unsafeIndex p (ii + UV.unsafeIndex p (jj + UV.unsafeIndex p kk ))) 12 gi1 = rem (UV.unsafeIndex p (ii + i1 + UV.unsafeIndex p (jj + j1 + UV.unsafeIndex p (kk + k1)))) 12 gi2 = rem (UV.unsafeIndex p (ii + i2 + UV.unsafeIndex p (jj + j2 + UV.unsafeIndex p (kk + k2)))) 12 gi3 = rem (UV.unsafeIndex p (ii + 1 + UV.unsafeIndex p (jj + 1 + UV.unsafeIndex p (kk + 1 )))) 12 {-# INLINE n #-} n gi (x',y',z') = (\a -> if a < 0 then 0 else a*a*a*a*dot3 (UV.unsafeIndex grad3 gi) x' y' z') $ 0.6 - x'*x' - y'*y' - z'*z'harmonic :: Int -> (Double -> Double) -> Doubleharmonic octaves noise = f octaves / (2 - 1 / int (2 ^ (octaves - 1))) where f 0 = 0 f o = let r = 2 ^^ (o - 1) in noise r / r + f (o - 1)--3D simplex noise--syntax: simplex3D permutation number_of_octaves wavelength x y zsimplex3D :: Permutation -> Int -> Double -> Double -> Double -> Double -> Doublesimplex3D p octaves l x y z = harmonic octaves (\f -> noise3D p (x * f / l) (y * f / l) (z * f / l))Together with reducing my chunk size to 8x8x128, generating new terrain chunks now occurs at about 10-20 fps, which means moving around is now not nearly as problematic as before. Of course, any other performance improvements are still welcome. 解决方案 The thing that stands out initially is that your code is highly polymorphic. You should specialize your floating point type uniformly to Double, so GHC (and LLVM) have a chance of applying more aggressive optimizations.Note, for those trying to reproduce, this code imports:import qualified Data.Vector as Vimport Data.Bitsimport Data.Time.Clockimport System.Randomimport System.Random.Shuffletype Permutation = V.Vector IntOk. There's lots of things you can try to improve this code.ImprovementsData representationSpecialize to a concrete floating point type, instead of polymorphic floating point functionsReplace tuple (a,a,a) with unboxed triple T !Double !Double !DoubleSwitch from Data.Array to Data.Array.Unboxed for PermutationsReplace use of boxed array of triples with multidimensional unboxed array from repa packageCompiler flagsCompile with -O2 -fvia-C -optc-O3 -fexcess-precision -optc-march=native (or equivalent with -fllvm)Increase spec constr threshold -- -fspec-constr-count=16More efficient library functionsUse mersenne-random instead of StdGen to generate randomsReplace mod with remReplace V.! indexing with unchecked indexing VU.unsafeIndex (after moving to Data.Vector.UnboxedRuntime settingsIncrease the default allocation area: -A20M or -HAlso, check your algorithm is identical to the C# one, and you're using the same data structures. 这篇关于在Haskell中优化数组的性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持! 上岸,阿里云!
07-26 04:21