在计算字母时，Haskell可以击败C吗？

本文介绍了在计算字母时，Haskell可以击败C吗？的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！问题描述剧透：是的。见下面。试图优化一个字母计数器来匹配C.我已经把它打到了2倍的赤字。 letterCount :: B.ByteString - > V.Vector Int letterCount bs = V.accumulate （\ a _ - > a + 1）（V.replicate 256 0） letters1 where len = B.length bs letters1 = V.generate len（\i - >（fromIntegral $！B.index bs i，（）））一些注意事项：在我将 Data.Vector 更改为 Data.Vector.Unboxed 之前，它非常慢。为什么会这样？我认为大部分时间都会花在 accumulate 上。我错了。 70％的时间用于 generate 。 Haskell代码遭受不必要地将Word8转换为Int;此外，（）的无用部队可能实际上或可能不会实际创建。导入限定的Data.ByteString为B 导入限定的Data.Vector.Unboxed为V import System.Environment import Text.Printf letterCount :: B.ByteString - > V.Vector Int letterCount bs = V.accumulate （\ a _ - > a + 1）（V.replicate 256 0） letters1 where len = B.length bs letters1 = V.generate len（\i - >（fromIntegral $！B.index bs i，（））） printCounts :: V.Vector Int - > IO（） printCounts cs = mapM_ （uncurry $ printf％c：％d\\\）（zip（映射到Enum [0..255]：：String）（V.toList cs）） main :: IO（） main = do filename< - fmap head getArgs f< - B .readFile文件名 let counts = letterCount f printCounts counts 竞争C代码： #include< assert.h> #include< stdio.h> #include< string.h> #include< sys / stat.h> #include< stdlib.h> int letcnt [256]; int * letter_count（unsigned char * s，unsigned int len） { int i; memset（letcnt，0，256 * sizeof（int））; for（i = 0; i< len; i ++）{ letcnt [*（s + i）] ++; } return（letcnt）; } void print_counts（）{ int i; for（i = 0; i printf（'％c'：％d \ n，（unsigned char）i，letcnt [i]）; $ b $ // st_size int main（int argc，char ** argv） { assert（argc == 2）; FILE * f = fopen（argv [1]，r）; struct stat st; stat（argv [1]，& st）; off_t len = st.st_size; unsigned char * contents = calloc（len，1）; fread（内容，len，1，f）; fclose（f）; letter_count（contents，len）; print_counts（）; 返回0; } 定时; $ time ./a.out / usr / share / dict / words> / dev / null real 0m0.012s 用户0m0.005s sys 0m0.005s $ time ./lettercount / usr / share /字典/单词> / dev / null real 0m0.017s user 0m0.009s sys 0m0.007s 更新我认为性能上限取决于此错误： runST不是免费的。并不是说我相信不可能进一步优化，但只要runST带来一些开销，就不太可能接近C. 另外，根据@ Zeta的评论修正了C代码。 / p> 解决方案是的。如果使用 -fllvm 编译，则Haskell将匹配 User 时间中的C。令人惊讶的是，如果你切换到Lazy Bytestrings，那么Haskell版本将在 Real 时间的C版本上以小幅度但显着的优势胜过。将限定的Data.ByteString.Lazy.Char8导入为B 将限定的Data.Vector.Unboxed导入为V import System.Environment import Text.Printf letterCount :: B.ByteString - > V.Vector Int letterCount bs = V.unsafeAccumulate （\ a_ - > a + 1）（V.replicate 256 0）（解析bs）解析:: B.ByteString - > V.Vector（Int，（）） parse = V.unfoldr step where step s = if B.null s then Nothing else Just（（ fromIntegral。fromEnum $ B.head s，（）），B.tail s） { - ＃INLINE parse＃ - } printCounts :: V.Vector Int - > IO（） printCounts cs = mapM_ （uncurry $ printf％c：％d\\\）（zip（映射到Enum [0..255]：：String）（V.toList cs）） main :: IO（） main = do filename< - fmap head getArgs f< - B .readFile文件名 let counts = letterCount f printCounts counts 请记住编译例如： ghc -O2 -fllvm letterCount.hs C.我喜欢它！更新公平地更新了C代码以使用单个缓冲区，避免了预先大量分配（或任何分配），并且对缓存更友好。现在，Haskell和C代码没有显着差异，对于大型的150M输入文件，运行时间大约为190毫秒： #include< assert.h> #include< stdio.h> #include< string.h> #include< sys / stat.h> #include< stdlib.h> #define CHUNK 16384 int letcnt [256]; int * letter_count（unsigned char * s，unsigned int len） { int i; for（i = 0; i< len; i ++）{ letcnt [*（s + i）] ++; } return（letcnt）; } int * letter_count_chunks（unsigned int len，FILE * f） { int i; unsigned char chunk [CHUNK]; memset（letcnt，0，sizeof（letcnt））; for（i = 0; i< len - CHUNK; i + = CHUNK）{ fread（chunk，CHUNK，1，f）; letter_count（chunk，CHUNK）; } fread（chunk，len - i，1，f）; letter_count（chunk，len - i）; 返回letcnt; } void print_counts（）{ int i; for（i = 0; i printf（'％c'：％d \ n，（unsigned char）i，letcnt [i]）; $ b $ // st_size int main（int argc，char ** argv） { assert（argc == 2）; FILE * f = fopen（argv [1]，r）; struct stat st; stat（argv [1]，& st）; off_t len = st.st_size; letter_count_chunks（len，f）; fclose（f）; print_counts（）; 返回0; } Spoiler: Yes. See below.Trying to optimize a letter counter to match C. I've fought it to a 2x deficit.letterCount :: B.ByteString -> V.Vector IntletterCount bs = V.accumulate (\a _ -> a + 1) (V.replicate 256 0) letters1 where len = B.length bs letters1 = V.generate len (\i -> (fromIntegral $! B.index bs i, ()))Some notes:It was really slow until I changed Data.Vector to Data.Vector.Unboxed. Why is that?I thought most of the time would be spent in accumulate. I was wrong.70% of the time is spent in generate.Haskell code suffers from having to pointlessly convert Word8 to Int; Also, a useless army of () may or may not actually be created.Full listing:import qualified Data.ByteString as Bimport qualified Data.Vector.Unboxed as Vimport System.Environmentimport Text.PrintfletterCount :: B.ByteString -> V.Vector IntletterCount bs = V.accumulate (\a _ -> a + 1) (V.replicate 256 0) letters1 where len = B.length bs letters1 = V.generate len (\i -> (fromIntegral $! B.index bs i, ()))printCounts :: V.Vector Int -> IO ()printCounts cs = mapM_ (uncurry $ printf "%c: %d\n") (zip (map toEnum [0..255] :: String) (V.toList cs))main :: IO ()main = do filename <- fmap head getArgs f <- B.readFile filename let counts = letterCount f printCounts countsCompeting C code: #include <assert.h> #include <stdio.h> #include <string.h> #include <sys/stat.h> #include <stdlib.h> int letcnt [256]; int* letter_count(unsigned char *s, unsigned int len) { int i; memset(letcnt, 0, 256 * sizeof(int)); for(i = 0; i < len; i++){ letcnt[*(s + i)]++; } return (letcnt); } void print_counts() { int i; for(i = 0; i < 256; i++) { printf("'%c': %d\n", (unsigned char) i, letcnt[i]); } } // st_size int main(int argc, char **argv) { assert(argc == 2); FILE* f = fopen(argv[1], "r"); struct stat st; stat(argv[1], &st); off_t len = st.st_size; unsigned char* contents = calloc(len, 1); fread(contents, len, 1, f); fclose(f); letter_count(contents, len); print_counts(); return 0; }Timings;$ time ./a.out /usr/share/dict/words > /dev/nullreal 0m0.012suser 0m0.005ssys 0m0.005s$ time ./lettercount /usr/share/dict/words > /dev/nullreal 0m0.017suser 0m0.009ssys 0m0.007sUpdateI think the performance ceiling is down to this bug: runST isn't free. Not that I believe it's impossible to optimize further but unlikely to approach C so long as runST imposes some overhead.Also, fixed C-code based on @Zeta's comment. 解决方案 Yes. If you compile with -fllvm then Haskell will match C in User time. The big surprise is if you switch to Lazy Bytestrings, the Haskell version will beat the C version on Real time by a small but significant margin.import qualified Data.ByteString.Lazy.Char8 as Bimport qualified Data.Vector.Unboxed as Vimport System.Environmentimport Text.PrintfletterCount :: B.ByteString -> V.Vector IntletterCount bs = V.unsafeAccumulate (\a _ -> a + 1) (V.replicate 256 0) (parse bs)parse :: B.ByteString -> V.Vector (Int, ())parse = V.unfoldr step where step s = if B.null s then Nothing else Just ((fromIntegral . fromEnum $ B.head s, ()), B.tail s){-# INLINE parse #-}printCounts :: V.Vector Int -> IO ()printCounts cs = mapM_ (uncurry $ printf "%c: %d\n") (zip (map toEnum [0..255] :: String) (V.toList cs))main :: IO ()main = do filename <- fmap head getArgs f <- B.readFile filename let counts = letterCount f printCounts countsRemember to compile like:ghc -O2 -fllvm letterCount.hsSo, Vector + ByteString.Lazy + LLVM > C. I love it!UpdateIn fairness to C I updated the C code to use a single buffer which avoids doing a big allocation up front (or any allocations at all) and will be more cache friendly. Now the Haskell and C codes show no significant difference, both about 190ms in run-time on a best-of-3 basis against a large, 150M input file:#include <assert.h>#include <stdio.h>#include <string.h>#include <sys/stat.h>#include <stdlib.h>#define CHUNK 16384int letcnt [256];int* letter_count(unsigned char *s, unsigned int len){ int i; for(i = 0; i < len; i++){ letcnt[*(s + i)]++; } return (letcnt);}int* letter_count_chunks(unsigned int len, FILE* f){ int i; unsigned char chunk [CHUNK]; memset(letcnt, 0, sizeof(letcnt)); for(i = 0; i < len - CHUNK; i+= CHUNK) { fread(chunk, CHUNK, 1, f); letter_count(chunk, CHUNK); } fread(chunk, len - i, 1, f); letter_count(chunk, len - i); return letcnt;}void print_counts() { int i; for(i = 0; i < 256; i++) { printf("'%c': %d\n", (unsigned char) i, letcnt[i]); }}// st_sizeint main(int argc, char **argv){ assert(argc == 2); FILE* f = fopen(argv[1], "r"); struct stat st; stat(argv[1], &st); off_t len = st.st_size; letter_count_chunks(len, f); fclose(f); print_counts(); return 0;} 这篇关于在计算字母时，Haskell可以击败C吗？的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！