Hi, I am facing some performance issues in an algorithm to "translate" some YUV data into another format. I'll make everything very simple: I have 4 blocks of Y data, one of U data and one of V data. Repeat the previous structure a few hundred times to get a whole image. Every block consists of an 8x8 grid.

Let's consider Y data only: I have to convert it in a way that I take chunks of 8 bytes and put them in different memory locations to build up a "planar" or "raster" representation of the data. The following code processes one block:

    unsigned char* destination;
    unsigned char* source;
    int xsize, ysize, x, y, i;
    [...]
    for(y=0; y<ysize; y+=16)
        for(x=0; x<xsize; x+=16)
            for(i=0; i<8; i++)
            {
                memcpy(destination+(y+i)*xsize+x, yuvcurr, sizeof(unsigned char)*8);
                yuvcurr+=8;
            }

I noticed that the code above is *much* slower (ARM9, gcc 4.0.0, uClibc) than the following trick:

    unsigned int* destination;
    unsigned int* source;
    int xsize, ysize, x, y, i;
    [...]
    for(y=0; y<ysize; y+=16)
        for(x=0; x<xsize; x+=16)
            for(i=0; i<8; i++)
            {
                dest = (unsigned int*)(final+(y+i)*xsize+x);
                *(dest++) = *(source++);
                *(dest++) = *(source++);
            }

Basically I know I have to copy 8 bytes or two words and I do that instead of calling memcpy. The first solution roughly gives me 70fps, while the second one 230.

Any comment on this? Am I missing something, is the memcpy implementation misled by something?

bye
Alessio

Solution

Alessio Sangalli wrote:
> Hi, I am facing some performance issues in an algorithm to "translate"
> some YUV data into another format. I'll make everything very simple:
> [...]

It might be simpler if you showed the actual code instead of a paraphrase teeming with undeclared variables ...

--
Eric Sosman
es*****@ieee-dot-org.invalid

Eric Sosman wrote:
> It might be simpler if you showed the actual code instead
> of a paraphrase teeming with undeclared variables ...

every IRC channel/newsgroup/forum has its own rules. I am now using a pastebin service. What is commented is the "alternate" way to do it. The "memcpy" version is at least 3 times slower.

http://pastebin.ca/1075068

bye
as

Alessio Sangalli wrote:
) Basically I know I have to copy 8 bytes or two words and I do that
) instead of calling memcpy.
) The first solution roughly gives me 70fps, while the second one 230.
)
) Any comment on this? Am I missing something, is the memcpy implementation
) misled by something?

The only way the memcpy would be able to beat copying by hand of 8 bytes is if it were inlined by the compiler and then optimized heavily, or if the compiler 'knows' about memcpy and can optimize it accordingly.
(And even then, it could only beat your code by a small margin, using 128-bit instructions.)

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
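To make the comparison in this thread concrete, here is a minimal sketch of the two ways of copying one 8x8 block. It is not the code from the pastebin link: the function names copy_block_memcpy and copy_block_words and the stride parameter are hypothetical, chosen only for illustration. The memcpy variant passes a compile-time constant size of 8, which is the case where a compiler that 'knows' memcpy may expand the call inline; whether a particular compiler and C library actually do so is an assumption that should be checked in the generated assembly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy one packed 8x8 block (64 consecutive bytes at
   src) into a raster image whose rows are `stride` bytes apart, using
   memcpy with a constant size of 8 bytes per row. */
static void copy_block_memcpy(unsigned char *dst, const unsigned char *src,
                              size_t stride)
{
    for (int i = 0; i < 8; i++) {
        memcpy(dst + (size_t)i * stride, src, 8);
        src += 8;
    }
}

/* Hypothetical helper: the same copy done by hand as two 32-bit words per
   row, as in the "trick" from the question.
   Assumptions: dst, src and stride are all multiples of 4, and the
   underlying storage may legitimately be accessed as uint32_t (for
   example malloc'd memory or arrays declared as uint32_t). */
static void copy_block_words(unsigned char *dst, const unsigned char *src,
                             size_t stride)
{
    assert((uintptr_t)dst % 4 == 0);
    assert((uintptr_t)src % 4 == 0);
    assert(stride % 4 == 0);

    const uint32_t *s = (const uint32_t *)src;
    for (int i = 0; i < 8; i++) {
        uint32_t *d = (uint32_t *)(dst + (size_t)i * stride);
        d[0] = s[0];   /* first 4 bytes of the row */
        d[1] = s[1];   /* last 4 bytes of the row */
        s += 2;
    }
}
```

Both functions write the same bytes, so either can be dropped into the inner loop of the question with dst set to destination+y*xsize+x and stride set to xsize. The word version only works when the alignment assumptions hold, which is one reason a general-purpose memcpy has to do more work per call than a hand-written two-word copy.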