I'm studying inline assembly. I want to write a simple routine for the iPhone using Xcode 4 with the LLVM 3.0 compiler. I have managed to write basic inline assembly code.
For example:
int sub(int a, int b)
{
int c;
asm ("sub %0, %1, %2" : "=r" (c) : "r" (a), "r" (b));
return c;
}
I found it on stackoverflow.com and it works very well. However, I don't know how to write a loop in inline assembly.
I need assembly code equivalent to:
void brighten(unsigned char* src, unsigned char* dst, int numPixels, int intensity)
{
for(int i=0; i<numPixels; i++)
{
dst[i] = src[i] + intensity;
}
}
Take a look at the loop section here: http://en.wikipedia.org/wiki/ARM_architecture
Basically, you'll want something like:
void brighten(unsigned char* src, unsigned char* dst, int numPixels, int intensity) {
    asm volatile (
        "\t mov r3, #0\n"              /* r3 = i = 0 */
        "Lloop:\n"
        "\t cmp r3, %2\n"              /* i >= numPixels ? */
        "\t bge Lend\n"
        "\t ldrb r4, [%0, r3]\n"       /* r4 = src[i] */
        "\t add r4, r4, %3\n"          /* r4 += intensity */
        "\t strb r4, [%1, r3]\n"       /* dst[i] = low byte of r4 */
        "\t add r3, r3, #1\n"          /* i++ */
        "\t b Lloop\n"
        "Lend:\n"
        : "=r"(src), "=r"(dst), "=r"(numPixels), "=r"(intensity)
        : "0"(src), "1"(dst), "2"(numPixels), "3"(intensity)
        /* "memory" tells the compiler that the asm writes through dst */
        : "cc", "memory", "r3", "r4");
}
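To see how the labels and branches map back to the original for loop, here is the same control flow written in portable C with goto. This is only an illustrative sketch: the name brighten_goto is made up for this example, i stands in for r3 and tmp for r4, and the comments show which assembly instruction each line corresponds to.

```c
/* Hypothetical C mirror of the assembly loop; not part of the original answer. */
void brighten_goto(const unsigned char *src, unsigned char *dst,
                   int numPixels, int intensity)
{
    int i = 0;                      /* mov  r3, #0            */
    int tmp;
loop:
    if (i >= numPixels) goto end;   /* cmp  r3, %2 ; bge Lend */
    tmp = src[i];                   /* ldrb r4, [%0, r3]      */
    tmp = tmp + intensity;          /* add  r4, r4, %3        */
    dst[i] = (unsigned char)tmp;    /* strb r4, [%1, r3], keeps only the low 8 bits */
    i = i + 1;                      /* add  r3, r3, #1        */
    goto loop;                      /* b    Lloop             */
end:
    return;
}
```

Like the C loop in the question, this wraps around on overflow (250 + 10 gives 4) rather than clamping at 255.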
Update:
And here's that NEON version:
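void brighten_neon(unsigned char* src, unsigned char* dst, int numPixels, int intensity) {
    asm volatile (
        "\t mov r4, #0\n"              /* r4 = pixel counter */
        "\t vdup.8 d1, %3\n"           /* d1 = 8 copies of the low byte of intensity */
        "Lloop2:\n"
        "\t cmp r4, %2\n"
        "\t bge Lend2\n"
        "\t vld1.8 d0, [%0]!\n"        /* load 8 pixels, advance src */
        /* unsigned saturating add, since the pixels are unsigned char;
           this assumes 0 <= intensity <= 255 (the original used vqadd.s8) */
        "\t vqadd.u8 d0, d0, d1\n"
        "\t vst1.8 d0, [%1]!\n"        /* store 8 pixels, advance dst */
        "\t add r4, r4, #8\n"
        "\t b Lloop2\n"
        "Lend2:\n"
        : "=r"(src), "=r"(dst), "=r"(numPixels), "=r"(intensity)
        : "0"(src), "1"(dst), "2"(numPixels), "3"(intensity)
        /* "memory" tells the compiler that the asm writes through dst */
        : "cc", "memory", "r4", "d1", "d0");
}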
So this NEON version processes 8 pixels at a time. However, it does not check that numPixels is divisible by 8, so you'd definitely want to handle that, otherwise the loop will read and write past the end of the buffers! Anyway, it's just a start at showing you what can be done. Notice that the loop has the same number of instructions, but acts on eight pixels of data at once. Oh, and it uses a saturating add as well, which I assume you would want.
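One way to deal with a numPixels that is not a multiple of 8 is to run the 8-wide loop over the largest multiple of 8 and finish the leftover pixels one at a time. The following is a rough sketch in plain C rather than assembly, so the splitting logic is easy to check. The names add_sat_u8 and brighten_blocks are invented for this example, and it assumes that unsigned saturation (clamping to 0..255) is the behavior wanted for unsigned char pixels.

```c
/* Hypothetical helper: add intensity to one pixel and clamp to 0..255. */
static unsigned char add_sat_u8(unsigned char p, int intensity)
{
    int v = p + intensity;
    if (v > 255) v = 255;
    if (v < 0)   v = 0;
    return (unsigned char)v;
}

/* Hypothetical reference: whole blocks of 8 first, then a scalar tail. */
void brighten_blocks(const unsigned char *src, unsigned char *dst,
                     int numPixels, int intensity)
{
    /* Largest multiple of 8 that does not exceed numPixels. */
    int blocked = numPixels - (numPixels % 8);
    int i = 0;

    /* Main part: the range an 8-pixels-per-iteration NEON loop could cover. */
    for (; i < blocked; i += 8) {
        for (int k = 0; k < 8; k++) {
            dst[i + k] = add_sat_u8(src[i + k], intensity);
        }
    }

    /* Tail: the remaining 0..7 pixels, handled one at a time. */
    for (; i < numPixels; i++) {
        dst[i] = add_sat_u8(src[i], intensity);
    }
}
```

In a real version the inner 8-pixel loop would be replaced by the NEON code, called with the blocked count instead of numPixels, while the tail loop stays in C.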