


I am trying to follow another SO post and implement sqrt14 within my iOS app:

double inline __declspec (naked) __fastcall sqrt14(double n)
{
    _asm fld qword ptr [esp+4]
    _asm fsqrt
    _asm ret 8
}


I have modified this to the following in my code:

double inline __declspec (naked) sqrt14(double n)
{
    __asm__("fld qword ptr [esp+4]");
    __asm__("ret 8");
}


Above, I have removed the "__fastcall" keyword from the method definition since my understanding is that it is for x86 only. The above gives the following errors for each assembly line respectively:


Unexpected token in argument list




I have read through a few inline ASM guides and other posts on how to do this, but I am unfamiliar with x86 assembly. I know MIPS quite well, but these commands and registers seem very different. For example, I don't understand why the original author never uses the passed-in "n" value anywhere in the assembly code.

Any help getting this to work would be greatly appreciated! I am trying to do this because I am building an app where I need to calculate sqrt (ok, yes, I could do a lookup table, but for right now I care a lot about precision) on every pixel of a live-video feed. I am currently using the standard sqrt, and in addition to the rest of the computation, I'm running at around 8fps. Hoping to bump that up a frame or two with this change.

If it matters: I'm building the app to ideally be compatible with any current iOS device that can run iOS 7.1. Again, many thanks for any help.


The compiler is perfectly capable of generating an fsqrt instruction on its own; you don't need inline asm for that. You might get some extra speed if you use -ffast-math.


For completeness' sake, here is the inline asm version:

__asm__ __volatile__ ("fsqrt" : "=t" (n) : "0" (n));

The fsqrt instruction has no explicit operands; it uses the top of the FPU stack implicitly. That is also why the original code never refers to n by name: it loads the argument from its stack slot at [esp+4]. The =t constraint tells the compiler to expect the output on the top of the FPU stack, and the 0 constraint instructs the compiler to place the input in the same place as output #0 (i.e. the top of the FPU stack again).

Note that fsqrt is of course x86-only, meaning it won't work on ARM CPUs, for example, and ARM is what iOS devices use.
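If you still want to experiment with the inline asm one-liner above, a minimal sketch is to guard it so that it is only compiled for x86 and everything else falls back to the library call. The wrapper name sqrt_x87_or_libm is hypothetical, and I am assuming a GCC-compatible compiler (GCC or Clang) for the extended asm syntax:

```c
#include <math.h>

/* Hypothetical wrapper: x87 fsqrt on x86 with GCC/Clang, libm sqrt elsewhere. */
static inline double sqrt_x87_or_libm(double n)
{
#if (defined(__i386__) || defined(__x86_64__)) && defined(__GNUC__)
    /* "=t": the result is left on top of the FPU stack, st(0).
     * "0":  the input is placed in the same location as output #0. */
    __asm__ ("fsqrt" : "=t" (n) : "0" (n));
    return n;
#else
    /* ARM and any other target: let the compiler choose the instruction. */
    return sqrt(n);
#endif
}
```

On an actual iOS device only the fallback branch is compiled, so this buys you nothing over calling sqrt directly; it mainly shows where the x86-specific part would have to be fenced off.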


