c++ - C/C++ inline asm: improper operand type

I have the following code, which is supposed to XOR a block of memory:

void XorBlock(DWORD dwStartAddress, DWORD dwSize, DWORD dwsKey)
{
DWORD dwKey;
__asm
{
    push eax
    push ecx
    mov ecx, dwStartAddress          // Move Start Address to ECX
    add ecx, dwSize                  // Add the size of the function to ECX
    mov eax, dwStartAddress          // Copy the Start Address to EAX

    crypt_loop:                         // Start of the loop
        xor byte ptr ds:[eax], dwKey     // XOR The current byte with 0x4D
        inc eax                         // Increment EAX with dwStartAddress++
        cmp eax,ecx                     // Check if every byte is XORed
    jl crypt_loop;                      // Else jump back to the start label

    pop ecx // pop ECX from stack
    pop eax // pop EAX from stack
}
}

However, the dwKey operand gives me an error. If I replace dwKey with a constant such as 0x5D, the code works fine.

Best answer

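I think you have two problems.
First, "xor" can't take two memory operands (ds:[eax] is a memory location and dwKey is a memory location). Second, you've used "byte ptr" to say you want a byte, but you're trying to use a DWORD, and the assembler can't convert between the two automatically.
So you'll probably have to load the value into an 8-bit register and then do the XOR. For example: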

void XorBlock(DWORD dwStartAddress, DWORD dwSize, DWORD dwsKey)
{
    DWORD dwKey;
    __asm
    {
        push eax
        push ecx
        mov ecx, dwStartAddress          // Move Start Address to ECX
        add ecx, dwSize                  // Add the size of the function to ECX
        mov eax, dwStartAddress          // Copy the Start Address to EAX
        mov ebx, dwKey                   // <---- LOAD dwKey into EBX

        crypt_loop :                         // Start of the loop
            xor byte ptr ds : [eax], bl     // XOR The current byte with the low byte of EBX
            inc eax                         // Increment EAX with dwStartAddress++
            cmp eax, ecx                     // Check if every byte is XORed
            jb crypt_loop                   // Else jump back to the start label (unsigned compare, since addresses are unsigned)

        pop ecx // pop ECX from stack
        pop eax // pop EAX from stack
    }
}
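To make the byte-versus-DWORD point concrete: after "mov ebx, dwKey", the 8-bit register bl holds only the lowest 8 bits of the key, and the upper 24 bits are ignored by the XOR. A minimal C++ sketch of the same truncation follows; LowByte is a hypothetical helper name used only for illustration, not part of the original code.

```cpp
#include <cstdint>

// Hypothetical helper: returns the low 8 bits of a 32-bit value,
// which is what bl contains after ebx has been loaded with that value.
std::uint8_t LowByte(std::uint32_t value)
{
    return static_cast<std::uint8_t>(value & 0xFFu);
}
```

So a key of 0x1234565D behaves exactly like a key of 0x5D when it is applied one byte at a time.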

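Note, though, that dwKey in your code also appears to be uninitialized; maybe you should just use "mov bl, 0x42". I'm also not sure you need to push and pop the registers; I can't remember which registers you're allowed to clobber with the MSVC++ inline assembler.
In the end, though, I think Alan Stokes is right in his comment: the assembly is very unlikely to be faster than C/C++ code in this case. The compiler can easily generate this code on its own, and you may find that it performs optimizations you didn't expect (loop unrolling, for example) that make the result run even faster than the "obvious" assembly.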
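For comparison, here is a minimal sketch of the same operation in plain C++ with no inline assembly. The name XorBlockPortable and its pointer/size/byte-key signature are assumptions made for illustration; the original function takes DWORD arguments instead.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical portable counterpart of XorBlock: XORs each of the
// `size` bytes starting at `start` with the single-byte `key`.
void XorBlockPortable(std::uint8_t* start, std::size_t size, std::uint8_t key)
{
    for (std::size_t i = 0; i < size; ++i) {
        start[i] ^= key;  // same effect as "xor byte ptr [eax], bl"
    }
}
```

Unlike the assembly loop, this version does nothing when size is 0, and applying it twice with the same key restores the original bytes. An optimizing compiler is free to unroll or vectorize the loop.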