Are characters stored in reverse order when using the MOV mnemonic to load/copy a string into a memory register in MASM?

Problem description

I want to know whether using the MOV instruction to copy a string into a register causes the string to be stored in reverse order. I learned that when MASM stores a string into a variable defined as a word or larger (dw and larger sizes), the string is stored in reverse order. Does the same thing happen when I copy a string to a register?

Based on these questions (about the SCAS instruction and about assigning strings and characters to variables in MASM 32), I assumed the following:

  1. When MASM loads a string into a variable, it loads it in reverse order, i.e. the last character in the string is stored in the lowest memory address (beginning) of the string variable. This means assigning a variable str like so: str dd "abc" causes MASM to store the string as "cba", meaning "c" is in the lowest memory address.
  2. When defining a variable as str db "abc" MASM treats str as an array of characters. Trying to match the array index with the memory address of str, MASM will store "a" at the lowest memory address of str.
  3. By default, the SCAS and MOVS instructions execute from the beginning (lowest) address of the destination string, i.e. the string whose address is in the EDI register. They do not "pop" or apply the "last in, first out" rule to the memory addresses they operate on before executing.
  4. MASM always treats character arrays and strings moved into registers the same way. Moving the character array 'a', 'b', 'c' into EAX is the same as moving "abc" into EAX.

When I transfer a byte array arLetters with the characters 'a', 'b', and 'c' to the double-word variable strLetters using MOVSD, I believe the letters are copied to strLetters in reverse, i.e. stored as "cba". When I use mov eax, "abc", are the letters also stored in reverse order?

The code below will set the zero flag before it exits.

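include \masm32\include\masm32rt.inc  ; assumption: MASM32 SDK, which provides print and ExitProcess

.data?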
strLetters dd ?,0

.data
arLetters db "abcd"

.code

start:
mov ecx, 4
lea esi, arLetters
lea edi, strLetters
movsd
;This stores the string "dcba" into strLetters.

mov ecx, 4
lea edi, strLetters
mov eax, "dcba" 
repnz scasd
jz close
jmp printer
;strLetters is not popped as "abcd" and is compared as "dcba".

printer:
print "No match.",13,10,0
jmp close

close:
push 0
call ExitProcess

end start

I expect the string "dcba" to be stored in EAX "as is", with 'd' in the lowest memory address of EAX, since MASM treats moving strings to registers differently from assigning strings to variables. MASM copied 'a', 'b', 'c', 'd' into strLetters as "dcba" to ensure that if strLetters was popped, the string is emitted/released in the correct order ("abcd"). If the REP MOVSB instruction were used in place of MOVSD, strLetters would have contained "abcd" and would be popped/emitted as "dcba". However, because MOVSD was used and SCAS or MOVS instructions do not pop strings before executing, the code above should set the zero flag, right?

Recommended answer

Don't use strings in contexts where MASM expects a 16-bit or larger integer. MASM will convert them to integers in a way that reverses the order of characters when stored in memory. Since this is confusing, it's best to avoid it and only use strings with the DB directive, which works as expected. Don't use strings with more than one character as immediate values.

Registers don't have addresses, and it's meaningless to talk about the order of bytes within a register. On a 32-bit x86 CPU, the general purpose registers like EAX hold 32-bit integer values. You can divide a 32-bit value conceptually into 4 bytes, but while it lives in a register there is no meaningful order to the bytes.

It's only when a 32-bit value exists in memory that the 4 bytes that make it up have addresses and therefore an order. Since x86 CPUs use little-endian byte order, the least significant of the 4 bytes is the first byte (at the lowest address) and the most significant byte is the last one. Whenever the x86 loads or stores a 16-bit or wider value to or from memory, it uses little-endian byte order. (An exception is the MOVBE instruction, which specifically uses big-endian byte order when loading and storing values.)

    .386
    .MODEL flat

    .DATA
db_str  DB  "abcd"
dd_str  DD  "abcd"
num DD  1684234849

    .CODE
_start: 
    mov eax, "abcd"
    mov ebx, DWORD PTR [db_str]
    mov ecx, DWORD PTR [dd_str]
    mov edx, 1684234849
    mov esi, [num]
    int 3

    END _start

After assembling and linking, it gets converted into a sequence of bytes something like this:

.text section:
  00401000: B8 64 63 62 61 8B 1D 00 30 40 00 8B 0D 04 30 40  .dcba...0@....0@
  00401010: 00 BA 61 62 63 64 8B 35 08 30 40 00 CC           ..abcd.5.0@..
  ...
.data section:
  00403000: 61 62 63 64 64 63 62 61 61 62 63 64              abcddcbaabcd

(On Windows the .data section normally gets placed after the .text section in memory.)

So we can see that the DB and DD directives, the ones labelled db_str and dd_str, generate two different sequences of bytes for the same string "abcd". In the first case, MASM generates the sequence of bytes we would expect: 61h, 62h, 63h, and 64h, the ASCII values for a, b, c, and d respectively. For dd_str, though, the sequence of bytes is reversed. This is because the DD directive uses 32-bit integers as operands, so the string has to be converted to a 32-bit value, and MASM ends up reversing the order of characters in the string when the result of the conversion gets stored in memory.

You'll also notice that the DD directive labelled num generated the same sequence of bytes as the DB directive. Indeed, without looking at the source there's no way to tell that the first four bytes are supposed to be a string while the last four bytes are supposed to be a number. They only become strings or numbers if the program uses them that way.

(Less obvious is how the decimal value 1684234849 was converted into the same sequence of bytes as generated by the DB directive. It's already a 32-bit value; it just needs to be converted into a sequence of bytes by MASM. Unsurprisingly, the assembler does so using the same little-endian byte order that the CPU uses. That means the first byte is the least significant part of 1684234849, which happens to have the same value as the ASCII letter a (1684234849 % 256 = 97 = 61h). The last byte is the most significant part of the number, which happens to be the ASCII value of d (1684234849 / 256 / 256 / 256 = 100 = 64h).)

Looking at the values in the .text section more closely with a disassembler, we can see how the sequence of bytes stored there will be interpreted as instructions when executed by the CPU:

  00401000: B8 64 63 62 61     mov         eax,61626364h
  00401005: 8B 1D 00 30 40 00  mov         ebx,dword ptr ds:[00403000h]
  0040100B: 8B 0D 04 30 40 00  mov         ecx,dword ptr ds:[00403004h]
  00401011: BA 61 62 63 64     mov         edx,64636261h
  00401016: 8B 35 08 30 40 00  mov         esi,dword ptr ds:[00403008h]
  0040101C: CC                 int         3

What we can see here is that MASM stored the bytes that make up the immediate value in the instruction mov eax, "abcd" in the same order it did with the dd_str DD directive. The first byte of the immediate part of the instruction in memory is 64h, the ASCII value of d. The reason is that, with a 32-bit destination register, this MOV instruction uses a 32-bit immediate. That means MASM needs to convert the string to a 32-bit integer, and it ends up reversing the order of bytes as it did with dd_str. MASM also handles the decimal number given as the immediate of mov edx, 1684234849 the same way it did with the DD directive that used the same number: the 32-bit value was converted to the same little-endian representation.

You'll also notice that the disassembler generated assembly instructions that use hexadecimal values for the immediates of these two instructions. Like the CPU, the disassembler has no way of knowing that the immediate values are supposed to be a string and a decimal number. They're just a sequence of bytes in the program; all it knows is that they're 32-bit immediate values (from the opcodes B8h and BAh), so it displays them as 32-bit hexadecimal values for lack of any better alternative.

By executing the program under a debugger and inspecting the registers after it reaches the breakpoint instruction (int 3) we can see what actually ended up in the registers:

eax=61626364 ebx=64636261 ecx=61626364 edx=64636261 esi=64636261 edi=00000000
eip=0040101c esp=0018ff8c ebp=0018ff94 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
image00000000_00400000+0x101c:
0040101c cc              int     3

Now we can see that the first and third instructions loaded a different value than the other instructions. Both of these instructions involve cases where MASM converted the string to a 32-bit value and ended up reversing the order of the characters in memory. The register dump confirms that the reversed order of bytes in memory results in different values being loaded into the registers.

Now you might be looking at the register dump above and thinking that only EAX and ECX are in the correct order, with the ASCII value for a (61h) first and the ASCII value for d (64h) last, so that MASM reversing the order of the strings in memory actually caused them to be loaded into registers in the correct order. But as I said before, there's no byte order in registers. The number 61626364 is just how the debugger represents the value when displaying it as a sequence of characters you can read. The characters 61 come first in the debugger's representation because our numbering system puts the most significant part of the number on the left, and we read left to right, so that makes it the first part. However, as I also said before, x86 CPUs are little-endian, which means the least significant part comes first in memory. So the first byte in memory becomes the least significant part of the value in the register, which the debugger displays as the rightmost two hexadecimal digits of the number, because that's where the least significant part of a number goes in our numbering system.

In other words, because x86 CPUs are little-endian (least significant first) while our numbering system is big-endian (most significant first), hexadecimal numbers get displayed in byte-wise reverse order relative to how they're actually stored in memory.

It should also be clear by now that loading a string into a register only happens conceptually. The assembler converts the string into a sequence of bytes, which, when loaded into a 32-bit register, is treated as a little-endian 32-bit integer in memory. When the 32-bit value in the register is stored to memory, it is converted into a sequence of bytes that represents the value in little-endian format. To the CPU, your string is just a 32-bit integer that it loads from and stores to memory.

So if the value loaded into EAX in the sample program is stored to memory with something like mov [mem], eax, then the 4 bytes stored at mem will be in the same order as the bytes that made up the immediate of mov eax, "abcd": the same reversed order, 64h, 63h, 62h, 61h, that MASM used for the bytes of the immediate.

As for why MASM reverses the order of strings when converting them to 32-bit integers, I don't know, but the moral here is not to use strings as immediates or in any other context where they need to be converted to integers. Assemblers are inconsistent in how they convert string literals into integers. (A similar problem occurs in how C compilers convert multi-character literals like 'abcd' into integers.)

Nothing special happens with the SCASD or MOVSD instructions. SCASD treats the four bytes pointed to by EDI as a 32-bit little-endian value, loads it into an unnamed temporary register, compares the temporary register to EAX, and then adds 4 to or subtracts 4 from EDI depending on the DF flag. MOVSD loads the 32-bit value in memory pointed to by ESI into an unnamed temporary register, stores the temporary register into the 32-bit memory location pointed to by EDI, and then updates ESI and EDI according to the DF flag. (Byte order doesn't matter for MOVSD because the bytes are never used as a 32-bit value, but in any case the order isn't changed.)

I wouldn't try to think of SCASD or MOVSD as FIFO or LIFO, because ultimately that depends on how you use them. MOVSD can just as easily be used as part of an implementation of a FIFO queue as of a LIFO stack. (Compare this to PUSH and POP, which in theory could each independently be used as part of an implementation of either a FIFO or LIFO data structure, but together can only be used to implement a LIFO stack.)

09-17 16:13