How are the SCAS and MOVS instructions affected by the value of the direction flag in EFLAGS?

Problem description

I want to know how setting or clearing the direction flag in EFLAGS changes how the SCAS and MOVS instructions decrement or increment registers. I read some webpages and made the assumptions listed below.

I am using the MASM32 SDK (no idea what version; I installed it via Visual MASM's download and installation wizard), with Visual MASM to write the code and the MASM32 Editor to link and build it into objects and executables. I use a Windows 7 Pro 64-bit OS.

SCAS

  1. The SCAS instruction "compares a byte in AL or a word in AX with a byte or word pointed to by DI in ES." Therefore, to use SCAS, the target string address must be moved to EDI and the string to find must be moved to the accumulator register (EAX and variants).

  2. Setting direction flag then using SCAS will stop SCAS from running when using 32 bit systems. On 32 bit systems, it is impossible to force SCAS to "scan a string from the end to the start."

  3. Any REP instruction always uses the ECX register as a counter and always decrements ECX regardless of the direction flag's value. This means it is impossible to "scan a string from the end to the beginning" using REP SCAS.

Sources:
SCAS/SCASB/SCASW, Birla Institute of Technology and Science
Scan String, from c9xm.me
SCAS/SCASB/SCASW/SCASD — Scan String, from felixcloutier.com
MASM : Using 'String' Instructions, from www.dreamincode.net/forums

Below is part of the code from a program I will refer to in my questions:

    ;Generic settings from MASM32 editor
    .386
    .model flat, stdcall
    option casemap: none

    .data?
    Input db 254 dup(?)
    InputCopy db 254 dup(?)
    InputLength dd ?, 0
    InputEnd dd ?, 0

    .data

    .code

    start:
    push 254
    push offset Input
    call StdIn
    mov InputLength, eax

    ;---Move Last Word---
    lea esi, offset Input
    sub esi, 4
    lea edi, offset InputEnd
    movsw

    ;---Search section---
    lea esi, Input
    lea edi, InputCopy
    movsb

    mov ecx, InputLength
    mov eax, 0
    mov eax, "omit"

    lea edi, offset InputEnd
    repne scasw
    jz close ;jump if a match was found and ZF was set to 1.

  1. The code under the "Search" section searches the string InputEnd 4 bytes at a time and thus 4 characters at a time. The block scans for the characters in EAX, i.e. the word "omit", ALWAYS beginning at the value of the memory address in edi, then incrementing based on the suffix of SCAS (B, W, D, Q) (MASM : Using 'String' Instructions, dream-in-code.com).

MOVS

  1. Using the "Move Last Word" section, I am able to get the last byte out of the string Input. I then used MOVSW to move just the last 4 bytes of the string Input to InputEnd, assuming the direction flag is clear. I must define Input as an array of bytes - Input db 32 dup(?) - for the block to work.

  2. Regardless of how I define InputEnd (whether "dd ?, 0" or "db 12 dup(?)"), the operation of the MOVS and SCAS instructions (flags set, registers modified, etc.) will not change. The increment/decrement amount of SCAS and MOVS depends on the suffix/last letter of the command, not on the defined bytes or size of the pointers stored in EDI and ESI.

  3. It is impossible to make MOVS transfer from the beginning to the end of a string. You must find the length of the string; load the corresponding addresses into EDI and ESI; add the length of the string to the addresses stored in EDI and ESI; and last, set the direction flag using std. A danger here is targeting addresses below the source or destination bytes.

  4. It is impossible to reverse a string's letters using MOVS since EDI and ESI are either both decremented or both incremented by MOVS.

Sources (aside from the sites previously listed in the SCAS section):
https://c9x.me/x86/html/file_module_x86_id_203.html
http://faydoc.tripod.com/cpu/movsd.htm

Are these assumptions correct? Is the x86 text in the sites' URLs a sign that the websites have wrong information?

Solution

First of all, repe/repne scas and cmps aren't fast. Also, the "fast strings" / ERMSB microcode for rep movs and rep stos is only fast with DF=0 (normal / forward / increasing address).

rep movs with DF=1 is slow. repne scasw is always slow. They can be useful in the rare case where you're optimizing for code-size, though.


The documentation you linked sets out exactly how movs and scas are affected by DF. Read the Operation section in Intel's manuals.

Note that it's always a post-increment/decrement so the first element compared doesn't depend on DF, only the updates to EDI and/or ESI.
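To make the post-update order concrete, here is a minimal C sketch that models a single scasb step. It is neither MASM nor Intel's pseudocode; the ScasState struct and the scasb_step name are made up for illustration, and ZF is the only flag modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one SCASB step; names are illustrative only. */
typedef struct {
    const uint8_t *edi; /* pointer held in EDI */
    int df;             /* direction flag: 0 = forward, 1 = backward */
    int zf;             /* zero flag produced by the compare */
} ScasState;

static void scasb_step(ScasState *s, uint8_t al)
{
    assert(s->df == 0 || s->df == 1);
    s->zf = (*s->edi == al);   /* compare the byte at the current EDI first */
    s->edi += s->df ? -1 : 1;  /* only afterwards move EDI by one byte */
}
```

Starting from the same address with DF=0 and with DF=1 compares the same byte in both cases; the two runs differ only in where EDI ends up.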

Your code only depends on DF for the repne scasw. It doesn't matter whether movsb increments (DF=0) or decrements (DF=1) EDI because you overwrite EDI before the next use.


repne scasw is 16-bit "word" size using AX, like it says in the HTML extracts of Intel's manual that you linked (https://www.felixcloutier.com/x86/scas:scasb:scasw:scasd). That's both the increment and the compare width.
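As a rough illustration of what that width means, the following C sketch models repne scasw with DF=0. The function name is hypothetical, and it returns a byte offset rather than leaving EDI just past the match as the real instruction does. Note that ECX counts 16-bit elements here, not bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of REPNE SCASW with DF=0 over a byte buffer.
   ecx is the number of 16-bit elements to examine.
   Returns the byte offset of the first matching word, or -1. */
static long repne_scasw_forward(const uint8_t *buf, size_t ecx, uint16_t ax)
{
    size_t offset = 0;                            /* EDI - buf, in bytes */
    assert(buf != NULL);
    while (ecx != 0) {
        uint16_t word;
        memcpy(&word, buf + offset, sizeof word); /* 16-bit load at EDI */
        offset += 2;                              /* EDI advances by 2 */
        ecx--;                                    /* one element consumed */
        if (word == ax)                           /* only AX is compared */
            return (long)(offset - 2);
    }
    return -1;
}
```

A 2-byte pattern that starts at an odd byte offset is never seen by this scan, because only offsets 0, 2, 4, ... are compared.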

If you want overlapping dword compares of EAX, you can't use scasw.

You could use scasd in a loop, but then you'd have to decrement edi to create overlap. So really you should just use a normal cmp [edi], eax and add edi, 2 if you only want to check even positions.
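The cmp / add loop could look roughly like this when written as C. This is a sketch of the idea rather than the asker's code; find_dword_even is an invented name, and memcpy stands in for the unaligned 32-bit load that cmp [edi], eax performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: compare overlapping dwords at even byte offsets.
   Returns the byte offset of the first match, or -1 if there is none. */
static long find_dword_even(const uint8_t *buf, size_t len, uint32_t eax)
{
    assert(buf != NULL);
    for (size_t off = 0; off + 4 <= len; off += 2) { /* add edi, 2 */
        uint32_t dword;
        memcpy(&dword, buf + off, sizeof dword);     /* load [edi] */
        if (dword == eax)                            /* cmp [edi], eax */
            return (long)off;
    }
    return -1;
}
```

With a 4-byte step, as scasd would use, a match at byte offset 2 would be skipped; the 2-byte step is what creates the overlap.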

(Or preferably use SSE2 SIMD pcmpeqd to implement memmem for a 4-byte search "needle". Look at an optimized implementation like glibc's for ideas, or a strstr implementation but take out the checks for a 0 terminator in the "haystack".)

repne scasd does not implement strstr or memmem, it only searches for a single element. With byte operand size, it implements memchr.


rep scas doesn't operate on (implicit-length) C-style strings at all; it works on explicit-length strings. Therefore you can just point EDI at the last element of the buffer.

Unlike strrchr, you don't have to find the end of the string as well as the last match; you know (or can calculate) where the end of the string is. Perhaps calling them "strings" is the problem; the x86 rep-string instructions actually work on known-size buffers. That's why they take a count in ECX and don't also stop on a terminating 0 byte.

Use lea edi, [buf + ecx - 1] to set up for std ; rep scasb. Or lea edi, [buf + ecx*2 - 2] to set up for backwards rep scasw on a buffer with ECX word elements. (Generate a pointer to the last element = buf + size - 1 = buf-1 + size)

Regarding your SCAS assumption 3 (that REP always decrementing ECX makes a backward scan impossible): this just makes zero sense. Of course it decrements; ECX=0 is how the search ends on no-match. If you want to calculate the position relative to the end after searching from the end, you can do length - ecx or something like that. Or do pointer subtraction on EDI.
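Here is a small C model of a backward scan (std, then repne scasb with EDI pointing at the last byte), written under the assumption that a first-match search is what is wanted. ScanResult and std_repne_scasb are illustrative names, and EDI is tracked as an index into the buffer rather than as a raw address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of the state left behind by the scan. */
typedef struct {
    size_t ecx; /* count left in ECX when the scan stops */
    long edi;   /* final EDI as an index into buf (can be -1) */
    int zf;     /* 1 if the scan stopped on a match */
} ScanResult;

/* Model of: lea edi, [buf + ecx - 1] / std / repne scasb */
static ScanResult std_repne_scasb(const uint8_t *buf, size_t len, uint8_t al)
{
    ScanResult r = { len, (long)len - 1, 0 };
    assert(buf != NULL || len == 0);
    while (r.ecx != 0) {
        r.zf = (buf[r.edi] == al); /* compare at EDI */
        r.edi--;                   /* DF=1: EDI moves towards lower addresses */
        r.ecx--;                   /* ECX counts down in either direction */
        if (r.zf)
            break;
    }
    return r;
}
```

In this model, when the scan stops on a match, the matching index is r.edi + 1, which is also the value left in r.ecx; the distance from the last element is len - 1 - r.ecx. When nothing matches, ecx is 0 and zf is 0.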

Regarding your MOVS assumption 2: assembly language doesn't have types; that's a higher-level concept. It's up to you to do the right thing to the right bytes in asm. EDI / ESI are registers; the pointers stored in them are just integers that have no type in asm. You don't "store a register in EDI"; it is a register. Maybe you meant to say "pointer stored in EDI"? Registers don't have types; a bit-pattern (aka integer) in a register can be signed 2's complement, unsigned, a pointer, or whatever other interpretation you want.

But yes, any magic that MASM does based on how you defined a symbol is completely gone once you have a pointer in a register.

Remember that movsd is just a 1-byte instruction in x86 machine code, just the opcode. It has only 3 inputs: DF, and two 32-bit integers in EDI and ESI, and they're all implicit (implied by the opcode byte). There's no other context that can affect what the hardware does. Every machine instruction has its documented effect on the architectural state of the machine; nothing more, nothing less.
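One way to picture this is a C function whose only inputs are those three values plus memory. The sketch below is an informal model, not a definition of the instruction; MovsState and movsd_step are invented names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the state MOVSD reads and writes. */
typedef struct {
    uint8_t *edi;       /* destination pointer */
    const uint8_t *esi; /* source pointer */
    int df;             /* direction flag */
} MovsState;

static void movsd_step(MovsState *s)
{
    uint8_t tmp[4];
    ptrdiff_t step = s->df ? -4 : 4;
    assert(s->df == 0 || s->df == 1);
    memcpy(tmp, s->esi, 4); /* load 4 bytes from [ESI] */
    memcpy(s->edi, tmp, 4); /* store them to [EDI] */
    s->esi += step;         /* both pointers move by the operand size */
    s->edi += step;
}
```

Nothing in the function can see whether the memory was declared with db or dd; the 4-byte width comes from the instruction alone.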

Regarding your MOVS assumption 3: no, std makes a transfer go backwards, from end to beginning. DF=0 is the normal / forward direction. Calling conventions guarantee / require that DF=0 on entry and exit from any function, so you don't need a cld before using string instructions; you can just assume that DF=0. (And you should normally leave DF=0.)

Regarding your MOVS assumption 4: that's correct. And a lods / std / stos / cld loop is not worth it vs. a normal loop that uses dec or sub on one of the pointers. You can use lods for the read part and manually write backwards. And you can go 4x faster by loading a dword and using bswap to reverse it in a register, so you're copying in chunks of 4 reversed bytes.
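The 4-bytes-at-a-time idea might be sketched in C as follows. The buffers are assumed not to overlap, the function names are hypothetical, and bswap32 is spelled out with shifts so the sketch does not rely on any compiler builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-swap a 32-bit value, like the x86 bswap instruction. */
static uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Hypothetical helper: copy src to dst with the byte order reversed. */
static void reverse_copy(uint8_t *dst, const uint8_t *src, size_t len)
{
    size_t i = 0;
    assert(dst != NULL && src != NULL);
    while (len - i >= 4) {
        uint32_t chunk;
        memcpy(&chunk, src + i, 4);           /* forward dword load */
        chunk = bswap32(chunk);               /* reverse the 4 bytes */
        memcpy(dst + len - i - 4, &chunk, 4); /* store working back from the end */
        i += 4;
    }
    for (; i < len; i++)                      /* leftover 0..3 bytes */
        dst[len - 1 - i] = src[i];
}
```

The source is read at increasing addresses while the destination is filled from its end, so neither pointer relies on DF.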

Or for in-place reversal: do 2 loads into tmp regs, then 2 stores, then move the pointers towards each other until they cross. (Also works with bswap or movbe.)
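A byte-at-a-time version of that in-place loop, sketched in C with an invented function name, could be:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reverse len bytes of buf in place. */
static void reverse_in_place(uint8_t *buf, size_t len)
{
    uint8_t *lo, *hi;
    assert(buf != NULL || len == 0);
    if (len < 2)
        return;
    lo = buf;
    hi = buf + len - 1;
    while (lo < hi) {
        uint8_t a = *lo; /* two loads into temporaries */
        uint8_t b = *hi;
        *lo++ = b;       /* two stores, then step the pointers inward */
        *hi-- = a;
    }
}
```

For an odd length the middle byte is left untouched, which is why the loop can stop as soon as the pointers meet or cross.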


Other weird inefficiencies in your code:

    mov eax, 0                ;; completely pointless, EAX is overwritten by next instruction
    mov eax, "omit"

Also, lea with a disp32 addressing mode is a pointless waste of code-size. Only use LEA for static addresses in 64-bit code, for RIP-relative addressing. Use mov esi, OFFSET Input instead, like you're doing with push offset Input earlier.


09-12 10:54