Why do x86-64 Linux system calls modify RCX, and what does the value mean?


Question

I'm trying to allocate some memory on Linux with the sys_brk system call. Here is what I tried:

BYTES_TO_ALLOCATE equ 0x08

section .text
    global _start

_start:
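    mov rax, 12                 ; syscall number 12 = brk
    mov rdi, BYTES_TO_ALLOCATE  ; passed as brk's only argument
    syscall

    mov rax, 60                 ; syscall number 60 = exit
    syscall                     ; exit status is whatever rdi holds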

As per the Linux calling convention, I expected the return value (a pointer to the allocated memory) to be in the rax register. I ran this in gdb, and after making the sys_brk syscall I noticed the following register contents.

Before the syscall

rax            0xc      12
rbx            0x0      0
rcx            0x0      0
rdx            0x0      0
rsi            0x0      0
rdi            0x8      8

After the syscall

rax            0x401000 4198400
rbx            0x0      0
rcx            0x40008c 4194444 ; <---- What does this value mean?
rdx            0x0      0
rsi            0x0      0
rdi            0x8      8

I do not quite understand the value in the rcx register in this case. Which register should I use as a pointer to the beginning of the 8 bytes I allocated with sys_brk?

Recommended answer

The system call return value is in rax, as always. See "What are the calling conventions for UNIX & Linux system calls on i386 and x86-64".

Note that sys_brk has a slightly different interface than the brk / sbrk POSIX functions; see the "C library/kernel differences" section of the Linux brk(2) man page. Specifically, Linux sys_brk sets the program break: the argument and the return value are both pointers, not sizes. See "Assembly x86 brk() call use". That answer needs upvotes because it's the only good one on that question.
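To make the pointer-in, pointer-out interface concrete, here is a minimal sketch in C that goes through the raw system call with syscall(2). The helper names raw_brk and grow_break are made up for illustration, and the sketch assumes nothing else in the process (such as malloc) moves the break in between the two calls.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: raw brk system call, bypassing the libc wrapper.
   On success the kernel returns the new program break; on failure it
   returns the current, unchanged break. */
static uintptr_t raw_brk(uintptr_t addr)
{
    return (uintptr_t)syscall(SYS_brk, (long)addr);
}

/* Hypothetical helper: grow the heap by nbytes and return a pointer to
   the start of the new region, or NULL if the break did not move. */
static void *grow_break(size_t nbytes)
{
    /* Address 0 is never a valid break, so this just queries the
       current break. */
    uintptr_t old_break = raw_brk(0);
    uintptr_t new_break = raw_brk(old_break + nbytes);

    if (new_break != old_break + nbytes)
        return NULL;

    /* The newly usable bytes start at the old break. */
    return (void *)old_break;
}
```

Calling grow_break(8) is roughly what the question was after. Passing 8 directly, as the original code does, asks the kernel to move the break to address 8. That request is refused, which is why rax simply came back holding the current break (0x401000).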

The other interesting part of your question is:

"I do not quite understand the value in the rcx register in this case"

You're seeing the mechanics of how the syscall / sysret instructions are designed to let the kernel resume user-space execution while still being fast.

syscall doesn't do any loads or stores; it only modifies registers. Instead of using special registers to save a return address, it simply uses regular integer registers.

It's not a coincidence that RCX=RIP and R11=RFLAGS after the kernel returns to your user-space code. In your dump, 0x40008c is the address of the instruction right after the syscall, which is where execution resumed. The only way for this not to be the case is if a ptrace system call modified the process's saved rcx or r11 value while it was inside the kernel. (ptrace is the system call gdb uses.) In that case, Linux would use iret instead of sysret to return to user space, because the slower general-case iret can do that. (See "What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?" for a walk-through of some of Linux's system-call entry points. It mostly covers the entry points used by 32-bit processes, though, not syscall in a 64-bit process.)
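You can observe this from C as well. The sketch below assumes x86-64 Linux, GCC or Clang inline assembly, and that the process is not being modified through ptrace. Under an emulator or a sandbox that intercepts system calls the result may differ. The helper name probe_syscall_clobbers is made up for illustration.

```c
#include <stdint.h>
#include <sys/syscall.h>

/* Hypothetical helper: issues getpid through a raw `syscall` instruction
   and reports what RCX and R11 hold afterwards, together with the address
   of the instruction that immediately follows `syscall`. */
static void probe_syscall_clobbers(uint64_t *rcx_after,
                                   uint64_t *r11_after,
                                   uint64_t *next_insn)
{
    uint64_t rax = SYS_getpid;            /* syscall number in, result out */
    uint64_t rcx, next;
    register uint64_t r11 __asm__("r11"); /* pin this variable to R11 */

    __asm__ volatile(
        "syscall\n\t"
        "1:\n\t"                          /* label = address after syscall */
        "lea 1b(%%rip), %[next]"
        : "+a"(rax), "=c"(rcx), "=r"(r11), [next] "=&r"(next)
        :
        : "memory");

    *rcx_after = rcx;
    *r11_after = r11;
    *next_insn = next;
}
```

On a native kernel, the value stored through rcx_after should equal the one stored through next_insn, and the R11 value should look like a plausible RFLAGS image (bit 1 is always set in RFLAGS).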

Instead of pushing a return address onto the kernel stack (as int 0x80 does), syscall:

- sets RCX=RIP and R11=RFLAGS (so the kernel cannot even see the values those registers held before you executed syscall).

- masks RFLAGS with a pre-configured mask from a config register (the IA32_FMASK MSR). This lets the kernel keep interrupts disabled (IF) until it has done swapgs and set rsp to point to the kernel stack. Even with cli as the first instruction at the entry point, there'd be a window of vulnerability. You also get cld for free by masking off DF, so rep movs / stos go upward even if user-space had used std.

  Fun fact: AMD's first proposed syscall / swapgs design didn't mask RFLAGS, but they changed it after feedback from kernel developers on the amd64 mailing list (in ~2000, a couple of years before the first silicon).

- jumps to the configured syscall entry point (setting CS:RIP = IA32_LSTAR). The old CS value isn't saved anywhere, I think.
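The three steps above can be written down as a toy model in C. This is only a sketch of the register shuffling described here, not how any CPU or kernel implements it: the struct and function names are invented, segment and privilege-level changes are left out, and the sysret side is simplified to a plain restore.

```c
#include <stdint.h>

/* Hypothetical, heavily reduced view of the user-visible register state. */
typedef struct {
    uint64_t rip;
    uint64_t rcx;
    uint64_t r11;
    uint64_t rflags;
} cpu_regs;

/* Hypothetical view of the two MSRs mentioned above. */
typedef struct {
    uint64_t lstar;  /* IA32_LSTAR: kernel entry point */
    uint64_t fmask;  /* IA32_FMASK: RFLAGS bits to clear on entry */
} syscall_msrs;

/* Model of `syscall`: no loads or stores, only register updates. */
static void model_syscall(cpu_regs *r, const syscall_msrs *m,
                          uint64_t next_rip)
{
    r->rcx = next_rip;        /* return address goes into RCX */
    r->r11 = r->rflags;       /* user RFLAGS go into R11 */
    r->rflags &= ~m->fmask;   /* e.g. clears IF and DF if they are masked */
    r->rip = m->lstar;        /* jump to the kernel entry point */
}

/* Simplified model of `sysret`: resume user space from RCX and R11. */
static void model_sysret(cpu_regs *r)
{
    r->rip = r->rcx;
    r->rflags = r->r11;
}
```

Feeding the model a next_rip of 0x40008c reproduces the dump in the question: after model_syscall followed by model_sysret, rcx still holds 0x40008c, because nothing in between had a reason to restore the old value.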

It doesn't do anything else. The kernel has to use swapgs to get access to an info block where it saved the kernel stack pointer, because rsp still holds its user-space value.

So the design of syscall requires a system-call ABI that clobbers registers, and that's why the values are what they are.
