I am studying alignment checking (#AC), but I don't know whether the processor checks the effective address, the linear address, the physical address, or all of them.

For example, suppose the effective address of a datum is aligned, but the linear address formed by adding the base address from the segment descriptor is no longer aligned. Does the processor raise an #AC exception in that case?

Solution

TL;DR

I think it's the linear address. Keep reading for the test methodology and the test code.

It's not the effective address (aka the offset)

To test this, it suffices to use a segment whose base is not aligned. In my test, I used a 32-bit data segment with a base of 1.

The test is a "simple" legacy (i.e.
non-UEFI) bootloader that creates said descriptor and tests accessing the offsets 0x7000 and 0x7003 with DWORD width. The former generates an #AC, the latter doesn't. This demonstrates that it's not the offset alone that is checked, because 0x7000 is an aligned offset that still faults with a base of 1. This is expected.

I have a tradition of using minimal output for my tests, so an explanation is mandatory. First, six blue As are written in six consecutive rows of the VGA buffer. Then, before each load is executed, a pointer is set to the corresponding A. The #AC handler increments the pointed-to byte. So, if a row contains a B, that access generated an #AC.

The first four rows are used for:

1. Access using a segment with base 0 and offset 0x7000. As expected, no #AC.
2. Access using a segment with base 0 and offset 0x7001. As expected, #AC.
3. Access using a segment with base 1 and offset 0x7000. This does generate an #AC, thereby demonstrating that it's either the linear or the physical address that's checked.
4. Access using a segment with base 1 and offset 0x7003. This doesn't generate an #AC, confirming point 3.

The next two rows are used to check the linear address vs the physical address.

It's not the physical address: #AC instead of #PF

The #AC check only tests alignments up to 16 bytes, but a linear and a physical address share the same alignment up to at least 4KiB. We would need a memory access that requires a data structure aligned on at least 8KiB to test whether it's the physical or the linear address that's used for the check. Unfortunately, there is no such access (yet).

I thought I could still gather some insight by checking which exception is generated when a misaligned load targets an unmapped page. If a #PF is generated, the CPU first translates the linear address and then checks the alignment.
Conversely, if an #AC is generated, the CPU checks the alignment before translating (remember that the page is not mapped).

I modified the test to enable paging, map the minimum number of pages, and handle a #PF by incrementing the byte under the pointer by two. When a load is executed, the corresponding A becomes a B if an #AC is generated, or a C if a #PF is generated. Note that both are faults (the eip on the stack points to the offending instruction), but both handlers resume from the next instruction (so each load is executed only once).

This is the meaning of the last two rows:

5. Access to an unmapped page using a segment with base 1 and offset 0x8003. This generates a #PF as expected (the access is aligned, so the only exception possible here is a #PF).
6. Access to an unmapped page using a segment with base 1 and offset 0x8000. This generates an #AC, therefore the CPU checks the alignment before attempting to translate the address.

Point 6 seems to suggest that the CPU performs the check on the linear address, since no access to the page table is done. In point 6 both exceptions could be generated; the fact that #PF is not generated means that the CPU hasn't attempted to translate the address when the alignment check is performed. (Or that #AC logically takes precedence.
But the hardware likely wouldn't do a page walk before taking the #AC exception, even if it did probe the TLB after doing the base+offset calculation.)

Test code

The code is messy and more cumbersome than one may expect. The main hindrance is that #AC only works at CPL=3, so we need to create the CPL=3 descriptors, plus a TSS segment and a TSS descriptor. To handle the exceptions we need an IDT, and we also need paging.

    BITS 16
    ORG 7c00h

        ;Skip the BPB (My BIOS actively overwrites it)
        jmp SHORT __SKIP_BPB__

        ;I eyeballed the BPB size (at least the part that may be overwritten)
        TIMES 40h db 0

    __SKIP_BPB__:
        ;Set up the segments (including CS)
        xor ax, ax
        mov ds, ax
        mov es, ax                  ;ES:DI is the destination of REP STOSD below
        mov ss, ax
        xor sp, sp
        jmp 0:__START__

    __START__:
        ;Clear and set the video mode (before we switch to PM)
        mov ax, 03h
        int 10h

        ;Disable the interrupts and load the GDT and IDT
        cli
        lgdt [GDT]
        lidt [IDT]

        ;Enable PM
        mov eax, cr0
        or al, 1
        mov cr0, eax

        ;Write a TSS segment, we zero 104h DWORDs and only set the SS0:ESP0 fields
        mov di, 7000h
        mov cx, 104h
        xor eax, eax                ;STOSD stores all 32 bits of EAX
        rep stosd
        mov DWORD [7004h], 7c00h    ;ESP0
        mov WORD [7008h], 10h       ;SS0

        ;Set AC in EFLAGS
        pushfd
        or DWORD [esp], 1 << 18
        popfd

        ;Set AM in CR0
        mov eax, cr0
        or eax, 1 << 18
        mov cr0, eax

        ;OK, let's go in PM for real
        jmp 08h:__32__

    __32__:
        BITS 32

        ;Set the stack and DS
        mov ax, 10h
        mov ss, ax
        mov esp, 7c00h
        mov ds, ax

        ;Set the #AC handler (vector 17)
        mov DWORD [IDT+8+17*8], ((AC_handler-$$+7c00h) & 0ffffh) | 00080000h
        mov DWORD [IDT+8+17*8+4], 8e00h | (((AC_handler-$$+7c00h) >> 16) << 16)

        ;Set the #PF handler (vector 14)
        mov DWORD [IDT+8+14*8], ((PF_handler-$$+7c00h) & 0ffffh) | 00080000h
        mov DWORD [IDT+8+14*8+4], 8e00h | (((PF_handler-$$+7c00h) >> 16) << 16)

        ;Set the TSS
        mov ax, 30h
        ltr ax

        ;Paging is:
        ;7xxx -> Identity mapped (contains code and all the stacks and system structures)
        ;8xxx -> Not present
        ;9xxx -> Mapped to the VGA text buffer (0b8xxxh)
        ;Note that the paging structures are at 6000h and 5000h, this is OK as these are physical addresses

        ;Set the Page Directory at 6000h
        mov eax, 6000h
        mov cr3, eax

        ;Set the Page Directory Entry 0 (for 00000000h-003fffffh) to point to a Page Table at 5000h
        mov DWORD [eax], 5007h

        ;Set the Page Table Entry 7 (for 00007xxxh) to identity map and
        ;Page Table Entry 8 (for 00008xxxh) to be not present
        mov eax, 5000h + 7*4
        mov DWORD [eax], 7007h
        mov DWORD [eax+4], 8006h

        ;Map page 9000h to 0b8000h
        mov DWORD [eax+8], 0b801fh

        ;Enable paging
        mov eax, cr0
        or eax, 80000000h
        mov cr0, eax

        ;Change privilege (goto CPL=3)
        push DWORD 23h              ;SS3
        push DWORD 07a00h           ;ESP3
        push DWORD 1bh              ;CS3
        push DWORD __32user__       ;EIP3
        retf

    __32user__:
        ;
        ;Here we are at CPL=3
        ;

        ;Set DS to the segment with base 0 and ES to the one with base 1
        mov ax, 23h
        mov ds, ax
        mov ax, 2bh
        mov es, ax

        ;Write six As in six consecutive rows (starting from the 4th)
        mov ecx, 6
        mov ebx, 9000h + 80*2*3     ;Points to the 4th row in the VGA text framebuffer
    .init_markers:
        mov WORD [ebx], 0941h
        add bx, 80*2
        dec ecx
        jnz .init_markers

        ;ebx points to the first A
        sub ebx, 80*2 * 6

        ;Base 0 + Offset 0 = 0, should not fault (marker stays A)
        mov eax, DWORD [ds:7000h]

        ;Base 0 + Offset 1 = 1, should fault (marker becomes B)
        add bx, 80*2
        mov eax, DWORD [ds:7001h]

        ;Base 1 + Offset 0 = 1, should fault (marker becomes B)
        add bx, 80*2
        mov eax, DWORD [es:7000h]

        ;Base 1 + Offset 3 = 4, should not fault (marker stays A)
        add bx, 80*2
        mov eax, DWORD [es:7003h]

        ;Base 1 + Offset 3 = 4 but page not mapped, should #PF (marker becomes C)
        add bx, 80*2
        mov eax, DWORD [es:8003h]

        ;Base 1 + Offset 0 = 1 but page not mapped: if #PF the marker becomes C, if #AC it becomes B
        add bx, 80*2
        mov eax, DWORD [es:8000h]

        ;Loop forever (cannot use HLT at CPL=3)
        jmp $

    ;#PF handler
    ;Increment the byte pointed to by ebx by two
    PF_handler:
        add esp, 04h                ;Remove the error code
        add DWORD [esp], 6          ;Skip the current instruction (each test load is 6 bytes long)
        add BYTE [ebx], 2           ;Increment
        iret

    ;#AC handler
    ;Same as the #PF handler but increment by one
    AC_handler:
        add esp, 04h
        add DWORD [esp], 6
        inc BYTE [ebx]
        iret

        ;The GDT (entry 0 is used as the content for GDTR)
    GDT dw GDT.end-GDT - 1
        dd GDT
        dw 0
        dd 0000ffffh, 00cf9a00h     ;08 Code, 32, DPL 0
        dd 0000ffffh, 00cf9200h     ;10 Data, 32, DPL 0
        dd 0000ffffh, 00cffa00h     ;18 Code, 32, DPL 3
        dd 0000ffffh, 00cff200h     ;20 Data, 32, DPL 3
        dd 0001ffffh, 00cff200h     ;28 Data, 32, DPL 3, Base = 1
        dd 7000ffffh, 00cf8900h     ;30 TSS, 32, DPL 0, Base = 7000h
    .end:

        ;The IDT, to save space the entries are set dynamically
    IDT dw 18*8-1
        dd IDT+8
        dw 0

        ;Signature
        TIMES 510-($-$$) db 0
        dw 0aa55h

Does it make sense to check the linear address?

I don't think it's particularly relevant. As noted above, a linear and a physical address share the same alignment up to 4KiB, because paging only replaces the page number and passes the low 12 bits through unchanged. So, for now, it doesn't matter at all. Right now, accesses wider than 64 bytes would still need to be performed in chunks, and this limit is set deep in the microarchitectures of x86 CPUs.
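To make the expected outcome of the six rows easier to follow, here is a small Python model of the hypothesis the test supports: the alignment check is done on the linear address (base + offset) and wins over the page-present check. This is only a sketch and not part of the original test. The names `translate`, `marker`, `LOADS` and `predicted_markers` are made up for illustration, and the page table simply mirrors the mappings set up in the test code.

```python
PAGE_SIZE = 0x1000  # 4 KiB pages, as in the test

# Linear page -> physical page, mirroring the mappings in the test code.
# Page 0x8000 is deliberately left out (not present).
PAGE_TABLE = {0x7000: 0x7000, 0x9000: 0xB8000}


def translate(linear):
    """Return the physical address, or None if the page is not present."""
    page = linear & ~(PAGE_SIZE - 1)
    if page not in PAGE_TABLE:
        return None
    # Translation replaces only the page number; the low 12 bits pass through.
    return PAGE_TABLE[page] | (linear & (PAGE_SIZE - 1))


def marker(base, offset, size=4):
    """Marker shown for one load, assuming #AC is checked on the linear
    address before any translation is attempted (the assumed model)."""
    linear = base + offset
    if linear % size != 0:
        return "B"  # #AC handler incremented the 'A' by one
    if translate(linear) is None:
        return "C"  # #PF handler incremented the 'A' by two
    return "A"      # no exception, the marker is untouched


# (segment base, offset) of the six loads, in row order, as in the test code.
LOADS = [(0, 0x7000), (0, 0x7001), (1, 0x7000),
         (1, 0x7003), (1, 0x8003), (1, 0x8000)]


def predicted_markers():
    return "".join(marker(base, offset) for base, offset in LOADS)


if __name__ == "__main__":
    print(predicted_markers())
```

Under these assumptions the model predicts the markers A, B, B, A, C, B from top to bottom, which is the pattern described above. Swapping the two checks in `marker` would turn the last row into a C, which is what the test would have shown if translation came first.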