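Over the last two days I came across an excellent blog post that thoroughly explains how logical, linear, and physical addresses map onto one another. The author clearly knows the subject in depth and kindly provides a 29-page PDF; as far as I am concerned, that document settles the topic. So why write this post? The author uses the 2.6.18 kernel as the example and supplies two kernel modules and two user-space programs. I took the time to verify the document's PAE (Physical Address Extension) address mapping end to end on my own Ubuntu 12.04 machine, and found that the code has some compatibility problems and does not compile as-is, mainly because of the different kernel version and a few minor gcc issues. It took me a little over four hours to get the whole experiment working. If you want to deepen your understanding by running the experiment yourself, you can use my modified programs. I have no intention of plagiarizing; as always, the credit belongs to those who came before.
The figure below comes from the Intel manual 64-ia-32-architectures-software-developer-vol-3a-part-1-manual and explains the mapping from logical address to physical address very well. The so-called logical address is the address we see when we apply the address-of operator in C (strictly speaking, that value is the offset part; the segment part comes from a segment register).
Using the program from the original article: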
- #include <stdio.h>
- int main()
- {
- unsigned long tmp;
- tmp = 0x12345678;
- printf("tmp address:0x%08lX\n", (unsigned long)&tmp);
- return 0;
- }
- tmp address:0xBF86D16C
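1 Segment mapping
The logical address 0xBF86D16C of the local variable tmp is the offset. Because tmp lives on the stack, its segment is selected by SS (Stack Segment), the register IA-32 provides for the stack. The kernel loads SS with __USER_DS when it starts a user thread: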
- //arch/x86/kernel/process_32.c
- //-------------------------------------------
- void
- start_thread(struct pt_regs *regs, unsigned long new_ip, unsigned long new_sp)
- {
- set_user_gs(regs, 0);
- regs->fs = 0;
- regs->ds = __USER_DS;
- regs->es = __USER_DS;
- regs->ss = __USER_DS;
- regs->cs = __USER_CS;
- regs->ip = new_ip;
- regs->sp = new_sp;
- /*
- * Free the old FP and other extended state
- */
- free_thread_xstate(current);
- }
- //arch/x86/include/asm/segment.h
- //------------------------------------------
- #define GDT_ENTRY_DEFAULT_USER_CS 14
- #define GDT_ENTRY_DEFAULT_USER_DS 15
- #define GDT_ENTRY_KERNEL_BASE (12)
- #define GDT_ENTRY_KERNEL_CS (GDT_ENTRY_KERNEL_BASE+0)
- #define GDT_ENTRY_KERNEL_DS (GDT_ENTRY_KERNEL_BASE+1)
- #define __KERNEL_CS (GDT_ENTRY_KERNEL_CS*8)
- #define __KERNEL_DS (GDT_ENTRY_KERNEL_DS*8)
- #define __USER_DS (GDT_ENTRY_DEFAULT_USER_DS*8+3)
- #define __USER_CS (GDT_ENTRY_DEFAULT_USER_CS*8+3)
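- __USER_CS = 14*8+3 = 0x73 = 0000000001110 0 11 (index=14, TI=0, RPL=3)
- __USER_DS = 15*8+3 = 0x7B = 0000000001111 0 11 (index=15, TI=0, RPL=3)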
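A segment selector is 16 bits wide: bits 15:3 are the index into the descriptor table, bit 2 is TI, and bits 1:0 are RPL. TI says whether the segment descriptor we want is in the GDT (TI=0) or in the LDT (TI=1). The GDT and LDT can be thought of simply as two tables, each holding a set of segment descriptors.
The TI bit is 0 for both our CS and our DS; in other words, the descriptors we are looking for are in the GDT. In practice, Linux programs almost always select descriptors from the GDT and hardly ever from the LDT. According to Mao Decao, only processes such as wine use the LDT.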
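RPL is the requested privilege level: 0 is the most privileged and 3 is the least privileged. The +3 in the definition below sets RPL to 3, which is user mode:
- #define __USER_CS (GDT_ENTRY_DEFAULT_USER_CS*8+3)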
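As a small illustration, the selector layout described above can be decoded in a few lines of C. This is only a sketch: the struct and function names are made up for this post and do not come from the kernel or from the original article.

```c
#include <stdint.h>

/* Fields of a 16-bit x86 segment selector (names are illustrative). */
struct selector_fields {
    unsigned index; /* bits 15:3, index into the GDT or LDT */
    unsigned ti;    /* bit 2, 0 = GDT, 1 = LDT */
    unsigned rpl;   /* bits 1:0, requested privilege level */
};

/* Split a selector value such as __USER_DS (15*8+3 = 0x7B). */
static inline struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = sel >> 3;
    f.ti    = (sel >> 2) & 1u;
    f.rpl   = sel & 3u;
    return f;
}
```

Calling decode_selector(0x7B) gives index 15, TI 0 and RPL 3, which matches the bit pattern of __USER_DS shown above.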
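The next step is to go to the GDT and find the segment descriptor we want. But wait: we have been happily talking about the GDT, and we know that our DS descriptor is at index 15, yet nobody has told us where the GDT itself is located.
Enter GDTR: the address of the GDT is stored in the GDTR register. The question is how to read the value of GDTR.
The author of the blog post mentioned above wrote a kernel module that extracts the values of GDTR, CR0, CR3 and so on. The core of the code is below: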
- static int my_get_info( char *buf, char **start, off_t off, int count )
- {
- int len = 0;
- struct mm_struct *mm;
- mm = current->active_mm;
- cr0 = read_cr0();
- cr3 = read_cr3();
- cr4 = read_cr4();
- //asm(" sgdt gdtr");
- asm("sgdt %0":"=m"(gdtr));
- len += sprintf( buf+len, "cr4=%08X ", cr4 );
- len += sprintf( buf+len, "PSE=%X ", (cr4>>4)&1 );
- len += sprintf( buf+len, "PAE=%X ", (cr4>>5)&1 );
- len += sprintf( buf+len, "\n" );
- len += sprintf( buf+len, "cr3=%08X cr0=%08X\n",cr3,cr0);
- len += sprintf( buf+len, "pgd:0x%08X\n",(unsigned int)mm->pgd);
- len += sprintf( buf+len, "gdtr address:%lX, limit:%X\n", gdtr.address,gdtr.limit);
- // len += sprintf( buf+len, "cpu_gdt_table address:0x%08lX\n", cpu_gdt_table);
- return len;
- }
In short, we have a way to read GDTR and therefore to locate the GDT. The 16th entry of that table (index=15) is our DS segment descriptor.
- root@manu:~/code/c/self/mm_addr# ./mem_map
- %ebp:0xBF86D178
- tmp address:0xBF86D16C
- cr4=000006F0 PSE=1 PAE=1
- cr3=06E3C000 cr0=8005003B
- pgd:0xC6E3C000
- gdtr address:F7BB9000, limit:FF
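GDTR holds a kernel virtual address. Since it falls in the kernel's directly mapped region (which starts at 0xC0000000 on this kernel), the physical address of the GDT is F7BB9000 - C0000000. We can then look at that memory with fileview, the tool provided by the author: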
- -----------------------------------------------------------
- gdtr : f7bb9000 - c0000000 = 37bb9000
- 0000037BB9000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000037BB9010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000037BB9020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000037BB9030 FF FF 00 B9 61 F3 DF B7 00 00 00 00 00 00 00 00 ....a...........
- 0000037BB9040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000037BB9050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000037BB9060 FF FF 00 00 00 9B CF 00 FF FF 00 00 00 93 CF 00 ................
- 0000037BB9070 FF FF 00 00 00 FB CF 00 FF FF 00 00 00 F3 CF 00 ................
- 0000037BB9080 6B 20 C0 EA BB 8B 00 F7 00 00 00 00 00 00 00 00 k ..............
- 0000037BB9090 FF FF 00 00 00 9A 40 00 FF FF 00 00 00 9A 00 00 ......@.........
- 0000037BB90A0 FF FF 00 00 00 92 00 00 00 00 00 00 00 92 00 00 ................
- 0000037BB90B0 00 00 00 00 00 92 00 00 FF FF 00 00 00 9A 40 00 ..............@.
- 0000037BB90C0 FF FF 00 00 00 9A 00 00 FF FF 00 00 00 92 40 00 ..............@.
- 0000037BB90D0 FF FF 00 00 00 92 CF 00 FF FF 00 40 29 93 8F 36 ...........@)..6
- 0000037BB90E0 18 00 80 0C BC 91 40 F7 00 00 00 00 00 00 00 00 ......@.........
- 0000037BB90F0 00 00 00 00 00 00 00 00 6B 20 00 48 80 89 00 C1 ........k .H....
- FF FF 00 00 00 F3 CF 00 = 00cff300 0000ffff
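The arithmetic used to find that descriptor in the dump can be written down as a short C sketch. The helper names and the PAGE_OFFSET_ASSUMED constant are illustrative; the constant assumes the usual 3G/1G split with a directly mapped low-memory region, which is what the subtraction above relies on.

```c
#include <stdint.h>

/* Assumption: 3G/1G split, GDT located in directly mapped low memory. */
#define PAGE_OFFSET_ASSUMED 0xC0000000u

/* Physical address of GDT entry `index`, given the base read from GDTR.
 * Each descriptor is 8 bytes long. */
static inline uint32_t gdt_entry_phys(uint32_t gdt_base, unsigned index)
{
    return (gdt_base - PAGE_OFFSET_ASSUMED) + index * 8u;
}

/* Number of 8-byte descriptors covered by a GDTR limit.
 * The limit field holds the table size in bytes minus one. */
static inline unsigned gdt_entry_count(uint16_t limit)
{
    return (limit + 1u) / 8u;
}
```

With the values printed by the module (base F7BB9000, limit FF), entry 15 is at physical address 0x37BB9078, which is the second half of the dump row starting at 0x37BB9070, and the table holds 32 descriptors.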
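Comparing the descriptor 00cff300 0000ffff against the segment descriptor layout gives BASE=0x00000000. After all that effort, the conclusion is:
Segmentation is effectively fake here: with a base of 0, the virtual (logical) address is always equal to the linear address.
We can also read off some other useful information:
- S=1: not a system segment (it is a code or data segment)
- G=1: the limit is counted in units of 4096 bytes
- DPL=11b (3): accessible from both kernel mode and user mode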
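For readers who would rather not decode the descriptor by hand, here is a hedged C sketch that extracts the same fields from the eight bytes exactly as they appear in the dump. The struct and function names are illustrative; the bit positions follow the segment descriptor layout in the Intel manual.

```c
#include <stdint.h>

/* Selected fields of an 8-byte segment descriptor (names are illustrative). */
struct seg_desc {
    uint32_t base;  /* 32-bit segment base */
    uint32_t limit; /* 20-bit limit, in 4 KB units when g is 1 */
    unsigned type;  /* bits 43:40 */
    unsigned s;     /* bit 44, 1 = code/data segment */
    unsigned dpl;   /* bits 46:45 */
    unsigned p;     /* bit 47, present */
    unsigned g;     /* bit 55, granularity */
};

/* Decode a descriptor from 8 bytes in memory order (little-endian). */
static inline struct seg_desc decode_descriptor(const uint8_t b[8])
{
    uint64_t d = 0;
    for (int i = 7; i >= 0; i--)
        d = (d << 8) | b[i];

    struct seg_desc sd;
    sd.limit = (uint32_t)(d & 0xFFFFu) | ((uint32_t)((d >> 48) & 0xFu) << 16);
    sd.base  = (uint32_t)((d >> 16) & 0xFFFFFFu) | ((uint32_t)((d >> 56) & 0xFFu) << 24);
    sd.type  = (unsigned)((d >> 40) & 0xFu);
    sd.s     = (unsigned)((d >> 44) & 1u);
    sd.dpl   = (unsigned)((d >> 45) & 3u);
    sd.p     = (unsigned)((d >> 47) & 1u);
    sd.g     = (unsigned)((d >> 55) & 1u);
    return sd;
}
```

Feeding it the bytes FF FF 00 00 00 F3 CF 00 yields base 0, limit 0xFFFFF, S=1, DPL=3 and G=1, in agreement with the values listed above.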
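2 Page mapping
With the linear address in hand, the next step is to obtain the physical address.
My machine uses PAE, the Physical Address Extension paging mechanism, as uname -r shows: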
- uname -r
- 3.2.0-29-generic-pae
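First, what is PAE? Most servers today have more than 4 GB of memory, and so do many PCs; a colleague of mine has a PC with 16 GB, which makes me green with envy. Intel increased the number of address pins from 32 to 36, which allows up to 64 GB of physical memory. However, a new paging mechanism is needed to translate 32-bit linear addresses into 36-bit physical addresses so that all 64 GB can actually be used.
That mechanism is PAE:
1 A page-directory-pointer table (PDPT) is introduced, consisting of four 64-bit entries.
2 27 bits of the cr3 register (bits 31:5) hold the address of the PDPT. The table is 32-byte aligned, so all 32 bits are not needed.
3 The top 2 bits of the linear address select one of the four PDPT entries.
The figure above describes the complete mapping from linear address to physical address in PAE mode. The only slightly puzzling part is the meaning of the number 40, which is the width of the address field (bits 51:12) taken from each table entry.
The Intel manual contains the following statements: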
1)A PDE is selected using the physical address defined as follows:
— Bits 51:12 are from PDPTEi.
— Bits 11:3 are bits 29:21 of the linear address.
— Bits 2:0 are 0.
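2) Bit 7 of the PDE (the PS bit) determines whether a 4 KB page or a 2 MB page is used. The figure above is for 4 KB pages; 2 MB pages use the following scheme:
In our case 2 MB pages are not used (we will check the PS bit in the experiment below), so I will not discuss the 2 MB scheme any further.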
3)A PTE is selected using the physical address defined as follows:
— Bits 51:12 are from the PDE.
— Bits 11:3 are bits 20:12 of the linear address.
— Bits 2:0 are 0.
4) The final physical address is formed as follows:
— Bits 51:12 are from the PTE.
— Bits 11:0 are from the original linear address.
OK, back to our example:
- Linear address:
- 0xBF86D16C
- 10 | 111111100 | 001101101 | 000101101100 (binary; bits 31:30 | 29:21 | 20:12 | 11:0)
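The same split can be expressed as a short C sketch, which is handy for checking other addresses. The struct and function names are illustrative only; the field boundaries are the PAE ones quoted from the Intel manual above (4 KB pages).

```c
#include <stdint.h>

/* The four fields of a 32-bit linear address under PAE with 4 KB pages. */
struct pae_indices {
    unsigned pdpt;   /* bits 31:30, index into the PDPT */
    unsigned pd;     /* bits 29:21, index into the page directory */
    unsigned pt;     /* bits 20:12, index into the page table */
    unsigned offset; /* bits 11:0, offset inside the page */
};

/* Split a linear address into its PAE table indices and page offset. */
static inline struct pae_indices pae_split(uint32_t la)
{
    struct pae_indices ix;
    ix.pdpt   = (la >> 30) & 0x3u;
    ix.pd     = (la >> 21) & 0x1FFu;
    ix.pt     = (la >> 12) & 0x1FFu;
    ix.offset = la & 0xFFFu;
    return ix;
}
```

For 0xBF86D16C this gives PDPT index 2, page directory index 0x1FC, page table index 0x6D and offset 0x16C. Each index is multiplied by 8 (the entry size) to find the entry inside its table.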
- root@manu:~/code/c/self/mm_addr# ./mem_map
- %ebp:0xBF86D178
- tmp address:0xBF86D16C
- cr4=000006F0 PSE=1 PAE=1
- cr3=06E3C000 cr0=8005003B
- pgd:0xC6E3C000
- gdtr address:F7BB9000, limit:FF
Let's see what is stored at physical address 0x6E3C000 (the value of cr3), once again bringing out our trusty dram tool:
- 0000006E3C000 01 B0 E3 06 00 00 00 00 01 60 3C 08 00 00 00 00 .........`<.....
- 0000006E3C010 01 50 3C 08 00 00 00 00 01 40 93 01 00 00 00 00 .P<......@......
- 0000006E3C020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006E3C030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006E3C040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006E3C050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006E3C060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006E3C070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006E3C080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006E3C090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006E3C0A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006E3C0B0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006E3C0C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006E3C0D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006E3C0E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006E3C0F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
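- PDPT entry 2 (linear address bits 31:30 = 10b), at 0x6E3C000 + 2*8 = 0x6E3C010: 01 50 3C 08 00 00 00 00 = 0x083c5001
Bit 0 is the present bit; it is set, so this 64-bit entry is valid.
Bit 7 is clear as well. Strictly speaking, the PS bit lives in the PDE rather than in the PDPT entry; the PDE read in the next step (0x06d87067) also has bit 7 clear, so the page is a 4 KB page rather than a 2 MB page.
The base address of the page directory is therefore 0x83c5000.
- Linear address:
- 0xBF86D16C
- 10 | 111111100 | 001101101 | 000101101100
- bits 29:21 = 111111100b = 0x1FC, so the PDE is at 0x83c5000 + 0x1FC*8 = 0x83C5FE0
Here is the content around that address: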
- 00000083C5FB0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 00000083C5FC0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 00000083C5FD0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 00000083C5FE0 67 70 D8 06 00 00 00 00 00 00 00 00 00 00 00 00 gp..............
- 00000083C5FF0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 00000083C6000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 00000083C6010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 00000083C6020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 00000083C6030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 00000083C6040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 00000083C6050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 00000083C6060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 00000083C6070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 00000083C6080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 00000083C6090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 00000083C60A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
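- 67 70 D8 06 00 00 00 00 = 0x06d87067 (present; bit 7, the PS bit, is clear; page table base 0x6d87000)
- Linear address:
- 0xBF86D16C
- 10 | 111111100 | 001101101 | 000101101100
- bits 20:12 = 001101101b = 0x06D, so the PTE is at 0x6d87000 + 0x6D*8 = 0x6D87368
Here is the content around that address: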
- 0000006D87360 47 40 65 07 00 00 00 80 47 A0 94 0D 00 00 00 80 G@e.....G.......
- 0000006D87370 47 B0 BD 09 00 00 00 80 00 00 00 00 00 00 00 00 G...............
- 0000006D87380 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006D87390 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006D873A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006D873B0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006D873C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006D873D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006D873E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006D873F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006D87400 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006D87410 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006D87420 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006D87430 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006D87440 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006D87450 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 47 A0 94 0D 00 00 00 80 = 0x80000000 0d94a047
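1) Bits 51:12 come from 0x80000000 0d94a047,
that is, 0x0d94a000. (Bit 63, the leading 0x80000000, is not part of the address; when NX is enabled it is the execute-disable bit.)
2) Bits 11:0 come from the low 12 bits of the linear address
- Linear address:
- 0xBF86D16C
- 10 | 111111100 | 001101101 | 000101101100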
- 0xd94a000 + (0001 0110 1100)b = 0x0d94a16c
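The bits of the PTE that were masked off in this calculation are flag bits. The following C sketch shows one way to pull them apart. The macro and function names are illustrative, and treating bit 63 as execute-disable assumes that NX is enabled on the machine.

```c
#include <stdint.h>

/* Flag bits of a 64-bit PAE page-table entry (names are illustrative). */
#define PTE_P  (1ull << 0)   /* present */
#define PTE_RW (1ull << 1)   /* writable */
#define PTE_US (1ull << 2)   /* accessible from user mode */
#define PTE_A  (1ull << 5)   /* accessed */
#define PTE_D  (1ull << 6)   /* dirty */
#define PTE_XD (1ull << 63)  /* execute-disable, assuming NX is enabled */

/* Bits 51:12 hold the physical address of the 4 KB page frame. */
#define PTE_ADDR_MASK 0x000FFFFFFFFFF000ull

/* Physical base address of the page frame the entry points to. */
static inline uint64_t pte_frame(uint64_t pte)
{
    return pte & PTE_ADDR_MASK;
}

/* Returns 1 if the given flag bit is set in the entry, 0 otherwise. */
static inline int pte_has(uint64_t pte, uint64_t flag)
{
    return (pte & flag) != 0;
}
```

For the entry 0x80000000 0d94a047 the frame is 0x0d94a000, and the present, writable, user and execute-disable bits are all set, which is what one would expect for a page of a user-mode stack.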
Let's use our tool to check whether the content at physical address 0x0d94a16c is 0x12345678:
- 000000D94A160 70 A2 7A B7 00 00 00 00 A9 86 04 08 78 56 34 12 p.z.........xV4.
- 000000D94A170 A0 86 04 08 00 00 00 00 00 00 00 00 D3 14 5F B7 .............._.
- 000000D94A180 01 00 00 00 14 D2 86 BF 1C D2 86 BF 58 98 79 B7 ............X.y.
- 000000D94A190 00 00 00 00 1C D2 86 BF 1C D2 86 BF 00 00 00 00 ................
- 000000D94A1A0 A0 82 04 08 F4 DF 77 B7 00 00 00 00 00 00 00 00 ......w.........
- 000000D94A1B0 00 00 00 00 A9 68 DD 32 B8 4C 57 81 00 00 00 00 .....h.2.LW.....
- 000000D94A1C0 00 00 00 00 00 00 00 00 01 00 00 00 A0 84 04 08 ................
- 000000D94A1D0 00 00 00 00 A0 F6 7A B7 E9 13 5F B7 F4 BF 7B B7 ......z..._...{.
- 000000D94A1E0 01 00 00 00 A0 84 04 08 00 00 00 00 C1 84 04 08 ................
- 000000D94A1F0 54 85 04 08 01 00 00 00 14 D2 86 BF A0 86 04 08 T...............
- 000000D94A200 10 87 04 08 70 A2 7A B7 0C D2 86 BF 18 C9 7B B7 ....p.z.......{.
- 000000D94A210 01 00 00 00 DF E8 86 BF 00 00 00 00 E9 E8 86 BF ................
- 000000D94A220 F9 E8 86 BF 04 E9 86 BF 0E E9 86 BF 2F EE 86 BF ............/...
- 000000D94A230 3E EE 86 BF 4C EE 86 BF 5A EE 86 BF 6E EE 86 BF >...L...Z...n...
- 000000D94A240 B0 EE 86 BF D3 EE 86 BF E4 EE 86 BF EC EE 86 BF ................
- 000000D94A250 03 EF 86 BF 13 EF 86 BF 25 EF 86 BF 32 EF 86 BF ........%...2...
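The bytes 78 56 34 12 at 0xD94A16C are the little-endian form of 0x12345678, so the walk is confirmed. To replay the whole translation without a live machine, here is a hedged C sketch that performs the three lookups against a small table of (physical address, value) pairs copied from a dump. The struct and function names are illustrative and the table stands in for real physical memory; the sketch handles only 4 KB pages.

```c
#include <stddef.h>
#include <stdint.h>

#define ENTRY_ADDR_MASK 0x000FFFFFFFFFF000ull /* bits 51:12 of an entry */

/* One 64-bit paging-structure entry as seen in a physical memory dump. */
struct dump_entry {
    uint64_t phys;  /* physical address of the entry */
    uint64_t value; /* its 64-bit value */
};

/* Look up an entry in the table; addresses not listed read as 0. */
static inline uint64_t dump_read64(const struct dump_entry *d, size_t n,
                                   uint64_t phys)
{
    for (size_t i = 0; i < n; i++)
        if (d[i].phys == phys)
            return d[i].value;
    return 0;
}

/* Walk PDPT -> PD -> PT. Returns 0 and stores the physical address in
 * *out on success; returns -1 if an entry is not present or if the PDE
 * maps a 2 MB page, which this sketch does not handle. */
static inline int pae_walk(uint32_t cr3, uint32_t la,
                           const struct dump_entry *d, size_t n,
                           uint64_t *out)
{
    uint64_t pdpte = dump_read64(d, n,
        (cr3 & ~0x1Fu) + ((la >> 30) & 0x3u) * 8u);
    if (!(pdpte & 1))
        return -1;

    uint64_t pde = dump_read64(d, n,
        (pdpte & ENTRY_ADDR_MASK) | (((la >> 21) & 0x1FFu) << 3));
    if (!(pde & 1) || (pde & 0x80))
        return -1;

    uint64_t pte = dump_read64(d, n,
        (pde & ENTRY_ADDR_MASK) | (((la >> 12) & 0x1FFu) << 3));
    if (!(pte & 1))
        return -1;

    *out = (pte & ENTRY_ADDR_MASK) | (la & 0xFFFu);
    return 0;
}
```

Given the three entries found above (0x6E3C010 holding 0x083c5001, 0x83C5FE0 holding 0x06d87067, and 0x6D87368 holding 0x80000000 0d94a047), together with cr3=06E3C000 and the linear address 0xBF86D16C, pae_walk stores 0x0D94A16C in *out.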
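Thanks once again to the ilinuxkernel blogger for writing that document, which finally let me fully understand the translation from virtual address to physical address. I love articles like this because they give me a deeper understanding of how computers work. The vast majority of the credit for this post goes to that kind blogger; the credit belongs to those who came before.
To make it easier for interested readers to run the experiment, I have put the modified code on github. I have no intention of stealing the fruits of the original author's work.
Repository: https://github.com/manuscola/mm_addr
P.S.:
The fileview tool can display memory contents as bytes, 2-byte, 4-byte, or 8-byte units. Unfortunately, when I ran the experiment last night I had not read the fileview source code carefully, so all the dumps above show physical memory byte by byte. Readers who want to repeat the experiment may want to take a good look at the fileview source code first.
References:
1 Linux Memory Address Mapping
2 Computer Systems: A Programmer's Perspective
3 Understanding the Linux Kernel
4 How Linux User Programs Access Physical Memory
5 http://cs.usfca.edu/~cruse/cs635/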