Drawing a character in VGA memory with GNU C inline assembly

Question

I'm learning to do some low-level VGA programming in DOS with C and inline assembly. Right now I'm trying to create a function that prints a character on screen.

This is my code:

//This is the characters BITMAPS
uint8_t characters[464] = {
  0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x20,0x20,0x20,0x20,0x00,0x20,0x00,0x50,
  0x50,0x00,0x00,0x00,0x00,0x00,0x50,0xf8,0x50,0x50,0xf8,0x50,0x00,0x20,0xf8,0xa0,
  0xf8,0x28,0xf8,0x00,0xc8,0xd0,0x20,0x20,0x58,0x98,0x00,0x40,0xa0,0x40,0xa8,0x90,
  0x68,0x00,0x20,0x40,0x00,0x00,0x00,0x00,0x00,0x20,0x40,0x40,0x40,0x40,0x20,0x00,
  0x20,0x10,0x10,0x10,0x10,0x20,0x00,0x50,0x20,0xf8,0x20,0x50,0x00,0x00,0x20,0x20,
  0xf8,0x20,0x20,0x00,0x00,0x00,0x00,0x00,0x60,0x20,0x40,0x00,0x00,0x00,0xf8,0x00,
  0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x60,0x60,0x00,0x00,0x08,0x10,0x20,0x40,0x80,
  0x00,0x70,0x88,0x98,0xa8,0xc8,0x70,0x00,0x20,0x60,0x20,0x20,0x20,0x70,0x00,0x70,
  0x88,0x08,0x70,0x80,0xf8,0x00,0xf8,0x10,0x30,0x08,0x88,0x70,0x00,0x20,0x40,0x90,
  0x90,0xf8,0x10,0x00,0xf8,0x80,0xf0,0x08,0x88,0x70,0x00,0x70,0x80,0xf0,0x88,0x88,
  0x70,0x00,0xf8,0x08,0x10,0x20,0x20,0x20,0x00,0x70,0x88,0x70,0x88,0x88,0x70,0x00,
  0x70,0x88,0x88,0x78,0x08,0x70,0x00,0x30,0x30,0x00,0x00,0x30,0x30,0x00,0x30,0x30,
  0x00,0x30,0x10,0x20,0x00,0x00,0x10,0x20,0x40,0x20,0x10,0x00,0x00,0xf8,0x00,0xf8,
  0x00,0x00,0x00,0x00,0x20,0x10,0x08,0x10,0x20,0x00,0x70,0x88,0x10,0x20,0x00,0x20,
  0x00,0x70,0x90,0xa8,0xb8,0x80,0x70,0x00,0x70,0x88,0x88,0xf8,0x88,0x88,0x00,0xf0,
  0x88,0xf0,0x88,0x88,0xf0,0x00,0x70,0x88,0x80,0x80,0x88,0x70,0x00,0xe0,0x90,0x88,
  0x88,0x90,0xe0,0x00,0xf8,0x80,0xf0,0x80,0x80,0xf8,0x00,0xf8,0x80,0xf0,0x80,0x80,
  0x80,0x00,0x70,0x88,0x80,0x98,0x88,0x70,0x00,0x88,0x88,0xf8,0x88,0x88,0x88,0x00,
  0x70,0x20,0x20,0x20,0x20,0x70,0x00,0x10,0x10,0x10,0x10,0x90,0x60,0x00,0x90,0xa0,
  0xc0,0xa0,0x90,0x88,0x00,0x80,0x80,0x80,0x80,0x80,0xf8,0x00,0x88,0xd8,0xa8,0x88,
  0x88,0x88,0x00,0x88,0xc8,0xa8,0x98,0x88,0x88,0x00,0x70,0x88,0x88,0x88,0x88,0x70,
  0x00,0xf0,0x88,0x88,0xf0,0x80,0x80,0x00,0x70,0x88,0x88,0xa8,0x98,0x70,0x00,0xf0,
  0x88,0x88,0xf0,0x90,0x88,0x00,0x70,0x80,0x70,0x08,0x88,0x70,0x00,0xf8,0x20,0x20,
  0x20,0x20,0x20,0x00,0x88,0x88,0x88,0x88,0x88,0x70,0x00,0x88,0x88,0x88,0x88,0x50,
  0x20,0x00,0x88,0x88,0x88,0xa8,0xa8,0x50,0x00,0x88,0x50,0x20,0x20,0x50,0x88,0x00,
  0x88,0x50,0x20,0x20,0x20,0x20,0x00,0xf8,0x10,0x20,0x40,0x80,0xf8,0x00,0x60,0x40,
  0x40,0x40,0x40,0x60,0x00,0x00,0x80,0x40,0x20,0x10,0x08,0x00,0x30,0x10,0x10,0x10,
  0x10,0x30,0x00,0x20,0x50,0x88,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0xf8,
  0x00,0xf8,0xf8,0xf8,0xf8,0xf8,0xf8};
/**************************************************************************
 *  put_char                                                              *
 *     Print char                                                         *
 **************************************************************************/
void put_char(int x ,int y,int ascii_char ,byte color){

    __asm__(
        "push %si\n\t"
        "push %di\n\t"
        "push %cx\n\t"
        "mov color,%dl\n\t"   //test color
        "mov ascii_char,%al\n\t"  //test char
        "sub $32,%al\n\t"
        "mov $7,%ah\n\t"
        "mul %ah\n\t"
        "lea $characters,%si\n\t"
        "add %ax,%si\n\t"
        "mov $7,%cl\n\t"
        "0:\n\t"
        "segCS %lodsb\n\t"   
        "mov $6,%ch\n\t"
        "1:\n\t"    
        "shl $1,%al\n\t"
        "jnc 2f\n\t"
        "mov %dl,%ES:(%di)\n\t"
        "2:\n\t"
        "inc %di\n\t"
        "dec %ch\n\t"
        "jnz 1b\n\t"
        "add $320-6,%di\n\t"
        "dec %cl\n\t"
        "jnz  0b\n\t"
        "pop %cx\n\t"
        "pop %di\n\t"
        "pop %si\n\t"
        "retn"

    );


}

I'm guiding myself with a series of tutorials written in Pascal.

I changed the assembly syntax to suit the gcc compiler, but I'm still getting these errors:

Operand mismatch type for 'lea'
No such instruction 'segcs lodsb'
No such instruction 'retn'

Edit

I have been working on improving my code, and at least now I see something on the screen. Here's my updated code:

/**************************************************************************
 *  put_char                                                              *
 *     Print char                                                         *
 **************************************************************************/
void put_char(int x,int y){
    int char_offset;
    int l,i,j,h,offset;
    j,h,l,i=0;
    offset = (y<<8) + (y<<6) + x;               
    __asm__(

        "movl _VGA, %%ebx;" // VGA memory pointer   
        "addl %%ebx,%%edi;"  //%di points to screen


        "mov _ascii_char,%%al;"
        "sub $32,%%al;"
        "mov $7,%%ah;"
        "mul %%ah;"

        "lea _characters,%%si;"
        "add %%ax,%%si;"   //SI point to bitmap

        "mov $7,%%cl;"

        "0:;"
            "lodsb %%cs:(%%si);"   //load next byte of bitmap 

            "mov $6,%%ch;"
        "1:;"   
            "shl $1,%%al;"
            "jnc 2f;"
            "movb %%dl,(%%edi);"  //plot the pixel
        "2:\n\t"
            "incl %%edi;"
            "dec %%ch;"
            "jnz 1b;"
            "addl $320-6,%%edi;"
            "dec %%cl;"
            "jnz  0b;"


        :  "=D" (offset)
        : "d" (current_color)

    );


}

In the image above I was trying to draw the letter "S". The result is the green pixels that you see in the upper-left corner of the screen. No matter what x and y I give the function, it always plots the pixels in that same spot.

Can anyone help me correct my code?

Recommended answer

See below for an analysis of some things that are specifically wrong with your put_char function, and a version that might work. (I'm not sure about the %cs segment override, but other than that it should do what you intend.)

First of all, DOS and 16bit x86 are thoroughly obsolete, and are not easier to learn than normal 64bit x86. Even 32bit x86 is obsolete, though it is still in wide use in the Windows world.

32bit and 64bit code don't have to care about a lot of 16bit limitations and complications, like segments or the limited choice of registers in addressing modes. Some modern systems do use segment overrides for thread-local storage, but learning how to use segments in 16bit code is barely connected to that.

One of the major benefits of knowing asm is for debugging, profiling, and optimizing real programs. If you want to understand how to write C or other high-level code that compiles to efficient asm, you'll probably be looking at compiler output. This will be 64bit (or 32bit). The same applies if you're looking at performance-counter results annotating a disassembly of your binary. Aggressive compiler optimizations mean that counts per source line are much less informative than counts per instruction.

Also, for your program to actually do anything, it has to either talk to hardware directly or make system calls. Learning DOS system calls for file access and user input is a complete waste of time. They're quite different from the APIs in real OSes. Developing new DOS applications is not useful, so you'd have to learn another API when you get to the stage of doing something with your asm knowledge.

Learning asm on an 8086 simulator is even more limiting: 186, 286, and 386 added many convenient instructions like imul ecx, 15, making ax less "special". Limiting yourself to instructions that work on 8086 means you'll figure out "bad" ways to do things. Other big ones are movzx / movsx, shift by an immediate count (other than 1), and shift by a variable count. Besides performance, it's also easier to write code when these are available, because you don't have to write a loop to shift by more than 1 bit.

I mostly learned asm from reading compiler output, then making small changes. I didn't try to write stuff in asm when I didn't really understand things, but if you're going to learn quickly (rather than just evolve an understanding while debugging / profiling C), you probably need to test your understanding by writing your own code. You do need to understand the basics: that there are 8 or 16 integer registers plus the flags and instruction pointer, and that every instruction makes a well-defined modification to the current architectural state of the machine. (See the Intel insn ref manual for complete descriptions of every instruction; links are in the x86 wiki.)

You might want to start with simple things like writing a single function in asm, as part of a bigger program. Understanding the kind of asm needed to make system calls is useful, but in real programs it's normally only useful to hand-write asm for inner loops that don't involve any system calls. It's time-consuming to write asm to read input and print results, so I'd suggest doing that part in C. Make sure you read the compiler output and understand what's going on, the difference between an integer and a string, and what strtol and printf do, even if you don't write them yourself.

Once you think you understand enough of the basics, find a function in some program you're familiar with and/or interested in, and see if you can beat the compiler and save instructions (or use faster instructions). Or implement it yourself without using the compiler output as a starting point, whichever you find more interesting. This answer might be interesting, although the focus there was finding C source that got the compiler to produce the optimal asm.

There are many SO questions from people asking "how do I do X in asm", and the answer is usually "the same as you would in C". Don't get so caught up in asm being unfamiliar that you forget how to program. Figure out what needs to happen to the data the function operates on, then figure out how to do that in asm. If you get stuck and have to ask a question, you should have most of a working implementation, with just one step where you don't know what instructions to use.

You should do this with 32 or 64bit x86. I'd suggest 64bit, since the ABI is nicer, but 32bit functions will force you to make more use of the stack. That might help you understand how a call instruction puts the return address on the stack, and where the args the caller pushed actually are after that. (This appears to be what you tried to avoid dealing with by using inline asm.)

Learning how to do graphics by directly modifying video RAM is not useful, other than to satisfy curiosity about how computers used to work. You can't use that knowledge for anything. Modern graphics APIs exist to let multiple programs draw in their own regions of the screen, and to allow indirection (e.g. draw on a texture instead of the screen directly, so 3D window-flipping alt-tab can look fancy). There are too many reasons to list here for not drawing directly on video RAM.

Drawing on a pixmap buffer and then using a graphics API to copy it to the screen is possible. Still, doing bitmap graphics at all is more or less obsolete. Modern graphics APIs abstract away the resolution, so your app can draw things at a reasonable size regardless of how big each pixel is (a small but extremely high-res screen vs. a big TV at low res).

It is kind of cool to write to memory and see something change on-screen. Or even better, hook up LEDs (with small resistors) to the data bits on a parallel port, and run an outb instruction to turn them on/off. I did this on my Linux system ages ago. I made a little wrapper program that used iopl(2) and inline asm, and ran it as root. You can probably do something similar on Windows. You don't need DOS or 16bit code to get your feet wet talking to the hardware.

in/out instructions, normal loads/stores to memory-mapped IO, and DMA are how real drivers talk to hardware, including things far more complicated than parallel ports. It's fun to know how your hardware "really" works, but only spend time on it if you're actually interested, or want to write drivers. The Linux source tree includes drivers for boatloads of hardware, and is often well commented, so if you like reading code as much as writing code, that's another way to get a feel for what real drivers do when they talk to hardware.

It's generally good to have some idea how things work under the hood. If you want to learn about how graphics used to work ages ago (with VGA text mode and color / attribute bytes), then sure, go nuts. Just be aware that modern OSes don't use VGA text mode, so you aren't even learning what happens under the hood on modern computers.

You are taking a totally incorrect approach to using inline asm. You seem to want to write whole functions in asm, so you should just do that, e.g. put your code in asmfuncs.S or something. Use .S if you want to keep using GNU / AT&T syntax, or use .asm if you want to use Intel / NASM / YASM syntax (which I would recommend, since the official manuals all use Intel syntax; see the x86 wiki for guides and manuals).

GNU inline asm is the hardest way to learn asm. You have to understand everything that your asm does, and what the compiler needs to know about it. It's really hard to get everything right. For example, in your edit, that block of inline asm modifies many registers that you don't list as clobbered, including %ebx, which is a call-preserved register (so this is broken even if the function isn't inlined). At least you took out the ret, so things won't break as spectacularly when the compiler inlines this function into the loop that calls it. If that sounds really complicated, that's because it is, and it's part of why you shouldn't use inline asm to learn asm.

This answer to a similar question, which also came from misusing inline asm while trying to learn asm in the first place, has more links about inline asm and how to use it well.

This part could be a separate answer, but I'll leave it together.

Besides your whole approach being fundamentally a bad idea, there is at least one specific problem with your put_char function: you use offset as an output-only operand. gcc quite happily compiles your whole function to a single ret instruction, because the asm statement isn't volatile and its output isn't used. (Inline asm statements without outputs are assumed to be volatile.)

I put your function on godbolt, so I could look at what assembly the compiler generates surrounding it. That link is to the fixed, maybe-working version, with correctly declared clobbers, comments, cleanups, and optimizations. See below for the same code, in case that external link ever breaks.

I used gcc 5.3 with the -m16 option, which is different from using a real 16bit compiler. It still does everything the 32bit way (using 32bit addresses, 32bit ints, and 32bit function args on the stack), but tells the assembler that the CPU will be in 16bit mode, so it will know when to emit operand-size and address-size prefixes.

Even if you compile your original version with -O0, the compiler computes offset = (y<<8) + (y<<6) + x; but doesn't put it in %edi, because you didn't ask it to. Specifying it as an input operand as well would have worked. After the inline asm, it stores %edi into -12(%ebp), where offset lives.
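For reference, that offset expression is just y*320 + x written with shifts, since 320 = 256 + 64. A minimal C sketch follows; the name vga_offset and the constant SCREEN_WIDTH are illustrative and not part of the original code, and the 320-byte row width is assumed from the $320 constant in the asm.

```c
#include <stdint.h>

enum { SCREEN_WIDTH = 320 };  /* assumed: 320 bytes per scanline, 1 byte per pixel */

/* Byte offset of pixel (x, y) from the start of the frame buffer.
 * 320 = 256 + 64, so y*320 can be written as (y<<8) + (y<<6). */
static uint32_t vga_offset(uint32_t x, uint32_t y)
{
    return (y << 8) + (y << 6) + x;
}
```

A modern compiler will generate the shift-and-add (or an lea/imul sequence) on its own from y * SCREEN_WIDTH + x, so the hand-written shifts are mostly a readability cost.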

Other things wrong with put_char:

You pass 2 things (ascii_char and current_color) into your function through globals, instead of function arguments. Yuck, that's disgusting. VGA and characters are constants, so loading them from globals doesn't look so bad. Writing in asm means you should ignore good coding practices only when it helps performance by a reasonable amount. Since the caller probably had to store those values into the globals, you're not saving anything compared to the caller storing them on the stack as function args. And for x86-64, you'd be losing perf, because the caller could just pass them in registers.

Also:

j,h,l,i=0;  // sets i=0, does nothing to j, h, or l.
       // gcc warns: left-hand operand of comma expression has no effect
j;h;l;i=0;  // equivalent to this

j=h=l=i=0;  // This is probably what you meant

All the local variables other than offset are unused anyway. Were you going to write it in C or something?
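For comparison, here is roughly what the same routine could look like in plain C, which is also handy as a reference to test an asm version against. This is a sketch under assumptions: it draws into an ordinary byte buffer 320 bytes wide rather than real VGA memory, it takes the glyph table and colour as parameters instead of globals, and the names put_char_c, fb and font are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum { SCREEN_W = 320, GLYPH_W = 6, GLYPH_H = 7, FIRST_CHAR = 32 };

/* Draw one glyph at (x, y) into an 8bpp frame buffer `fb` that is
 * SCREEN_W bytes wide.  `font` holds GLYPH_H bytes per character,
 * starting at ASCII FIRST_CHAR, like the characters[] table.
 * Only foreground pixels are written, so the background shows through. */
static void put_char_c(uint8_t *fb, const uint8_t *font,
                       int x, int y, int ascii_char, uint8_t color)
{
    const uint8_t *glyph = font + (size_t)(ascii_char - FIRST_CHAR) * GLYPH_H;
    uint8_t *row = fb + (size_t)y * SCREEN_W + x;

    for (int r = 0; r < GLYPH_H; r++) {
        uint8_t bits = glyph[r];
        for (int c = 0; c < GLYPH_W; c++) {
            if (bits & 0x80)              /* test the MSB: leftmost pixel first */
                row[c] = color;
            bits = (uint8_t)(bits << 1);  /* bring the next bit into the MSB */
        }
        row += SCREEN_W;                  /* move down one scanline */
    }
}
```

The sketch does no bounds checking, just like the asm: the caller is assumed to pass a printable character covered by the table and coordinates that keep the 6x7 cell on screen.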

You use 16bit addresses for characters, but 32bit addressing modes for VGA memory. I assume this is intentional, but I have no idea if it's correct. Also, are you sure you should use a CS: override for the loads from characters? Does the .rodata section go into the code segment? Although you didn't declare uint8_t characters[464] as const, so it's probably just in the .data section anyway. I consider myself fortunate that I haven't actually written code for a segmented memory model, but that still looks suspicious.

If you're really using djgpp, then according to Michael Petch's comment, your code will run in 32bit mode. Using 16bit addresses is thus a bad idea.

You can avoid using %ebx entirely by doing this, instead of loading into %ebx and then adding %ebx to %edi:

 "add    _VGA, %%edi\n\t" // edi points to VGA + offset.

You don't need lea to get an address into a register. You can just use

    "mov    %%ax, %%si\n\t"
    "add    $_characters, %%si\n\t"

$_characters means the address as an immediate constant. We can save a lot of instructions by combining this with the previous calculation of the offset into the characters array of bitmaps. The immediate-operand form of imul lets us produce the result in %si in the first place:

    "movzbw _ascii_char,%%si\n\t"
       //"sub    $32,%%ax\n\t"      // AX = ascii_char - 32
    "imul   $7, %%si, %%si\n\t"
    "add    $(_characters - 32*7), %%si\n\t"  // Do the -32 at the same time as adding the table address, after multiplying
    // SI points to characters[(ascii_char-32)*7]
    // i.e. the start of the bitmap for the current ascii character.

Since this form of imul only keeps the low 16b of the 16*16 -> 32b multiply, the 2- and 3-operand forms of imul can be used for signed or unsigned multiplies, which is why only imul (not mul) has those extra forms. For larger operand-size multiplies, 2- and 3-operand imul is faster, because it doesn't have to store the high half in %[er]dx.
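Both claims in this step can be checked in C. The sketch below assumes only the layout already described (7 bytes per glyph, table starting at ASCII 32); the function names are made up for illustration. The first pair shows that subtracting 32 before the multiply gives the same index as subtracting 32*7 afterwards, which is what folding the constant into the table address does. The second pair shows that the low 16 bits of a product do not depend on whether the inputs are treated as signed or unsigned.

```c
#include <stdint.h>

/* Index of a glyph's first byte, computed the obvious way. */
static unsigned glyph_index_sub_first(unsigned ascii_char)
{
    return (ascii_char - 32u) * 7u;
}

/* Same index with the -32 folded into a constant applied after the
 * multiply, mirroring add $(_characters - 32*7), %si. */
static unsigned glyph_index_sub_last(unsigned ascii_char)
{
    return ascii_char * 7u - 32u * 7u;
}

/* Low 16 bits of a 16x16 multiply, inputs treated as unsigned. */
static uint16_t mul_low16_unsigned(uint16_t a, uint16_t b)
{
    return (uint16_t)((uint32_t)a * (uint32_t)b);
}

/* Low 16 bits of a 16x16 multiply, inputs treated as signed.
 * The uint16_t -> int16_t conversion assumes the usual two's-complement
 * wraparound that mainstream compilers implement. */
static uint16_t mul_low16_signed(uint16_t a, uint16_t b)
{
    int32_t product = (int32_t)(int16_t)a * (int32_t)(int16_t)b;
    return (uint16_t)(uint32_t)product;
}
```

For example, 'S' is ASCII 83, so both index functions give (83 - 32) * 7 = 357, the offset of its bitmap within the table.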

You could simplify the inner loop a bit, but it would complicate the outer loop slightly: you could branch on the zero flag, as set by shl $1, %al, instead of using a counter. That would make the loop branch unpredictable too, like the jump over the store for non-foreground pixels, so the increased branch mispredictions might be worse than the extra do-nothing iterations. It would also mean you'd need to recalculate %edi in the outer loop each time, because the inner loop wouldn't run a constant number of times. But it could look like:

    ... same first part of the loop as before
    // re-initialize %edi to first_pixel-1, based on outer-loop counter
    "lea  -1(%%edi), %%ebx\n"
    ".Lbit_loop:\n\t"      // map the 1bpp bitmap to 8bpp VGA memory
        "incl   %%ebx\n\t"       // inc before shift, to preserve flags
        "shl    $1,%%al\n\t"
        "jnc    .Lskip_store\n\t"   // transparency: only store on foreground pixels
        "movb   %%dl,(%%ebx)\n"  //plot the pixel
    ".Lskip_store:\n\t"
        "jnz  .Lbit_loop\n\t"    // flags still set from shl

        "addl   $320,%%edi\n\t"  // WITHOUT the -6
        "dec    %%cl\n\t"
        "jnz  .Lbyte_loop\n\t"

Note that the bits in your character bitmaps are going to map to bytes in VGA memory like {7 6 5 4 3 2 1 0}, because you're testing the bit shifted out by a left shift. So it starts with the MSB. Bits in a register are always "big endian". A left shift multiplies by two, even on a little-endian machine like x86. Little-endian only affects the ordering of bytes in memory, not bits in a byte, and not even bytes inside registers.
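To make the bit order concrete, here is a small C sketch (the helper names are hypothetical) that expands one bitmap row into six cells, MSB first, the same way the shl / jnc loop does. The second helper mirrors the flag-based variant discussed earlier: it stops as soon as the remaining bits are all zero instead of always counting to six.

```c
#include <stdint.h>

enum { GLYPH_W = 6 };   /* pixels per glyph row, as in the asm loop */

/* Fixed-count version: out[0] is the leftmost pixel (bit 7). */
static void expand_row(uint8_t bits, char out[GLYPH_W + 1])
{
    for (int c = 0; c < GLYPH_W; c++) {
        out[c] = (bits & 0x80) ? '#' : '.';
        bits = (uint8_t)(bits << 1);   /* next bit moves into the MSB */
    }
    out[GLYPH_W] = '\0';
}

/* Early-exit version: returns how many columns it visited before the
 * remaining bits became zero (always at least one, like the asm).
 * The c < GLYPH_W bound is an extra guard for this sketch's 6-cell buffer. */
static int expand_row_early_exit(uint8_t bits, char out[GLYPH_W + 1])
{
    int c = 0;
    for (int i = 0; i < GLYPH_W; i++)
        out[i] = '.';                  /* background everywhere by default */
    out[GLYPH_W] = '\0';
    do {
        if (bits & 0x80)
            out[c] = '#';
        bits = (uint8_t)(bits << 1);
        c++;
    } while (bits != 0 && c < GLYPH_W);
    return c;
}
```

With the row byte 0x70 (binary 01110000), both helpers produce ".###..", and the early-exit one stops after four columns because nothing but zero bits remains.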

This is the same as the godbolt link.

void put_char(int x,int y){
    int offset = (y<<8) + (y<<6) + x;
    __asm__ volatile (  // volatile is implicit for asm statements with no outputs, but better safe than sorry.

        "add    _VGA, %%edi\n\t" // edi points to VGA + offset.

        "movzbw _ascii_char,%%si\n\t"
        //"sub    $32,%%ax\n\t"      // AX = ascii_char - 32
        "imul   $7, %%si, %%si\n\t"     // can't fold the load into this because it's not zero-padded
        "add    $(_characters - 32*7), %%si\n\t"  // Do the -32 at the same time as adding the table address, after multiplying
        // SI points to characters[(ascii_char-32)*7]
        // i.e. the start of the bitmap for the current ascii character.

        "mov    $7,%%cl\n"

        ".Lbyte_loop:\n\t"
            "lodsb  %%cs:(%%si)\n\t"   //load next byte of bitmap 

            "mov    $6,%%ch\n"
        ".Lbit_loop:\n\t"      // map the 1bpp bitmap to 8bpp VGA memory
            "shl    $1,%%al\n\t"
            "jnc    .Lskip_store\n\t"   // transparency: only store on foreground pixels
            "movb   %%dl,(%%edi)\n"  //plot the pixel
        ".Lskip_store:\n\t"
            "incl   %%edi\n\t"
            "dec    %%ch\n\t"
            "jnz  .Lbit_loop\n\t"

            "addl   $320-6,%%edi\n\t"
            "dec    %%cl\n\t"
            "jnz  .Lbyte_loop\n\t"


        : 
        : "D" (offset), "d" (current_color)
        : "%eax", "%ecx", "%esi", "memory"
         // omit the memory clobber if your C never touches VGA memory, and your asm never stores anywhere else.

    );
}

I didn't use dummy output operands to leave register allocation up to the compiler's discretion, but that's a good idea for reducing the overhead (extra mov instructions) of getting data into the right places for inline asm. For example, here there was no need to force the compiler to put offset in %edi. It could have been any register we aren't already using.
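As a hedged illustration of that last point, the sketch below lets the compiler choose every register by using generic "r" constraints and named operands. It is not the put_char loop, just a tiny stand-in (add_scaled is a made-up name) that assumes GCC or Clang targeting x86; on any other target it falls back to plain C so the file still builds.

```c
#include <stdint.h>

/* Returns base + index*7, roughly the table-address arithmetic in put_char. */
static uint32_t add_scaled(uint32_t base, uint32_t index)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t tmp;
    __asm__ (
        "imull  $7, %[idx], %[tmp]\n\t"   /* tmp = index * 7 */
        "addl   %[tmp], %[acc]"           /* acc += tmp      */
        : [acc] "+r" (base),              /* read-write, any register */
          [tmp] "=&r" (tmp)               /* scratch, early-clobber   */
        : [idx] "r" (index)               /* input, any register      */
        : "cc");                          /* flags are modified       */
    return base;
#else
    return base + index * 7u;             /* portable fallback */
#endif
}
```

Because no register is named in the template, the only clobber to declare is the flags, and the statement needs no volatile: its output is used, so the compiler keeps it, and it has no other side effects.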
