Does modern PC video hardware support VGA text mode in hardware, or does the BIOS emulate it (using System Management Mode)?

Question

What really happens on modern PC hardware booted in 16-bit legacy BIOS MBR mode when you store a byte such as '1' (0x31) into the VGA text (mode 03) framebuffer at physical linear address B8000? How slow is a mov [es:di], eax store with the MTRR for that region set to UC? (Experimental testing on one Kaby Lake iGPU laptop indicates that clflushopt on WC was roughly the same speed as UC for VGA memory. But without clflushopt, mov stores to WC memory never leave the CPU and don't update the screen at all, running super fast.)

If it's not an SMI for every store, is there any way to approximate this cost on a chunk of WB memory in user-space, for performance experiments without actually rebooting into real mode? (e.g. using a BSS page as a pretend framebuffer that doesn't actually display anywhere).

The corresponding font glyph appears on screen in the next refresh, but is hardware scan-out really reading that ASCII char from VRAM (or DRAM for an iGPU) and mapping to bitmap font glyphs on the fly? Or is there some software interception on each store or once per vblank so the real hardware only has to handle a bitmapped framebuffer?

Legacy BIOS booting is well known to use System Management Mode (SMM) to emulate a USB keyboard/mouse as PS/2 devices. I'm wondering if it's also used for the VGA text mode framebuffer. I assume it is used for VGA I/O ports for mode-setting, but it's plausible that a text framebuffer could be supported by hardware. However, most computers spend all their time in graphics mode, so leaving out HW support for text mode seems like something vendors might want to do. (OTOH this blog suggests that a homebrew Verilog VGA controller can implement text mode fairly simply.)

I'm specifically interested in systems using the iGPU in Intel Skylake, but would be interested in earlier / later iGPUs from Intel and AMD, and new or old discrete GPUs.

(Including vendors other than AMD and NVidia; there are some Skylake motherboards with PCI slots, not PCIe. If modern GPU firmware drivers do emulate text mode, presumably there are some old PCI video cards with hardware VGA text mode. And maybe such a card could make stores just be a PCI transaction instead of an SMI.)

My own desktop is an i7-6700k in an Asus Z170 Pro Gaming mobo, no add-on cards just iGPU with a 1920x1200 monitor on the DVI-D output. I don't know the details of the Kaby Lake i5-7300HQ system @Eldan is testing on, only the CPU model.

I found Phoenix BIOS's patent US20120159520 from 2011, "Emulating legacy video using UEFI". Instead of requiring video hardware vendors to supply both UEFI and native 16-bit real mode option-ROM drivers, they propose a real-mode VGA driver (int 10h functions and so on) that calls a vendor-supplied UEFI video driver via SMM hooks.

Much of the description covers handling int 10h calls and stuff like that which already obviously trap through the IVT, thus can easily run custom code that triggers an SMI on purpose. The relevant part is what they describe for direct stores into the text-mode framebuffer which need to work even for code that doesn't trigger any software or hardware interrupts. (Other than HW triggering SMI on such stores, which they say they can use if supported.)

[0066] In certain embodiments, applications may manipulate the VGA's text buffer directly. In such an embodiment, generic video SMM driver 130 support this in one of two ways, depending on whether the hardware provides SMI trapping on read/write access to the 740 KB-768 KB memory region (where the text buffers are located).

[0067] When SMI trapping is available, the hardware generates an SMI on each read or write access. Using the trap address of the SMI trap, the exact text column and row may be calculated and the corresponding row and column in the virtual text screen accessed.
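
The row/column calculation the patent mentions is simple arithmetic on the trap address. A minimal sketch follows, assuming VGA mode 03h (80x25 cells, 2 bytes per cell) with display page 0 at physical 0xB8000; the function name `trap_addr_to_cell` and the struct are hypothetical and not taken from the patent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions (not from the patent): mode 03h, 80x25 cells, 2 bytes per
   cell (character byte, then attribute byte), page 0 at 0xB8000. */
enum { TEXT_BASE = 0xB8000, TEXT_COLS = 80, TEXT_ROWS = 25, BYTES_PER_CELL = 2 };

struct text_cell_pos {
    unsigned row;
    unsigned col;
    bool is_attribute;  /* false: character byte, true: attribute byte */
};

/* Map a trapped physical address to a text cell.
   Returns false if the address is outside the visible 80x25 page. */
bool trap_addr_to_cell(uint32_t phys_addr, struct text_cell_pos *out)
{
    assert(out != NULL);
    if (phys_addr < (uint32_t)TEXT_BASE)
        return false;

    uint32_t offset = phys_addr - (uint32_t)TEXT_BASE;
    if (offset >= (uint32_t)(TEXT_COLS * TEXT_ROWS * BYTES_PER_CELL))
        return false;

    uint32_t cell = offset / BYTES_PER_CELL;
    out->row = cell / TEXT_COLS;
    out->col = cell % TEXT_COLS;
    out->is_attribute = (offset % BYTES_PER_CELL) != 0;
    return true;
}
```

A real handler would also have to account for the display start address and other CRTC state, which this sketch ignores.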

Alternately, normal memory is enabled for this region and, using a periodic SMI, generic video SMM driver 130 scans for changes in the emulated hardware text buffer and updates the corresponding virtual text screen maintained by the video driver. In both cases, when a change is detected, the character is redrawn on the virtual text screen.
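
The periodic-scan alternative can be pictured as a diff against a shadow copy of the text buffer. The sketch below is only an illustration of that idea under assumed parameters (80x25 cells, one 16-bit word per cell); `scan_text_buffer`, `redraw_fn`, and the shadow buffer are hypothetical names, and the patent does not say how its driver detects changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed geometry: 80x25 cells, each a 16-bit word (attribute << 8 | char). */
enum { SCAN_COLS = 80, SCAN_ROWS = 25, SCAN_CELLS = SCAN_COLS * SCAN_ROWS };

/* Hypothetical callback that repaints one cell on the real (bitmapped) screen. */
typedef void (*redraw_fn)(unsigned row, unsigned col, uint16_t cell, void *ctx);

/* Compare the guest-visible text buffer with the shadow copy kept from the
   previous scan. Every changed cell is copied into the shadow and, if a
   callback is supplied, redrawn. Returns the number of changed cells. */
size_t scan_text_buffer(const volatile uint16_t *guest, uint16_t *shadow,
                        redraw_fn redraw, void *ctx)
{
    assert(guest != NULL && shadow != NULL);
    size_t changed = 0;

    for (unsigned i = 0; i < SCAN_CELLS; i++) {
        uint16_t now = guest[i];
        if (now != shadow[i]) {
            shadow[i] = now;
            if (redraw != NULL)
                redraw(i / SCAN_COLS, i % SCAN_COLS, now, ctx);
            changed++;
        }
    }
    return changed;
}
```

With this approach the per-store cost is just an ordinary memory write; the emulation cost is paid once per scan, regardless of how many stores hit the same cell in between.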

This is just one BIOS vendor's patent, and doesn't tell us which way most hardware actually works, or if other vendors do different things. It does essentially confirm that some hardware exists which can trap on stores in that range, though. (Unless that's just a hypothetical possibility that they decided to cover in their patent.)

For the use-case I have in mind, trapping only on screen refresh would be vastly faster than trapping on every store so I'm curious which hardware / firmware works which way.

Optimizing an incrementing ASCII decimal counter in video RAM on 7th gen Intel Core - repeatedly storing new digits for an ASCII text counter into the same few bytes of video RAM.

I tested a version of the code in 32-bit user-space under Linux, on WB memory, hoping to approximate the situation with movnti and different ways of getting the CPU to sync its WC buffer to video RAM after each store (or perhaps occasionally in a timer interrupt). But this is not realistic if the real-mode bootloader situation isn't just storing to DRAM, but instead triggering an SMI.
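
For illustration, the kind of user-space approximation described here might look like the following sketch. It is not the code that was actually tested: `fake_fb`, `store_and_flush`, and `run_counter` are hypothetical names, the buffer is ordinary WB memory that displays nothing, and `_mm_clflush` is used as the flush because it is available with baseline SSE2 (other flush methods would need different intrinsics or inline asm). On targets without SSE2 it falls back to a plain store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_stream_si32 (movnti), _mm_clflush */
#endif

/* Hypothetical stand-in for the text framebuffer: 4 KiB of ordinary WB
   memory. It is zero-initialized, so it lands in BSS; nothing is displayed. */
enum { FAKE_FB_WORDS = 1024 };
_Alignas(64) uint32_t fake_fb[FAKE_FB_WORDS];

/* One non-temporal store followed by a flush of the containing cache line. */
void store_and_flush(size_t index, uint32_t value)
{
    assert(index < FAKE_FB_WORDS);
#if defined(__SSE2__)
    _mm_stream_si32((int *)&fake_fb[index], (int)value);  /* movnti */
    _mm_clflush(&fake_fb[index]);                         /* push it out of the CPU */
#else
    *(volatile uint32_t *)&fake_fb[index] = value;        /* fallback: plain store */
#endif
}

/* Repeatedly overwrite the same cell with an ASCII digit, roughly like the
   counter loop; returns the final contents of that cell. */
uint32_t run_counter(uint32_t iterations)
{
    for (uint32_t i = 0; i < iterations; i++)
        store_and_flush(0, (uint32_t)'0' + (i % 10));
    return fake_fb[0];
}
```

Timing `run_counter` (for example under `perf stat`) only measures the cost of reaching DRAM through the normal memory controller path; whether that resembles a store to real VGA memory is exactly what the question is asking.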

On WB memory, flushing movnti stores with a lock xor byte [esp], 0 is somewhat faster than flushing with clflushopt. But @Eldan reports no speed improvement for those on VGA memory after programming an MTRR to make it WC. (And the same speed as for the original doing normal stores, indicating that by default the VGA framebuffer was UC. Some older BIOSes had an option to make VGA memory WC, which they called USWC = Uncached Speculative Write Combining.)

It's not a real-world problem so I'm not looking for actual workarounds; although it would be interesting to know if manually storing pixel bytes into a VGA graphics mode could be much faster.

  1. Do any / all real modern systems trigger an SMI on every store to the text-mode framebuffer?
  2. If no, can we approximate a WC store+clflush to the framebuffer, using a movnti + something in user-space on WB memory? So we can easily profile with perf for performance counters.
  3. If different BIOSes and/or hardware use different strategies, what are those strategies? (I don't want details, just a high level like "SMI every vblank to sync the VGA framebuffer to the actual hardware framebuffer")
  4. Would a PCIe or PCI video card with hardware VGA textmode be faster than whatever integrated GPUs actually do? I'm guessing an actual PCIe write transaction would be slower than waiting for a store to hit DRAM, but that a PCIe write would be cheaper than an SMI on every store. A ballpark / order of magnitude comparison would be interesting.

These questions are all highly related, but I can split this up if there isn't as much overlap as I expect.

Recommended answer

For video cards, I very much doubt it. Video card manufacturers have had the "get pixel data from char+attribute" logic built into hardware since the 1980s (it predates VGA and hasn't changed much since CGA), and just cut&paste that logic into each newer design without caring much about it.
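
To give a sense of how small that "get pixel data from char+attribute" logic is, here is a software sketch of what scan-out hardware computes for each pixel. The assumptions are mine, not a description of any particular card: 8-pixel-wide glyphs, a font table packed as `glyph_height` bytes per character (real VGA font memory is laid out differently), blink disabled so the high attribute nibble is a 4-bit background color, and no cursor or 9th-column handling. The name `text_cell_pixel` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the 4-bit palette index of pixel (x, y) inside one text cell.
   cell: low byte = character code, high byte = attribute.
   font: assumed layout of glyph_height bytes per character, one byte per
         scanline, bit 7 = leftmost pixel. */
uint8_t text_cell_pixel(uint16_t cell, const uint8_t *font,
                        unsigned glyph_height, unsigned x, unsigned y)
{
    assert(font != NULL && x < 8 && y < glyph_height);

    uint8_t ch   = (uint8_t)(cell & 0xFF);
    uint8_t attr = (uint8_t)(cell >> 8);

    uint8_t row_bits = font[(size_t)ch * glyph_height + y];
    int lit = (row_bits >> (7 - x)) & 1;

    /* Foreground color for set font bits, background color otherwise. */
    return lit ? (uint8_t)(attr & 0x0F) : (uint8_t)(attr >> 4);
}
```

In hardware this amounts to two small memory lookups and a multiplexer per character cell, which fits the point that the logic is cheap to carry forward into each new design.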

For things that are not video cards at all (e.g. remote system management tools using LAN) I don't know but suspect not (often they use a special management CPU rather than the main CPU/s so that it works even if the computer is turned "off").

If you're not in user-space, you can change MTRRs (on all CPUs: MTRRs must match and there's a special sequence involved) to make an area of RAM "uncached"; or use PAT in the page tables (much easier than messing with MTRRs, especially if you're using paging anyway, but slightly different behavior due to still needing cache coherency). If you are in user-space then you will have to rely on whatever the OS/kernel provides, and (depending on which OS it is) the OS/kernel may not provide any way to do this at all.

However, even if you find a way to make (an area of) RAM uncached it still won't be very similar, because you'll be writing directly to something attached to a memory controller built into the CPU (which the CPU can write to extremely quickly) instead of talking to something at the other end of a PCI link (which will have higher latency and lower bandwidth from the CPU's side). Even for integrated video (where it's technically the same RAM chips in the end), writes to VRAM go through a very different path (subject to remapping/GART/paging in the video card, affected by the "write mode" VGA register, affected by the bit/plane mask VGA registers, etc.).

For writes from CPU to VRAM; typically integrated video is significantly faster than discrete cards (at least for plain writes from CPU to linear frame buffers where none of the VGA's "write logic" is involved).

For extremely rough ballpark estimates, I'd expect a single write to RAM to be around 150 cycles and a single write to PCI to be close to 1000 cycles. For SMI I'd expect a few hundred cycles of latency before the SMI arrives at the CPU, then the cost of a CPU pipeline flush, then about 500 cycles to save the CPU's state (and the same again to reload state on the return path). Then the firmware's code would have to find the cause of the SMI (another few hundred cycles?) before it could know it was a write to VRAM and not something else. Then it'd have to examine the saved CPU state and find and decode the instruction that made the write (because otherwise it can't know what data was being written, whether it was a byte/word/dword write, etc.) while taking into account the previous CPU state (which mode the CPU was in, code size, etc.) and keeping track of how emulating the instruction affects the future CPU state (advancing RIP, etc.; don't forget that they'd be emulating every instruction that can cause a write, including things like XADD). Next it would have to analyze the state of the (emulated) VGA registers (write mode, write mask, plane enable, whatever controls which 64 KiB bank is mapped into the legacy area, font height, ...). Basically, for SMI emulation of a write to the text mode frame buffer, I'd expect it to take tens of thousands of cycles before the firmware's code overlooks a minor but important detail buried among a huge amount of complexity, causing it to do the wrong thing and be unusably broken.

Other notes

I doubt this was ever implemented, because I doubt it can ever work. There are far too many (common and obscure) things you can do with the legacy interfaces (e.g. detect vertical refresh, set up non-standard video modes like "mode X", fiddle with "display start" to implement smooth scrolling and/or page flipping, use "CRTC info" in VBE to alter video timings, etc.) that aren't supported by UEFI and can't be done via a third-party video driver for UEFI.

Instead, video card manufacturers didn't bother providing UEFI drivers for about 10 years and UEFI firmware used the legacy interface to emulate UEFI services (often breaking secure boot while they were at it); until almost everything was UEFI anyway.

I assume not. The only thing vaguely related to video that I'd suspect SMM may be used for is controlling the brightness of the screen's backlight in laptops (especially for older laptops, and especially for "lid open/close events") during early boot (before OS takes over).

I still believe that the (eventual, after the already too long "hybrid BIOS+UEFI" transition phase) removal of 30+ years of accumulated legacy mess (A20, VGA, PS/2, PIT, PIC, ...) from hardware is one of the main reasons hardware manufacturers (Intel) are/have been pushing for UEFI adoption.
