What is the state of the art for text rendering in OpenGL as of version 4.1?

Question

There are already a number of questions about text rendering in OpenGL, such as:

How to do OpenGL live text-rendering for a GUI?

But mostly what is discussed is rendering textured quads using the fixed-function pipeline. Surely shaders must make a better way.

I'm not really concerned about internationalization, most of my strings will be plot tick labels (date and time or purely numeric). But the plots will be re-rendered at the screen refresh rate and there could be quite a bit of text (not more than a few thousand glyphs on-screen, but enough that hardware accelerated layout would be nice).

What is the recommended approach for text-rendering using modern OpenGL? (Citing existing software using the approach is good evidence that it works well)

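Geometry shaders that accept e.g. position and orientation and a character sequence and emit textured quads
Geometry shaders that render vector fonts
As above, but using tessellation shaders instead
A compute shader to do font rasterization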

Recommended answer

Rendering outlines, unless you render only a dozen characters total, remains a "no go" due to the number of vertices needed per character to approximate curvature. Though there have been approaches to evaluate bezier curves in the pixel shader instead, these suffer from not being easily antialiased, which is trivial using a distance-map-textured quad, and evaluating curves in the shader is still computationally much more expensive than necessary.

The best trade-off between "fast" and "quality" is still textured quads with a signed distance field texture. It is very slightly slower than using a plain textured quad, but not by much. The quality, on the other hand, is in an entirely different ballpark. The results are truly stunning, it is about as fast as you can get, and effects such as glow are trivially easy to add, too. Also, the technique can be downgraded nicely to older hardware, if needed.

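See the famous Valve paper (Chris Green, "Improved Alpha-Tested Magnification for Vector Textures and Special Effects", SIGGRAPH 2007) for the technique.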

The technique is conceptually similar to how implicit surfaces (metaballs and such) work, though it does not generate polygons. It runs entirely in the pixel shader and takes the distance sampled from the texture as a distance function. Everything above a chosen threshold (usually 0.5) is "in", everything else is "out". In the simplest case, on 10 year old non-shader-capable hardware, setting the alpha test threshold to 0.5 will do that exact thing (though without special effects and antialiasing).
If one wants to add a little more weight to the font (faux bold), a slightly smaller threshold will do the trick without modifying a single line of code (just change your "font_weight" uniform). For a glow effect, one simply considers everything above one threshold as "in" and everything above another (smaller) threshold as "out, but in glow", and LERPs between the two. Antialiasing works similarly.
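The thresholding described above can be sketched on the CPU. This is a minimal Python stand-in for what a fragment shader would do per pixel, not code from the Valve paper: the names `glyph_coverage` and `shade`, the `smoothing` width, and the `glow_width` value are assumptions chosen for illustration. `d` is the distance sampled from the texture, normalised to [0, 1] with 0.5 on the outline.

```python
def smoothstep(edge0, edge1, x):
    """Clamped Hermite interpolation, same formula as GLSL's smoothstep."""
    t = min(max((x - edge0) / (edge1 - edge0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


def glyph_coverage(d, threshold=0.5, smoothing=0.05):
    """Antialiased "in"/"out" decision for one sampled distance d.

    Instead of a hard step at the threshold, blend over a small band
    around it. `smoothing` is an illustrative value, not a recommendation.
    """
    return smoothstep(threshold - smoothing, threshold + smoothing, d)


def shade(d, font_weight=0.5, glow_width=0.2, smoothing=0.05):
    """Return (text_alpha, glow_alpha) for one fragment.

    font_weight plays the role of the "font_weight" uniform: a smaller
    value moves the edge outwards and gives faux bold.
    glow_alpha LERPs (smoothly) from the outer, smaller threshold up to
    the glyph edge, so it is non-zero in the "out, but in glow" band.
    """
    text_alpha = glyph_coverage(d, font_weight, smoothing)
    glow_alpha = smoothstep(font_weight - glow_width, font_weight, d)
    return text_alpha, glow_alpha
```

With the defaults, `shade(0.4)` gives no text coverage but a partial glow, while `shade(0.4, font_weight=0.3)` treats the same sample as fully inside the (bolder) glyph.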

By using an 8-bit signed distance value rather than a single bit, this technique increases the effective resolution of your texture map 16-fold in each dimension (instead of black and white, all possible shades are used, thus we have 256 times the information using the same storage). But even if you magnify far beyond 16x, the result still looks quite acceptable. Long straight lines will eventually become a bit wiggly, but there will be no typical "blocky" sampling artefacts.
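The arithmetic behind those figures, plus one possible way to pack a signed distance into a byte, can be written out as a short sketch. The `spread` parameter (the distance, in texels, that maps to the ends of the byte range) and the function names are assumptions for illustration; real generators differ in how they scale and clamp.

```python
import math

LEVELS = 2 ** 8                        # 256 shades instead of 2 (black/white)
PER_AXIS_GAIN = math.isqrt(LEVELS)     # 16: the "16-fold in each dimension"


def encode_distance(signed_dist, spread):
    """Map a signed distance (positive = inside) to a byte in 0..255.

    Distances beyond +/- spread are clamped; the outline itself lands in
    the middle of the range, matching the usual 0.5 threshold.
    """
    clamped = max(-spread, min(spread, signed_dist))
    return round((clamped / spread * 0.5 + 0.5) * 255)


def decode_distance(byte, spread):
    """Inverse of encode_distance, up to quantisation error."""
    return (byte / 255.0 - 0.5) * 2.0 * spread
```

With `spread = 4`, one byte step corresponds to about 0.03 texels, which is why the reconstructed outline stays smooth well beyond the texture's native resolution.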

You can use a geometry shader for generating the quads out of points (reducing bus bandwidth), but honestly the gains are rather marginal. The same is true for instanced character rendering as described in Game Programming Gems 8 (GPG8). The overhead of instancing is only amortized if you have a lot of text to draw. The gains do not, in my opinion, justify the added complexity and non-downgradeability. Plus, you are either limited by the number of constant registers, or you have to read from a texture buffer object, which is non-optimal for cache coherence (and the intent was to optimize to begin with!).
A simple, plain old vertex buffer is just as fast (possibly faster) if you schedule the upload a bit ahead in time, and it will run on all hardware built during the last 15 years. And, it is not limited to any particular number of characters in your font, nor to a particular number of characters to render.
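As a rough illustration of the "plain old vertex buffer" approach, the sketch below fills CPU-side arrays with one textured quad per glyph, ready to be uploaded with whatever buffer API is in use. The metrics layout `(advance, width, height, u0, v0, u1, v1)` and the function name `build_quads` are hypothetical; a real font atlas would also carry bearings and kerning.

```python
from array import array


def build_quads(text, metrics, pen_x=0.0, baseline_y=0.0):
    """Return (vertices, indices) for a single line of text.

    vertices: flat float array, 4 vertices per glyph, each (x, y, u, v).
    indices:  16-bit indices, 2 triangles (6 indices) per glyph, so this
              sketch is limited to 16384 glyphs per buffer.
    metrics:  hypothetical dict mapping a character to
              (advance, width, height, u0, v0, u1, v1).
    """
    vertices = array("f")
    indices = array("H")
    for ch in text:
        advance, w, h, u0, v0, u1, v1 = metrics[ch]
        base = len(vertices) // 4          # index of this quad's first vertex
        x0, y0 = pen_x, baseline_y
        x1, y1 = pen_x + w, baseline_y + h
        vertices.extend([x0, y0, u0, v0,
                         x1, y0, u1, v0,
                         x1, y1, u1, v1,
                         x0, y1, u0, v1])
        indices.extend([base, base + 1, base + 2,
                        base, base + 2, base + 3])
        pen_x += advance                   # strictly sequential layout step
    return vertices, indices
```

For plot tick labels the arrays only need rebuilding when the label text changes, so the upload can be scheduled ahead of the frame that draws it.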

If you are sure that you do not have more than 256 characters in your font, texture arrays may be worth a consideration to strip off bus bandwidth in a similar manner as generating quads from points in the geometry shader. When using an array texture, the texture coordinates of all quads have identical, constant s and t coordinates and only differ in the r coordinate, which is equal to the character index to render.
But like with the other techniques, the expected gains are marginal at the cost of being incompatible with previous generation hardware.
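To make the array-texture idea concrete, here is a small sketch of the texture coordinates it implies: every quad uses the same four (s, t) corners and only the layer coordinate r changes. The `first_char` offset and the one-glyph-per-layer layout are assumptions for illustration.

```python
# The same four (s, t) corners are reused by every quad.
CORNERS = ((0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0))


def array_texcoords(text, first_char=" "):
    """Return a list of (s, t, r) tuples, four per character.

    r is the array-texture layer, taken here as the character's index
    relative to first_char (hypothetical atlas layout: one glyph per layer).
    """
    coords = []
    for ch in text:
        layer = ord(ch) - ord(first_char)
        if not 0 <= layer < 256:
            raise ValueError(f"character {ch!r} is outside the 256-layer font")
        coords.extend((s, t, float(layer)) for s, t in CORNERS)
    return coords
```

Since s and t never change, only the layer index actually needs to travel per character, which is where the bandwidth saving would come from.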

There is a handy tool by Jonathan Dummer for generating distance textures: description page

Update:
As more recently pointed out in Programmable Vertex Pulling (D. Rákos, "OpenGL Insights", p. 239), there is no significant extra latency or overhead associated with pulling vertex data programmatically from the shader on the newest generations of GPUs, as compared to doing the same using the standard fixed function.
Also, the latest generations of GPUs have more and more reasonably sized general-purpose L2 caches (e.g. 1536kiB on nvidia Kepler), so one may expect the incoherent access problem when pulling random offsets for the quad corners from a buffer texture to be less of a problem.

This makes the idea of pulling constant data (such as quad sizes) from a buffer texture more attractive. A hypothetical implementation could thus reduce PCIe and memory transfers, as well as GPU memory, to a minimum with an approach like this:

Only upload a character index (one per character to be displayed) as the only input to a vertex shader that passes on this index and gl_VertexID, and amplify that to 4 points in the geometry shader, still having the character index and the vertex ID (this will be "gl_PrimitiveID made available in the vertex shader") as the sole attributes, and capture this via transform feedback.
This will be fast, because there are only two output attributes (main bottleneck in GS), and it is close to "no-op" otherwise in both stages.
Bind a buffer texture which contains, for each character in the font, the textured quad's vertex positions relative to the base point (these are basically the "font metrics"). This data can be compressed to 4 numbers per quad by storing only the offset of the bottom left vertex, and encoding the width and height of the axis-aligned box (assuming half floats, this will be 8 bytes of constant buffer per character -- a typical 256 character font could fit completely into 2kiB of L1 cache).
Set a uniform for the baseline.
Bind a buffer texture with horizontal offsets. These could probably even be calculated on the GPU, but it is much easier and more efficient to do that kind of thing on the CPU, as it is a strictly sequential operation and not at all trivial (think of kerning). Also, it would need another feedback pass, which would be another sync point.
Render the previously generated data from the feedback buffer; the vertex shader pulls the horizontal offset of the base point and the offsets of the corner vertices from buffer objects (using the primitive ID and the character index). The original vertex ID of the submitted vertices is now our "primitive ID" (remember the GS turned the vertices into quads).
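The data flow of the steps above can be emulated on the CPU to check that the pieces fit together. This is a sketch only, in the same spirit as the answer (which notes the approach is untried): `amplify` stands in for the geometry shader plus transform feedback, `pull_vertex` for the second-pass vertex shader, and picking the corner from the second-pass vertex ID modulo 4 is an assumption the text does not spell out.

```python
import struct

# 4 half floats per character (offset x, offset y, width, height) = 8 bytes.
METRICS_BYTES_PER_CHAR = struct.calcsize("<4e")
FONT_METRICS_BYTES = 256 * METRICS_BYTES_PER_CHAR    # 2048 bytes = 2 KiB


def amplify(char_indices):
    """First pass: one input point per character becomes 4 records.

    Each record carries only (char_index, primitive_id), where
    primitive_id is the vertex ID of the submitted point.
    """
    return [(c, prim) for prim, c in enumerate(char_indices) for _ in range(4)]


def pull_vertex(vertex_id, record, metrics, x_offsets, baseline):
    """Second pass: rebuild one corner position from buffer lookups.

    metrics[char_index] -> (off_x, off_y, width, height)  (the "font metrics")
    x_offsets[prim]     -> horizontal pen position, computed on the CPU
    baseline            -> the uniform
    """
    char_index, prim = record
    off_x, off_y, w, h = metrics[char_index]
    corner = vertex_id % 4                # assumption: 0=BL, 1=BR, 2=TR, 3=TL
    corner_x = w if corner in (1, 2) else 0.0
    corner_y = h if corner in (2, 3) else 0.0
    return (x_offsets[prim] + off_x + corner_x, baseline + off_y + corner_y)
```

Only `char_indices` and `x_offsets` change per string; the metrics table is constant for the font, which is what makes pulling it from a small cached buffer attractive.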

Like this, one could ideally reduce the required vertex bandwidth by 75% (amortized), though it would only be able to render a single line. If one wanted to be able to render several lines in one draw call, one would need to add the baseline to the buffer texture, rather than using a uniform (making the bandwidth gains smaller).

However, even assuming a 75% reduction -- since the vertex data to display "reasonable" amounts of text is only somewhere around 50-100kiB (which is practically zero to a GPU or a PCIe bus) -- I still doubt that the added complexity and losing backwards-compatibility is really worth the trouble. Reducing zero by 75% is still only zero. I have admittedly not tried the above approach, and more research would be needed to make a truly qualified statement. But still, unless someone can demonstrate a truly stunning performance difference (using "normal" amounts of text, not billions of characters!), my point of view remains that for the vertex data, a simple, plain old vertex buffer is justifiably good enough to be considered part of a "state of the art solution". It's simple and straightforward, it works, and it works well.

Having already referenced "OpenGL Insights" above, it is also worth pointing out the chapter "2D Shape Rendering by Distance Fields" by Stefan Gustavson, which explains distance field rendering in great detail.

Update 2016:

Meanwhile, there exist several additional techniques which aim to remove the corner rounding artefacts which become disturbing at extreme magnifications.

One approach simply uses pseudo-distance fields instead of distance fields (the difference being that the distance is the shortest distance not to the actual outline, but to the outline or an imaginary line protruding over the edge). This is somewhat better, and runs at the same speed (identical shader), using the same amount of texture memory.

Another approach uses the median-of-three in a three-channel texture (details and implementation available at github). This aims to be an improvement over the and-or hacks used previously to address the issue. Good quality, slightly, almost not noticeably, slower, but uses three times as much texture memory. Also, extra effects (e.g. glow) are harder to get right.
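The median-of-three step itself is tiny. A minimal sketch follows: the three channel samples are reduced to a single distance, which is then thresholded as in the single-channel case. The linear ramp and the `smoothing` width are assumptions for illustration, not taken from the referenced implementation.

```python
def median3(r, g, b):
    """Median of three values using only min/max, as a shader would."""
    return max(min(r, g), min(max(r, g), b))


def msdf_coverage(r, g, b, threshold=0.5, smoothing=0.05):
    """Coverage in [0, 1] from one three-channel distance sample.

    After taking the median, the value is used like an ordinary
    single-channel distance: a linear ramp around the threshold.
    """
    d = median3(r, g, b)
    t = (d - (threshold - smoothing)) / (2.0 * smoothing)
    return min(max(t, 0.0), 1.0)
```

Because the median discards whichever channel disagrees with the other two, a single outlier channel near a corner does not round the corner off.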

Lastly, storing the actual bezier curves making up characters, and evaluating them in a fragment shader has become practical, with slightly inferior performance (but not so much that it's a problem) and stunning results even at highest magnifications.
WebGL demo rendering a large PDF with this technique in real time available here.
