What is state-of-the-art for text rendering in OpenGL as of version 4.1?

Problem description



There are already a number of questions about text rendering in OpenGL, such as:

But mostly what is discussed is rendering textured quads using the fixed-function pipeline. Surely shaders must offer a better way.

I'm not really concerned about internationalization, most of my strings will be plot tick labels (date and time or purely numeric). But the plots will be re-rendered at the screen refresh rate and there could be quite a bit of text (not more than a few thousand glyphs on-screen, but enough that hardware accelerated layout would be nice).

What is the recommended approach for text-rendering using modern OpenGL? (Citing existing software using the approach is good evidence that it works well)

  • Geometry shaders that accept e.g. position and orientation and a character sequence and emit textured quads
  • Geometry shaders that render vector fonts
  • As above, but using tessellation shaders instead
  • A compute shader to do font rasterization

Solution

Rendering outlines, unless you render only a dozen characters total, remains a "no go" due to the number of vertices needed per character to approximate curvature. Though there have been approaches to evaluate bezier curves in the pixel shader instead, these suffer from not being easily antialiased, which is trivial using a distance-map-textured quad, and evaluating curves in the shader is still computationally much more expensive than necessary.

The best trade-off between "fast" and "quality" are still textured quads with a signed distance field texture. It is very slightly slower than using a plain normal textured quad, but not so much. The quality on the other hand, is in an entirely different ballpark. The results are truly stunning, it is as fast as you can get, and effects such as glow are trivially easy to add, too. Also, the technique can be downgraded nicely to older hardware, if needed.

See the famous Valve paper for the technique.

The technique is conceptually similar to how implicit surfaces (metaballs and such) work, though it does not generate polygons. It runs entirely in the pixel shader and takes the distance sampled from the texture as a distance function. Everything above a chosen threshold (usually 0.5) is "in", everything else is "out". In the simplest case, on 10 year old non-shader-capable hardware, setting the alpha test threshold to 0.5 will do that exact thing (though without special effects and antialiasing).
If one wants to add a little more weight to the font (faux bold), a slightly smaller threshold will do the trick without modifying a single line of code (just change your "font_weight" uniform). For a glow effect, one simply considers everything above one threshold as "in" and everything above another (smaller) threshold as "out, but in glow", and LERPs between the two. Antialiasing works similarly.
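The per-fragment logic described above is simple enough to sketch outside GLSL. Here it is in Python, with `smoothstep` written out explicitly; the function and parameter names (`shade_fragment`, `font_weight`, `aa_width`, `glow_threshold`) are illustrative, not from any particular shader:

```python
def smoothstep(edge0, edge1, x):
    """GLSL-style smoothstep: 0 below edge0, 1 above edge1, smooth cubic ramp between."""
    t = max(0.0, min(1.0, (x - edge0) / (edge1 - edge0)))
    return t * t * (3.0 - 2.0 * t)

def shade_fragment(dist, font_weight=0.5, aa_width=0.03, glow_threshold=None):
    """dist: value sampled from the distance texture, in [0, 1], 0.5 on the outline.

    Returns (coverage, glow), both in [0, 1]. Lowering font_weight below 0.5
    fattens the glyph (faux bold) without touching the texture or the code path.
    """
    # Antialiasing: instead of a hard alpha test at the threshold, ramp
    # coverage across a narrow band around it.
    coverage = smoothstep(font_weight - aa_width, font_weight + aa_width, dist)
    glow = 0.0
    if glow_threshold is not None:
        # Everything between the glow threshold and the main threshold is
        # "out, but in glow": lerp across that band, masked by coverage.
        glow = smoothstep(glow_threshold, font_weight, dist) * (1.0 - coverage)
    return coverage, glow
```

On shader-capable hardware the same three lines run per fragment; on the 10-year-old fallback path, the `coverage >= 0.5` comparison is exactly what the fixed-function alpha test does.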

By using an 8-bit signed distance value rather than a single bit, this technique increases the effective resolution of your texture map 16-fold in each dimension (instead of black and white, all possible shades are used, thus we have 256 times the information using the same storage). But even if you magnify far beyond 16x, the result still looks quite acceptable. Long straight lines will eventually become a bit wiggly, but there will be no typical "blocky" sampling artefacts.
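Generating such a texture is an offline preprocessing step. A minimal brute-force sketch follows; real tools (like the one linked below) rasterize the glyph at high resolution first and use much faster sweep algorithms, so treat this only as a statement of the definition:

```python
import math

def signed_distance_field(bitmap, spread=4.0):
    """bitmap: 2D list of 0/1 glyph coverage samples.
    Returns a same-sized grid of values in [0, 1], with 0.5 on the outline.
    Brute force O(n^2) per texel -- fine as a sketch, too slow for production.
    """
    h, w = len(bitmap), len(bitmap[0])
    field = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            inside = bitmap[y][x]
            # Distance to the nearest texel of the opposite state, clamped
            # to the chosen spread radius.
            best = spread
            for j in range(h):
                for i in range(w):
                    if bitmap[j][i] != inside:
                        best = min(best, math.hypot(i - x, j - y))
            signed = best if inside else -best
            # Map [-spread, spread] to [0, 1]; an 8-bit texture then stores
            # 256 shades of this value instead of a single in/out bit.
            field[y][x] = 0.5 + 0.5 * max(-1.0, min(1.0, signed / spread))
    return field
```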

You can use a geometry shader for generating the quads out of points (reduce bus bandwidth), but honestly the gains are rather marginal. The same is true for instanced character rendering as described in GPG8. The overhead of instancing is only amortized if you have a lot of text to draw. The gains are, in my opinion, in no relation to the added complexity and non-downgradeability. Plus, you are either limited by the amount of constant registers, or you have to read from a texture buffer object, which is non-optimal for cache coherence (and the intent was to optimize to begin with!).
A simple, plain old vertex buffer is just as fast (possibly faster) if you schedule the upload a bit ahead in time and will run on every hardware built during the last 15 years. And, it is not limited to any particular number of characters in your font, nor to a particular number of characters to render.
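For reference, that plain old vertex buffer is trivial to fill on the CPU. A sketch with a made-up metrics layout (the keys `advance`, `x0`..`v1` are illustrative, not any particular library's API):

```python
def build_text_vertices(text, metrics):
    """Build interleaved (x, y, u, v) data, two triangles per character.

    metrics: dict mapping char -> dict with 'advance', quad corners
    'x0','y0','x1','y1' relative to the pen position, and atlas texture
    coordinates 'u0','v0','u1','v1'.
    """
    verts = []
    pen_x = 0.0
    for ch in text:
        m = metrics[ch]
        x0, y0 = pen_x + m['x0'], m['y0']
        x1, y1 = pen_x + m['x1'], m['y1']
        u0, v0, u1, v1 = m['u0'], m['v0'], m['u1'], m['v1']
        # Two CCW triangles per quad; the flat float list is ready for
        # glBufferData + glDrawArrays(GL_TRIANGLES, ...).
        verts += [x0, y0, u0, v0,  x1, y0, u1, v0,  x1, y1, u1, v1,
                  x0, y0, u0, v0,  x1, y1, u1, v1,  x0, y1, u0, v1]
        pen_x += m['advance']
    return verts
```

Uploading this a frame ahead of time is the "schedule the upload a bit ahead" point above; nothing here depends on shader stage features at all.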

If you are sure that you do not have more than 256 characters in your font, texture arrays may be worth a consideration to strip off bus bandwidth in a similar manner as generating quads from points in the geometry shader. When using an array texture, the texture coordinates of all quads have identical, constant s and t coordinates and only differ in the r coordinate, which is equal to the character index to render.
But like with the other techniques, the expected gains are marginal at the cost of being incompatible with previous generation hardware.

There is a handy tool by Jonathan Dummer for generating distance textures: description page

Update:
As more recently pointed out in Programmable Vertex Pulling (D. Rákos, "OpenGL Insights", p. 239), there is no significant extra latency or overhead associated with pulling vertex data programmatically from the shader on the newest generations of GPUs, compared to doing the same using the standard fixed function.
Also, the latest generations of GPUs have more and more reasonably sized general-purpose L2 caches (e.g. 1536 kiB on nvidia Kepler), so one may expect the incoherent access problem when pulling random offsets for the quad corners from a buffer texture to be less of a problem.

This makes the idea of pulling constant data (such as quad sizes) from a buffer texture more attractive. A hypothetical implementation could thus reduce PCIe and memory transfers, as well as GPU memory, to a minimum with an approach like this:

  • Only upload a character index (one per character to be displayed) as the only input to a vertex shader that passes on this index and gl_VertexID, and amplify that to 4 points in the geometry shader, still having the character index and the vertex id (this will be "gl_primitiveID made available in the vertex shader") as the sole attributes, and capture this via transform feedback.
  • This will be fast, because there are only two output attributes (main bottleneck in GS), and it is close to "no-op" otherwise in both stages.
  • Bind a buffer texture which contains, for each character in the font, the textured quad's vertex positions relative to the base point (these are basically the "font metrics"). This data can be compressed to 4 numbers per quad by storing only the offset of the bottom left vertex, and encoding the width and height of the axis-aligned box (assuming half floats, this will be 8 bytes of constant buffer per character -- a typical 256 character font could fit completely into 2kiB of L1 cache).
  • Set a uniform for the baseline
  • Bind a buffer texture with horizontal offsets. These could probably even be calculated on the GPU, but it is much easier and more efficient to do that kind of thing on the CPU, as it is a strictly sequential operation and not at all trivial (think of kerning). Also, it would need another feedback pass, which would be another sync point.
  • Render the previously generated data from the feedback buffer, the vertex shader pulls the horizontal offset of the base point and the offsets of the corner vertices from buffer objects (using the primitive id and the character index). The original vertex ID of the submitted vertices is now our "primitive ID" (remember the GS turned the vertices into quads).
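The 8-bytes-per-character metrics buffer from the list above can be sketched with Python's `struct` module (format `'e'` is an IEEE half float). For a hypothetical 256-glyph font the whole buffer then packs into exactly 2 kiB:

```python
import struct

def pack_font_metrics(glyphs):
    """glyphs: list of (offset_x, offset_y, width, height) per character:
    the bottom-left quad corner relative to the base point, plus the
    axis-aligned box extents. Four half floats = 8 bytes per character.
    """
    buf = bytearray()
    for ox, oy, w, h in glyphs:
        buf += struct.pack('<4e', ox, oy, w, h)
    return bytes(buf)
```

The vertex shader would reconstruct all four corners from these four numbers, so only this tiny constant buffer ever competes for cache space.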

Like this, one could ideally reduce the required vertex bandwidth by 75% (amortized), though it would only be able to render a single line. If one wanted to render several lines in one draw call, one would need to add the baseline to the buffer texture rather than using a uniform (making the bandwidth gains smaller).
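The "practically zero" sizes involved are easy to sanity-check with back-of-envelope arithmetic. Assuming four corner vertices per glyph, each with a 2D position plus texture coordinates as 32-bit floats (an illustrative layout, not one mandated by the technique):

```python
def vertex_data_bytes(glyphs, verts_per_quad=4, floats_per_vertex=4):
    """Bytes of vertex data for `glyphs` characters: (x, y) position plus
    (u, v) texcoord per vertex, each a 32-bit float. Layout is illustrative.
    """
    return glyphs * verts_per_quad * floats_per_vertex * 4

# A "reasonable" screenful of text, a thousand-odd glyphs:
full = vertex_data_bytes(1200)   # 76800 bytes = 75 kiB per full rebuild
pulled = full // 4               # after the hypothetical 75% reduction
```

Either number is noise next to per-frame framebuffer traffic, which is the point made in the next paragraph.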

However, even assuming a 75% reduction -- since the vertex data to display "reasonable" amounts of text is only somewhere around 50-100kiB (which is practically zero to a GPU or a PCIe bus) -- I still doubt that the added complexity and the loss of backwards compatibility are really worth the trouble. Reducing zero by 75% is still only zero. I have admittedly not tried the above approach, and more research would be needed to make a truly qualified statement. But still, unless someone can demonstrate a truly stunning performance difference (using "normal" amounts of text, not billions of characters!), my point of view remains that for the vertex data, a simple, plain old vertex buffer is justifiably good enough to be considered part of a "state of the art solution". It's simple and straightforward, it works, and it works well.

Having already referenced "OpenGL Insights" above, it is worth also pointing out the chapter "2D Shape Rendering by Distance Fields" by Stefan Gustavson, which explains distance field rendering in great detail.

Update 2016:

Meanwhile, there exist several additional techniques which aim to remove the corner rounding artefacts which become disturbing at extreme magnifications.

One approach simply uses pseudo-distance fields instead of distance fields (the difference being that the distance is the shortest distance not to the actual outline, but to the outline or an imaginary line protruding over the edge). This is somewhat better, and runs at the same speed (identical shader), using the same amount of texture memory.

Another approach uses the median of three in a three-channel texture (details and implementation available on GitHub). This aims to be an improvement over the and-or hacks used previously to address the issue. Good quality, slightly (almost not noticeably) slower, but it uses three times as much texture memory. Also, extra effects (e.g. glow) are harder to get right.
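The median-of-three decode is a one-liner. Sketched in Python, with `r`, `g`, `b` standing in for the three channels of the multi-channel SDF sample:

```python
def median_of_three(r, g, b):
    """Median of the three channel values of a multi-channel SDF texel.
    Written in the branch-free min/max form commonly used in shaders.
    """
    return max(min(r, g), min(max(r, g), b))

def msdf_coverage(r, g, b, threshold=0.5):
    # The reconstructed distance is thresholded exactly like a plain SDF,
    # so antialiasing and effects work the same way as before.
    return 1.0 if median_of_three(r, g, b) >= threshold else 0.0
```

Because each channel carries distances to a different subset of edges, the median reconstructs sharp corners that a single-channel field would round off.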

Lastly, storing the actual Bézier curves making up the characters and evaluating them in a fragment shader has become practical, with slightly inferior performance (but not so much that it's a problem) and stunning results even at the highest magnifications.
A WebGL demo rendering a large PDF with this technique in real time is available here.
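The building block of that last approach is evaluating the glyph's quadratic Bézier segments per fragment; the curve evaluation itself is just the Bernstein form below (the hard part, which the linked demo adds, is robust inside/outside and nearest-distance testing against many such segments):

```python
def quadratic_bezier(p0, p1, p2, t):
    """Point on a quadratic Bezier at parameter t in [0, 1].
    TrueType glyph outlines are built from exactly these segments:
    B(t) = (1-t)^2 * p0 + 2(1-t)t * p1 + t^2 * p2.
    """
    u = 1.0 - t
    x = u * u * p0[0] + 2.0 * u * t * p1[0] + t * t * p2[0]
    y = u * u * p0[1] + 2.0 * u * t * p1[1] + t * t * p2[1]
    return (x, y)
```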

