I'm trying to find the most efficient way of handling multi-texturing in OpenGL ES2 on iOS. By 'efficient' I mean the fastest rendering even on older iOS devices (iPhone 4 and up) - but also balancing convenience.
I've considered (and tried) several different methods, but I've run into a couple of problems and questions.
Method 1 - My base and normal values are rgb with NO ALPHA. For these objects I don't need transparency. My emission and specular information are each only one channel. To reduce texture2D() calls I figured I could store the emission as the alpha channel of the base, and the specular as the alpha of the normal, with each pair stored in its own file.
My problem so far has been finding a file format that will support a full non-premultiplied alpha channel. PNG just hasn't worked for me: every way I've tried to save this as a PNG premultiplies the .alpha with the .rgb on file save (via Photoshop), basically destroying the .rgb. Any pixel with a 0.0 alpha has a black rgb when I reload the file. I posted that question here with no activity.
I know this method would yield faster renders if I could work out a way to save and load this independent 4th channel. But so far I haven't been able to, so I had to move on.
Method 2 - When that didn't work I moved on to a single 4-way texture where each quadrant has a different map. This doesn't reduce texture2D() calls, but it reduces the number of textures that are being accessed within the shader.
The 4-way texture does require that I modify the texture coordinates within the shader. For model flexibility I leave the texcoords as is in the model's structure and modify them in the shader like so:
v_fragmentTexCoord0 = a_vertexTexCoord0 * 0.5;
v_fragmentTexCoord1 = v_fragmentTexCoord0 + vec2(0.0, 0.5); // illumination frag is up half
v_fragmentTexCoord2 = v_fragmentTexCoord0 + vec2(0.5, 0.5); // shininess frag is up and over
v_fragmentTexCoord3 = v_fragmentTexCoord0 + vec2(0.5, 0.0); // normal frag is over half
To avoid dynamic texture lookups (Thanks Brad Larson) I moved these offsets to the vertex shader and keep them out of the fragment shader.
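For completeness, here's a minimal sketch of what the matching fragment shader could look like; the sampler name u_texture is an assumption, and the final combine line is a placeholder since the real lighting math isn't shown here:
precision mediump float;

uniform sampler2D u_texture;        // the single 4-way texture (name assumed)

varying vec2 v_fragmentTexCoord0;   // base quadrant
varying vec2 v_fragmentTexCoord1;   // illumination/emission quadrant
varying vec2 v_fragmentTexCoord2;   // shininess/specular quadrant
varying vec2 v_fragmentTexCoord3;   // normal quadrant

void main()
{
    vec3  base     = texture2D(u_texture, v_fragmentTexCoord0).rgb;
    float emission = texture2D(u_texture, v_fragmentTexCoord1).r;
    float specular = texture2D(u_texture, v_fragmentTexCoord2).r;
    vec3  normal   = texture2D(u_texture, v_fragmentTexCoord3).rgb * 2.0 - 1.0;

    // Real code would feed these into its lighting equation; this only
    // shows that all four lookups hit one sampler with flat texcoords.
    gl_FragColor = vec4(base + base * emission, 1.0);
}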
But my question here is: Does reducing the number of texture samplers used in a shader matter? Or would I be better off using 4 different smaller textures here?
The one problem I did have with this was bleed-over between the different maps. A texcoord of 1.0 was averaging in some of the blue normal pixels due to linear texture mapping. This added a blue edge on the object near the seam. To avoid it I had to change my UV mapping to not get too close to the edge, and that's a pain to do with very many objects.
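Another possible workaround would be insetting the quadrant range by half a texel in the vertex shader, so bilinear filtering can never straddle the seam. A sketch, with a hypothetical u_texSize uniform holding the texture's width in texels (assuming a square texture):
float halfTexel = 0.5 / u_texSize;  // u_texSize: texture width in texels (hypothetical uniform)
// map 0..1 into halfTexel..(0.5 - halfTexel) instead of 0..0.5, keeping
// samples half a texel away from the quadrant boundary
v_fragmentTexCoord0 = a_vertexTexCoord0 * (0.5 - 2.0 * halfTexel) + vec2(halfTexel);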
Method 3 would be to combine methods 1 and 2, and have base.rgb + emission.a on one side and normal.rgb + specular.a on the other. But again I still have this problem getting an independent alpha to save in a file.
Maybe I could save them as two files but combine them during loading before sending it over to OpenGL. I'll have to try that.
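If that works, the merge itself should just be an interleave before the upload. A rough sketch in C, assuming both images are already decoded into tightly packed 8-bit buffers of the same dimensions (the function and buffer names here are placeholders):
#include <stdlib.h>
#include <OpenGLES/ES2/gl.h>

// Interleave a decoded rgb image and a single-channel alpha image into
// one rgba buffer, then upload it as a single texture.
GLuint uploadMergedRGBA(const unsigned char *rgb, const unsigned char *alpha,
                        int width, int height)
{
    unsigned char *rgba = malloc((size_t)width * height * 4);
    for (int i = 0; i < width * height; i++) {
        rgba[i * 4 + 0] = rgb[i * 3 + 0];
        rgba[i * 4 + 1] = rgb[i * 3 + 1];
        rgba[i * 4 + 2] = rgb[i * 3 + 2];
        rgba[i * 4 + 3] = alpha[i];   // independent 4th channel, never premultiplied
    }

    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, rgba);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    free(rgba);
    return tex;
}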
Method 4 - Finally, in a 3D world, if I have 20 different panel textures for walls, should these be individual files or all packed in a single texture atlas? I recently noticed that at some point Minecraft moved from an atlas to individual textures - albeit they are 16x16 each.
With a single model, and by modifying the texture coordinates (which I'm already doing in methods 2 and 3 above), you can easily send an offset to the shader to select a particular map in an atlas:
v_fragmentTexCoord0 = u_texOffset + a_vertexTexCoord0 * u_texScale;
This offers a lot of flexibility and reduces the number of texture bindings. It's basically how I'm doing it in my game now. But IS IT faster to access a small portion of a larger texture and have the above math in the vertex shader? Or is it faster to repeatedly bind smaller textures over and over? Especially if you're not sorting objects by texture.
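To illustrate what that buys per draw: once the math above is in the vertex shader, selecting a sub-map costs two uniform updates instead of a rebind. A sketch of such a draw loop, where the Mesh record and the uniform locations are hypothetical:
#include <OpenGLES/ES2/gl.h>

typedef struct {
    float atlasU, atlasV;             // offset of this map inside the atlas
    float atlasScaleU, atlasScaleV;   // size of this map within the atlas
    GLsizei indexCount;
} Mesh;                               // hypothetical per-object record

void drawWithAtlas(GLuint atlasTexture, GLint u_texOffsetLoc, GLint u_texScaleLoc,
                   const Mesh *meshes, int meshCount)
{
    // Bind the atlas once; afterwards each object is two uniform updates
    // and a draw call, with no further texture binds. Assumes the vertex
    // and element array buffers are already bound.
    glBindTexture(GL_TEXTURE_2D, atlasTexture);
    for (int i = 0; i < meshCount; i++) {
        glUniform2f(u_texOffsetLoc, meshes[i].atlasU, meshes[i].atlasV);
        glUniform2f(u_texScaleLoc, meshes[i].atlasScaleU, meshes[i].atlasScaleV);
        glDrawElements(GL_TRIANGLES, meshes[i].indexCount, GL_UNSIGNED_SHORT, 0);
    }
}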
I know this is a lot. But the main question here is: what's the most efficient method considering speed + convenience? Will method 4 be faster for multiple textures, or would multiple rebinds be faster? Or is there some other way that I'm overlooking? I see all these 3D games with a lot of graphics and area coverage. How do they keep frame rates up, especially on older devices like the iPhone 4?
**** UPDATE ****
Since I've suddenly had 2 answers in the last few days I'll say this. Basically I did find the answer. Or AN answer. The question is which method is more efficient, meaning which method will result in the best frame rates. I've tried the various methods above, and on the iPhone 5 they're all just about as fast. The iPhone 5/5S has an extremely fast GPU. Where it matters is on older devices like the iPhone 4/4S, or on larger devices like a Retina iPad. My tests were not scientific and I don't have ms speeds to report. But 4 texture2D() calls to 4 RGBA textures were actually just as fast or maybe even faster than 4 texture2D() calls to a single texture with offsets. And of course I do those offset calculations in the vertex shader and not the fragment shader (never in the fragment shader).
So maybe someday I'll do the tests and make a grid with some numbers to report. But I don't have time to do that right now and write a proper answer myself. And I can't really checkmark any other answer that isn't answering the question, because that's not how SO works.
But thanks to the people who have answered. And check out this other question of mine that also answered some of this one: Load an RGBA image from two jpegs on iOS - OpenGL ES 2.0
Have a post-process step in your content pipeline where you merge your rgb with the alpha texture and store it in a .ktx file, either when you package the game or as a post-build event when you compile.
It's a fairly trivial format, and it would be simple to write a command-line tool that loads two PNGs and merges them into one KTX, rgb + alpha, as sketched below.
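To show how trivial the container really is, here's a sketch of writing one uncompressed rgba8 level as a KTX v1 file; the field layout follows the public Khronos KTX spec, and the already-merged pixel buffer is assumed to exist:
#include <stdio.h>
#include <stdint.h>

// KTX v1 file identifier, per the Khronos spec.
static const uint8_t ktxIdentifier[12] =
    { 0xAB, 0x4B, 0x54, 0x58, 0x20, 0x31, 0x31, 0xBB, 0x0D, 0x0A, 0x1A, 0x0A };

// Write a single-level, uncompressed rgba8 image as a KTX v1 file.
int writeKtxRGBA(const char *path, const uint8_t *rgba, uint32_t w, uint32_t h)
{
    uint32_t header[13] = {
        0x04030201, // endianness marker
        0x1401,     // glType: GL_UNSIGNED_BYTE
        1,          // glTypeSize
        0x1908,     // glFormat: GL_RGBA
        0x1908,     // glInternalFormat: GL_RGBA (ES2-style unsized format)
        0x1908,     // glBaseInternalFormat: GL_RGBA
        w, h,
        0,          // pixelDepth (2D texture)
        0,          // numberOfArrayElements
        1,          // numberOfFaces
        1,          // numberOfMipmapLevels
        0           // bytesOfKeyValueData
    };
    uint32_t imageSize = w * h * 4;

    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    fwrite(ktxIdentifier, 1, sizeof ktxIdentifier, f);
    fwrite(header, sizeof(uint32_t), 13, f);
    fwrite(&imageSize, sizeof imageSize, 1, f);
    fwrite(rgba, 1, imageSize, f);
    fclose(f);
    return 0;
}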
Some benefits of doing that:
- Less CPU overhead when loading the file at game startup, so the game starts quicker.
- Some GPUs do not natively support the rgb 24-bit format, which would force the driver to internally convert it to rgba 32-bit. This adds more time to the loading stage and temporary memory usage.
Now, once you've got the data into a texture object, you do want to minimize texture sampling, as it means a lot of GPU operations and memory accesses depending on the filtering mode.
I would recommend having 2 textures with 2 layers each, since adding all of them to the same one risks artifacts: when you sample with bilinear filtering, or if you decide to have mipmaps generated, the filter may include neighbouring pixels close to the edge where one texture layer ends and the second begins.
As an extra improvement I would recommend not having raw rgba 32-bit data in the KTX, but actually compressing it into a DXT or PVRTC format. This would use much less memory, which means faster loading times and fewer memory transfers for the GPU, as memory bandwidth is limited. Of course, adding the compressor to the post-process tool is slightly more complex. Do note that compressed textures do lose a bit of quality depending on the algorithm and implementation.
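On the loading side, compressed data then goes up with glCompressedTexImage2D rather than glTexImage2D. A minimal sketch for 4-bpp PVRTC on iOS, using the size formula from Apple's PVRTC documentation:
#include <OpenGLES/ES2/gl.h>
#include <OpenGLES/ES2/glext.h>

// Upload one mip level of 4-bpp PVRTC data (PVRTC on iOS requires
// square, power-of-two dimensions).
void uploadPVRTC4(const void *data, GLsizei width, GLsizei height)
{
    // Each 4x4 block is 8 bytes; PVRTC pads small images up to 8x8.
    GLsizei size = (width > 8 ? width : 8) * (height > 8 ? height : 8) * 4 / 8;
    glCompressedTexImage2D(GL_TEXTURE_2D, 0,
                           GL_COMPRESSED_RGBA_PVRTC_4BPPV1_IMG,
                           width, height, 0, size, data);
}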