WebSocket data unmasking / multi-byte XOR

Problem description

The WebSocket spec defines unmasking data as

j                   = i MOD 4
transformed-octet-i = original-octet-i XOR masking-key-octet-j

where the mask is 4 bytes long and unmasking has to be applied per byte.
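As a baseline for the faster variants, the per-byte rule above can be sketched as a plain loop. The function name `ws_unmask_scalar` and its signature are assumptions made for this illustration; they do not come from the question or from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-by-byte unmasking, following j = i MOD 4 from the spec text.
 * ws_unmask_scalar is a hypothetical name used only for this sketch. */
void ws_unmask_scalar(uint8_t *buf, size_t len, const uint8_t key[4])
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        /* transformed-octet-i = original-octet-i XOR masking-key-octet-j */
        buf[i] ^= key[i % 4];
    }
}
```

Because XOR is its own inverse, the same routine both masks and unmasks. For example, with key 37 fa 21 3d the bytes 7f 9f 4d 51 58 become 48 65 6c 6c 6f, which is "Hello" in ASCII.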

Is there a way to do this more efficiently than just looping over the bytes?

The server running the code can be assumed to have a Haswell CPU, and the OS is Linux with kernel > 3.2, so SSE etc. are all present. Coding is done in C, but I can do asm as well if necessary.

I tried to look up the solution myself, but was unable to figure out whether there is an appropriate instruction in any of the dozens of SSE1-5/AVX/(whatever extension, I have lost track of the many over the years).

Thank you very much!

Edit: After rereading the spec a couple of times, it seems that it is actually only XORing the data bytes with the mask bytes, which I can do 8 bytes at a time until the last few bytes. The question is still open, as I think there is probably still a way to optimize this using SSE or the like (maybe processing even 16 bytes at a time? letting the processor do the for loop? ...)
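The 8-bytes-at-a-time idea from the edit could look roughly like the sketch below. The name `ws_unmask_u64` is hypothetical, and the sketch assumes the payload starts at mask offset 0, so every 8-byte word lines up with two copies of the key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* XOR 8 bytes per iteration, then finish the tail byte by byte.
 * ws_unmask_u64 is a hypothetical name used only for this sketch. */
void ws_unmask_u64(uint8_t *buf, size_t len, const uint8_t key[4])
{
    assert(buf != NULL || len == 0);

    /* Build an 8-byte mask whose in-memory layout is key[0..3] twice. */
    uint8_t key8[8];
    memcpy(key8, key, 4);
    memcpy(key8 + 4, key, 4);
    uint64_t mask64;
    memcpy(&mask64, key8, 8);

    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t word;
        memcpy(&word, buf + i, 8);   /* memcpy sidesteps alignment and aliasing issues */
        word ^= mask64;
        memcpy(buf + i, &word, 8);
    }
    /* i is a multiple of 8 here, so i % 4 is still the right key index. */
    for (; i < len; i++) {
        buf[i] ^= key[i % 4];
    }
}
```

Going through `memcpy` rather than casting `buf` to `uint64_t *` keeps the code valid for unaligned buffers; compilers typically turn these fixed-size copies into single loads and stores.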

Recommended answer

Yes, you can XOR 16 bytes in one instruction using SSE2, or 32 bytes at a time with AVX2 (Haswell and later).

SSE2:

#include <emmintrin.h>                     // SSE2 intrinsics
#include <stdint.h>                        // uint8_t

__m128i v, v_mask;                         // v_mask: the 4 byte key repeated 4 times
uint8_t *buff;                             // buffer - must be 16 byte aligned

for (int i = 0; i < N; i += 16)            // note that N must be multiple of 16
{
    v = _mm_load_si128((const __m128i *)&buff[i]);  // load 16 bytes
    v = _mm_xor_si128(v, v_mask);                   // XOR with mask
    _mm_store_si128((__m128i *)&buff[i], v);        // store 16 masked bytes (returns void)
}
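The SSE2 fragment leaves out how the mask vector is built and what happens when the buffer is unaligned or its length is not a multiple of 16. One possible way to fill those gaps is sketched below; the name `ws_unmask_sse2`, the use of unaligned loads, and the scalar tail loop are assumptions of this sketch rather than part of the answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <emmintrin.h>   /* SSE2 intrinsics */

/* XOR 16 bytes per iteration with SSE2, then finish the tail byte by byte.
 * ws_unmask_sse2 is a hypothetical name used only for this sketch. */
void ws_unmask_sse2(uint8_t *buf, size_t len, const uint8_t key[4])
{
    assert(buf != NULL || len == 0);

    /* Broadcast the 4-byte key into all four 32-bit lanes of the vector. */
    int32_t key32;
    memcpy(&key32, key, 4);
    const __m128i v_mask = _mm_set1_epi32(key32);

    size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        /* Unaligned load/store, so buf needs no particular alignment. */
        __m128i v = _mm_loadu_si128((const __m128i *)(buf + i));
        v = _mm_xor_si128(v, v_mask);
        _mm_storeu_si128((__m128i *)(buf + i), v);
    }
    /* i is a multiple of 16 here, so i % 4 is still the right key index. */
    for (; i < len; i++) {
        buf[i] ^= key[i % 4];
    }
}
```

If the buffer is known to be 16-byte aligned, `_mm_load_si128` and `_mm_store_si128` can be used instead, as in the fragment above.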

AVX2:

#include <immintrin.h>                     // AVX2 intrinsics
#include <stdint.h>                        // uint8_t

__m256i w, w_mask;                         // w_mask: the 4 byte key repeated 8 times
uint8_t *buff;                             // buffer - must be 32 byte aligned
                                           // for _mm256_load_si256/_mm256_store_si256

for (int i = 0; i < N; i += 32)            // note that N must be multiple of 32
{
    w = _mm256_load_si256((const __m256i *)&buff[i]);  // load 32 bytes
    w = _mm256_xor_si256(w, w_mask);                   // XOR with mask
    _mm256_store_si256((__m256i *)&buff[i], w);        // store 32 masked bytes (returns void)
}
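A fuller AVX2 version might be sketched as follows. It assumes GCC or Clang on x86, since it relies on the `target("avx2")` function attribute and `__builtin_cpu_supports` so that the file compiles without `-mavx2` and falls back to a byte loop on CPUs without AVX2. The names `ws_unmask_avx2` and `ws_unmask_avx2_body` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <immintrin.h>   /* AVX2 intrinsics */

/* Compiled for AVX2 regardless of the command-line flags (GCC/Clang attribute). */
__attribute__((target("avx2")))
static void ws_unmask_avx2_body(uint8_t *buf, size_t len, const uint8_t key[4])
{
    /* Broadcast the 4-byte key into all eight 32-bit lanes of the vector. */
    int32_t key32;
    memcpy(&key32, key, 4);
    const __m256i w_mask = _mm256_set1_epi32(key32);

    size_t i = 0;
    for (; i + 32 <= len; i += 32) {
        /* Unaligned load/store, so buf needs no particular alignment. */
        __m256i w = _mm256_loadu_si256((const __m256i *)(buf + i));
        w = _mm256_xor_si256(w, w_mask);
        _mm256_storeu_si256((__m256i *)(buf + i), w);
    }
    /* i is a multiple of 32 here, so i % 4 is still the right key index. */
    for (; i < len; i++) {
        buf[i] ^= key[i % 4];
    }
}

/* Hypothetical entry point: use AVX2 when the CPU has it, else a byte loop. */
void ws_unmask_avx2(uint8_t *buf, size_t len, const uint8_t key[4])
{
    assert(buf != NULL || len == 0);
    if (__builtin_cpu_supports("avx2")) {
        ws_unmask_avx2_body(buf, len, key);
    } else {
        for (size_t i = 0; i < len; i++) {
            buf[i] ^= key[i % 4];
        }
    }
}
```

When the build already targets Haswell or later (for example with `-mavx2`), the attribute and the runtime check are unnecessary and the body can be called directly.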
