Problem description
When converting a binary string to hex, the answers I found only work up to a certain size. I want to convert MASSIVE binary strings into their complete hex counterparts, in a more efficient way than the following, which is the only approach I've come across that handles the whole string:
for(size_t i = 0; i < (binarySubVec.size() - 1); i++){
    string binToHex, tmp = "0000";
    for (size_t j = 0; j < binaryVecStr[i].size(); j += 4){
        tmp = binaryVecStr[i].substr(j, 4);
        if      (!tmp.compare("0000")) binToHex += "0";
        else if (!tmp.compare("0001")) binToHex += "1";
        else if (!tmp.compare("0010")) binToHex += "2";
        else if (!tmp.compare("0011")) binToHex += "3";
        else if (!tmp.compare("0100")) binToHex += "4";
        else if (!tmp.compare("0101")) binToHex += "5";
        else if (!tmp.compare("0110")) binToHex += "6";
        else if (!tmp.compare("0111")) binToHex += "7";
        else if (!tmp.compare("1000")) binToHex += "8";
        else if (!tmp.compare("1001")) binToHex += "9";
        else if (!tmp.compare("1010")) binToHex += "A";
        else if (!tmp.compare("1011")) binToHex += "B";
        else if (!tmp.compare("1100")) binToHex += "C";
        else if (!tmp.compare("1101")) binToHex += "D";
        else if (!tmp.compare("1110")) binToHex += "E";
        else if (!tmp.compare("1111")) binToHex += "F";
        else continue;
    }
    hexOStr << binToHex;
    hexOStr << " ";
}
It's thorough and absolute, but slow.
Is there a simpler way of doing this?
Recommended answer
UPDATE: Added comparison and benchmarks at the end.
Here's another take, based on a perfect hash. The perfect hash was generated using gperf (as described here: Is it possible to map string to int faster than using hashmap?, http://stackoverflow.com/questions/16141178/is-it-possible-to-map-string-to-int-faster-than-using-hashmap/16141214).
I've further optimized by moving function-local statics out of the way and marking hexchar() and hash() as constexpr. This removes any unnecessary initialization overhead and gives the compiler full room for optimization.
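To illustrate what constexpr buys here, the following is a minimal sketch, not the gperf-based class from this answer: it replaces the perfect hash with plain bit arithmetic, and the names nibble_value and nibble_to_hex are hypothetical. Because both functions are constexpr and the digit table is a string literal, the compiler can evaluate the whole lookup at compile time, which the static_asserts check.

```cpp
// Hypothetical helper: numeric value (0..15) of four '0'/'1' characters.
// Only the lowest bit of each character is used, so input is NOT validated.
constexpr unsigned nibble_value(const char* p)
{
    return ((p[0] & 1u) << 3) | ((p[1] & 1u) << 2) |
           ((p[2] & 1u) << 1) |  (p[3] & 1u);
}

// Hypothetical helper: lowercase hex digit for four '0'/'1' characters.
constexpr char nibble_to_hex(const char* p)
{
    return "0123456789abcdef"[nibble_value(p)];
}

// These fail to compile if the lookup cannot be evaluated at compile time.
static_assert(nibble_to_hex("0000") == '0', "compile-time lookup");
static_assert(nibble_to_hex("1010") == 'a', "compile-time lookup");
static_assert(nibble_to_hex("1111") == 'f', "compile-time lookup");
```

The single-return form keeps the sketch valid under C++11 constexpr rules.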
You could try reading e.g. 1024 nibbles at a time if possible, and give the compiler a chance to vectorize the operations using AVX/SSE instruction sets. (I have not inspected the generated code to see whether this would happen.)
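One way to give the compiler that chance is a simple loop over a whole buffer with no table lookup. The following is a minimal sketch under stated assumptions: the name bits_to_hex is hypothetical, the input is assumed to contain only '0' and '1', and its length is assumed to be a multiple of four. As with the suggestion above, whether a particular compiler actually vectorizes this loop has not been verified.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts n_bits characters of '0'/'1' (n_bits assumed
// to be a multiple of 4) into n_bits/4 lowercase hex digits written to out.
// The body is a few bit operations plus a select, with no table lookup;
// whether it gets vectorized depends on the compiler and flags.
void bits_to_hex(const char* in, std::size_t n_bits, char* out)
{
    for (std::size_t i = 0; i < n_bits / 4; ++i) {
        const char* p = in + 4 * i;
        unsigned v = ((p[0] & 1u) << 3) | ((p[1] & 1u) << 2) |
                     ((p[2] & 1u) << 1) |  (p[3] & 1u);
        // 0..9 map to '0'..'9'; 10..15 map to 'a'..'f'.
        out[i] = static_cast<char>(v + '0' + (v > 9 ? 'a' - '0' - 10 : 0));
    }
}

// Convenience wrapper for std::string input.
std::string bits_to_hex(const std::string& bits)
{
    std::string hex(bits.size() / 4, '\0');
    bits_to_hex(bits.data(), bits.size(), &hex[0]);
    return hex;
}
```

For example, bits_to_hex(std::string("0000000110101111")) yields "01af".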
The full sample code to translate std::cin to std::cout in streaming mode is:
#include <iostream>

int main()
{
    char buffer[4096];
    while (std::cin.read(buffer, sizeof(buffer)), std::cin.gcount())
    {
        size_t got = std::cin.gcount();
        char* out = buffer;
        for (auto it = buffer; it < buffer+got; it += 4)
            *out++ = Perfect_Hash::hexchar(it);
        std::cout.write(buffer, got/4);
    }
}
Here's the Perfect_Hash class, slightly redacted and extended with the hexchar lookup. Note that it does validate input in DEBUG builds using assert:
#include <array>
#include <algorithm>
#include <cassert>

class Perfect_Hash {
    /* C++ code produced by gperf version 3.0.4 */
    /* Command-line: gperf -L C++ -7 -C -E -m 100 table  */
    /* Computed positions: -k'1-4' */
    /* maximum key range = 16, duplicates = 0 */
  private:
    static constexpr unsigned char asso_values[] = {
        27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
        27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 15, 7, 3, 1, 0, 27,
        27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
        27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
        27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27};

    template <typename It>
    static constexpr unsigned int hash(It str)
    {
        return
            asso_values[(unsigned char)str[3] + 2] + asso_values[(unsigned char)str[2] + 1] +
            asso_values[(unsigned char)str[1] + 3] + asso_values[(unsigned char)str[0]];
    }

    static constexpr char hex_lut[] = "???????????fbead9c873625140";

  public:
#ifdef DEBUG
    template <typename It>
    static char hexchar(It binary_nibble)
    {
        assert(Perfect_Hash::validate(binary_nibble)); // for DEBUG only
        return hex_lut[hash(binary_nibble)]; // no validation!
    }
#else
    template <typename It>
    static constexpr char hexchar(It binary_nibble)
    {
        return hex_lut[hash(binary_nibble)]; // no validation!
    }
#endif

    template <typename It>
    static bool validate(It str)
    {
        static constexpr std::array<char, 4> vocab[] = {
            {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}},
            {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}},
            {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}},
            {{'?', '?', '?', '?'}}, {{'?', '?', '?', '?'}},
            {{'1', '1', '1', '1'}}, {{'1', '0', '1', '1'}},
            {{'1', '1', '1', '0'}}, {{'1', '0', '1', '0'}},
            {{'1', '1', '0', '1'}}, {{'1', '0', '0', '1'}},
            {{'1', '1', '0', '0'}}, {{'1', '0', '0', '0'}},
            {{'0', '1', '1', '1'}}, {{'0', '0', '1', '1'}},
            {{'0', '1', '1', '0'}}, {{'0', '0', '1', '0'}},
            {{'0', '1', '0', '1'}}, {{'0', '0', '0', '1'}},
            {{'0', '1', '0', '0'}}, {{'0', '0', '0', '0'}},
        };
        int key = hash(str);

        if (key <= 26 && key >= 0)
            return std::equal(str, str+4, vocab[key].begin());
        else
            return false;
    }
};

constexpr unsigned char Perfect_Hash::asso_values[];
constexpr char Perfect_Hash::hex_lut[];

#include <iostream>

int main()
{
    char buffer[4096];
    while (std::cin.read(buffer, sizeof(buffer)), std::cin.gcount())
    {
        size_t got = std::cin.gcount();
        char* out = buffer;
        for (auto it = buffer; it < buffer+got; it += 4)
            *out++ = Perfect_Hash::hexchar(it);
        std::cout.write(buffer, got/4);
    }
}
The demo output for e.g. od -A none -t o /dev/urandom | tr -cd '01' | dd bs=1 count=4096 | ./test
03bef5fb79c7da917e3ebffdd8c41488d2b841dac86572cf7672d22f1f727627a2c4a48b15ef27eb0854dd99756b24c678e3b50022d695cc5f5c8aefaced2a39241bfd5deedcfa0a89060598c6b056d934719eba9ccf29e430d2def5751640ff17860dcb287df8a94089ade0283ee3d76b9fefcce3f3006b8c71399119423e780cef81e9752657e97c7629a9644be1e7c96b5d0324ab16d20902b55bb142c0451e675973489ae4891ec170663823f9c1c9b2a11fcb1c39452aff76120b21421069af337d14e89e48ee802b1cecd8d0886a9a0e90dea5437198d8d0d7ef59c46f9a069a83835286a9a8292d2d7adb4e7fb0ef42ad4734467063d181745aaa6694215af7430f95e854b7cad813efbbae0d2eb099523f215cff6d9c45e3edcaf63f78a485af8f2bfc2e27d46d61561b155d619450623b7aa8ca085c6eedfcc19209066033180d8ce1715e8ec9086a7c28df6e4202ee29705802f0c2872fbf06323366cf64ecfc5ea6f15ba6467730a8856a1c9ebf8cc188e889e783c50b85824803ed7d7505152b891cb2ac2d6f4d1329e100a2e3b2bdd50809b48f0024af1b5092b35779c863cd9c6b0b8e278f5bec966dd0e5c4756064cca010130acf24071d02de39ef8ba8bd1b6e9681066be3804d36ca83e7032274e4c8e8cacf520e8078f8fa80eb8e70af40367f53e53a7d7f7afe8704c46f58339d660b8151c91bddf82b4096
BENCHMARKS
I came up with three different approaches:
- naive.cpp (no hacks, no libraries); Live disassembly on Godbolt
- spirit.cpp (Trie); disassembly on pastebin
- and this answer: perfect.cpp hash based; Live disassembly on Godbolt
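To do some comparisons, I've
- compiled them all with the same compiler (GCC 4.9) and flags (-O3 -march=native -g0 -DNDEBUG)
- optimized input/output so it doesn't read 4 chars at a time or write single characters
- created a large input file (1 gigabyte)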
Here are the results:
- Surprisingly, the naive approach from the first answer does rather well
- Spirit does really badly here; it nets 3.4MB/s, so the whole file would take 294 seconds (!!!). We've left it off the charts
- The average throughputs are ~720MB/s for naive.cpp and ~1.14GB/s for perfect.cpp
- This makes the perfect hash approach roughly 50% faster than the naive approach.
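Summary: I'd say the naive approach is plenty good, given that I posted it on a whim 7 hours ago. If you really want high throughput, the perfect hash is a nice start, but consider hand-rolling a SIMD-based solution.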
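As a starting point for the hand-rolled SIMD idea, here is a minimal, unbenchmarked sketch using SSE2 intrinsics. It is not one of the three programs measured above. The name bits_to_hex_simd is hypothetical, the input is assumed to contain only '0' and '1', and an incomplete trailing group of fewer than four characters is ignored. Sixteen characters are compared against '1' at once, _mm_movemask_epi8 packs the results into a 16-bit mask, and a small table turns each 4-bit slice into a hex digit. When __SSE2__ is not defined, only the scalar loop runs.

```cpp
#include <cstddef>
#include <string>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical helper; assumes the input holds only '0'/'1' characters.
std::string bits_to_hex_simd(const std::string& bits)
{
    // Indexed by a 4-bit mask whose bit k is set when character k of the
    // group is '1'. Character 0 is the most significant binary digit, so the
    // mask is the nibble with its bits reversed; the table undoes that.
    static const char lut[] = "084c2a6e195d3b7f";

    const std::size_t n = bits.size() / 4;  // number of hex digits
    std::string hex(n, '\0');
    std::size_t i = 0;                      // next hex digit to produce

#if defined(__SSE2__)
    const __m128i ones = _mm_set1_epi8('1');
    // 16 input characters become 4 hex digits per iteration.
    for (; i + 4 <= n; i += 4) {
        const __m128i chunk = _mm_loadu_si128(
            reinterpret_cast<const __m128i*>(bits.data() + 4 * i));
        const unsigned mask = static_cast<unsigned>(
            _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, ones)));
        hex[i]     = lut[mask & 0xF];
        hex[i + 1] = lut[(mask >> 4) & 0xF];
        hex[i + 2] = lut[(mask >> 8) & 0xF];
        hex[i + 3] = lut[(mask >> 12) & 0xF];
    }
#endif

    // Scalar tail (and full fallback when SSE2 is unavailable).
    for (; i < n; ++i) {
        const char* p = bits.data() + 4 * i;
        const unsigned mask = (p[0] & 1u) | ((p[1] & 1u) << 1) |
                              ((p[2] & 1u) << 2) | ((p[3] & 1u) << 3);
        hex[i] = lut[mask];
    }
    return hex;
}
```

A faster version would likely also build the output digits with vector instructions instead of four scalar table lookups; this sketch only vectorizes the comparison step.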