c++ - Decoding a large amount of base64-encoded data in C++

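I have a string variable holding 1,801,048 base64-encoded characters, and I want to decode it. My code works for shorter strings, but it fails when I feed it the 1,801,048-character input.

Here is the code: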

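#include <cctype>
#include <string>

using std::string;

// The question does not show the definition of base64_chars; the standard
// base64 alphabet is assumed here.
static const string base64_chars =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/";

static inline bool is_base64(unsigned char c)
{
    return (isalnum(c) || (c == '+') || (c == '/'));
}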

string base64_decode(string const& encoded_string)
{
    int in_len = encoded_string.size();
    int i = 0;
    int j = 0;
    int in_ = 0;
    unsigned char char_array_4[4], char_array_3[3];
    string ret;

    while (in_len-- && ( encoded_string[in_] != '=') && is_base64(encoded_string[in_]))
    {
        char_array_4[i++] = encoded_string[in_]; in_++;

        if (i ==4)
        {
            for (i = 0; i < 4; i++)
            {
                char_array_4[i] = base64_chars.find(char_array_4[i]);
            }

            char_array_3[0] = (char_array_4[0] << 2) + ((char_array_4[1] & 0x30) >> 4);
            char_array_3[1] = ((char_array_4[1] & 0xf) << 4) + ((char_array_4[2] & 0x3c) >> 2);
            char_array_3[2] = ((char_array_4[2] & 0x3) << 6) + char_array_4[3];

            for (i = 0; (i < 3); i++)
            {
                ret += char_array_3[i];
            }

            i = 0;
        }
    }

    if (i)
    {
        for (j = i; j < 4; j++)
        {
            char_array_4[j] = 0;
        }

        for (j = 0; j < 4; j++)
        {
            char_array_4[j] = base64_chars.find(char_array_4[j]);
        }

        char_array_3[0] = (char_array_4[0] << 2) + ((char_array_4[1] & 0x30) >> 4);
        char_array_3[1] = ((char_array_4[1] & 0xf) << 4) + ((char_array_4[2] & 0x3c) >> 2);
        char_array_3[2] = ((char_array_4[2] & 0x3) << 6) + char_array_4[3];

        for (j = 0; (j < i - 1); j++)
        {
            ret += char_array_3[j];
        }
    }

    return ret;
}

This is how I use it, but it causes the program to terminate or run out of memory (I am not sure which).

string base64_encoded_data = "UEsDBBQAAAAIAI1Wp0xrN4dXHwIAA...."; // Size = 1,801,048
string base64_decoded_data = base64_decode(base64_encoded_data);

Where is the error, or how can I improve the program so that it decodes correctly? Both the input and the output must be of type string.

Best answer

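The problem is how you build the return string ret. You append one character at a time, so the string has to grow its capacity repeatedly; each growth allocates a larger buffer, copies the contents over, and frees the old one. Because of the sizes of the allocated blocks and the way the heap works, this leaves a lot of heap space that is free but cannot be reused for the next, larger buffer.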
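To make the repeated growth visible, here is a minimal sketch that counts how often a std::string changes capacity while characters are appended one at a time. The helper names count_capacity_changes and demo_capacity_changes are hypothetical and not part of the original code; the exact number of reallocations depends on the standard library's growth policy, so the sketch only checks "zero" versus "more than zero".

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: appends n characters one by one and counts how many
// times the string's capacity changes. Each change means a new buffer was
// allocated and the existing contents were copied into it.
std::size_t count_capacity_changes(std::size_t n, bool reserve_first)
{
    std::string s;
    if (reserve_first)
        s.reserve(n); // a single allocation up front

    std::size_t changes = 0;
    std::size_t last_capacity = s.capacity();
    for (std::size_t k = 0; k < n; ++k)
    {
        s += 'x';
        if (s.capacity() != last_capacity)
        {
            ++changes;
            last_capacity = s.capacity();
        }
    }
    return changes;
}

// Hypothetical usage: 1,801,048 base64 characters decode to at most
// 1,801,048 * 3 / 4 = 1,350,786 bytes.
void demo_capacity_changes()
{
    const std::size_t n = 1350786;
    assert(count_capacity_changes(n, true) == 0);  // no growth after reserve
    assert(count_capacity_changes(n, false) > 0);  // growth without reserve
}
```

How many reallocations occur without reserve varies by implementation, but with reserve the buffer never has to move during the loop.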

Since you can compute the size the returned string needs, you can call

ret.reserve((in_len * 3 + 3) / 4);

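before the while loop (the loop decrements in_len, so the call has to come first) to allocate a buffer large enough for the whole string up front. This avoids all the extra memory allocations and should allow you to decode large strings.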
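As an illustration of where the reserve call fits, the following is a sketch of a decoder with the buffer sized up front. It is not the asker's code verbatim: the function name base64_decode_reserved and the helper demo_base64_decode are hypothetical, base64_chars is assumed to be the standard base64 alphabet, and the loop is written with std::size_t instead of int. Like the original, it stops at the first '=' or non-base64 character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumed definition: the standard base64 alphabet.
static const std::string base64_chars =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/";

// Hypothetical variant of the decoder that reserves the output buffer once.
std::string base64_decode_reserved(const std::string& encoded)
{
    std::string ret;
    // Upper bound on the decoded size: every 4 input characters yield at
    // most 3 output bytes.
    ret.reserve((encoded.size() * 3 + 3) / 4);

    unsigned char quad[4] = {0, 0, 0, 0};
    std::size_t filled = 0;

    for (char c : encoded)
    {
        const std::size_t pos = base64_chars.find(c);
        if (pos == std::string::npos)
            break; // '=' padding or a non-base64 character ends decoding

        quad[filled++] = static_cast<unsigned char>(pos);
        if (filled == 4)
        {
            // Four 6-bit values become three 8-bit bytes.
            ret += static_cast<char>(static_cast<unsigned char>((quad[0] << 2) | (quad[1] >> 4)));
            ret += static_cast<char>(static_cast<unsigned char>(((quad[1] & 0x0f) << 4) | (quad[2] >> 2)));
            ret += static_cast<char>(static_cast<unsigned char>(((quad[2] & 0x03) << 6) | quad[3]));
            filled = 0;
        }
    }

    // A trailing group of 2 or 3 characters yields 1 or 2 bytes.
    if (filled > 1)
    {
        for (std::size_t k = filled; k < 4; ++k)
            quad[k] = 0;
        ret += static_cast<char>(static_cast<unsigned char>((quad[0] << 2) | (quad[1] >> 4)));
        if (filled == 3)
            ret += static_cast<char>(static_cast<unsigned char>(((quad[1] & 0x0f) << 4) | (quad[2] >> 2)));
    }
    return ret;
}

// Hypothetical usage with short, well-known inputs.
void demo_base64_decode()
{
    assert(base64_decode_reserved("TWFu") == "Man");
    assert(base64_decode_reserved("TWE=") == "Ma");
    assert(base64_decode_reserved("TQ==") == "M");
}
```

For the input in the question, the reserve call requests (1,801,048 * 3 + 3) / 4 = 1,350,786 bytes once, so appending to ret inside the loop never has to reallocate.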

Regarding c++ - Decoding a large amount of base64-encoded data in C++, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50223078/