Why does a file name have different bytes after a UTF-16 -> UTF-8 -> UTF-16 round trip in WinAPI?

Problem description

I have a file:

I use

Next, I want to convert this to UTF-8 and back:

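#include <windows.h>
#include <cassert>
#include <string>
#include <vector>

std::string wstringToUtf8(const std::wstring& source) {
  // First call: ask for the required buffer size in bytes.
  const int size = WideCharToMultiByte(CP_UTF8, 0, source.data(), static_cast<int>(source.size()), NULL, 0, NULL, NULL);
  std::vector<char> buffer8(size);
  // Second call: perform the actual conversion into the buffer.
  WideCharToMultiByte(CP_UTF8, 0, source.data(), static_cast<int>(source.size()), buffer8.data(), size, NULL, NULL);
  return std::string(buffer8.begin(), buffer8.end());
}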

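std::wstring utf8ToWstring(const std::string& source) {
  // First call: ask for the required buffer size in wchar_t units.
  const int size = MultiByteToWideChar(CP_UTF8, 0, source.data(), static_cast<int>(source.size()), NULL, 0);
  std::vector<wchar_t> buffer16(size);
  // Second call: perform the actual conversion into the buffer.
  MultiByteToWideChar(CP_UTF8, 0, source.data(), static_cast<int>(source.size()), buffer16.data(), size);
  return std::wstring(buffer16.begin(), buffer16.end());
}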

int main() {
    // Some code with ReadDirectoryChangesW and
    // ...
    // std::wstring fileName = L"TEST Ӡ⬨☐.ipt";
    // ...

    std::string filenameUTF8 = wstringToUtf8(fileName);
    std::wstring filename2 = utf8ToWstring(filenameUTF8);
    assert(fileName == filename2); // FAIL!
    return 0;
}

But the assertion fails.
filename2:

The differing element: [29]

Why?

Recommended answer

57216 (0xDF80) falls in the surrogate range, which UTF-16 uses to encode non-BMP code points; specifically, it is a low surrogate (0xDC00 to 0xDFFF). Surrogates must come in pairs, a high surrogate followed by a low one, or decoding won't give you a correct code point.
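
The pairing rule can be checked directly before converting. The following is a minimal sketch, not part of the original answer: `findUnpairedSurrogate` is a hypothetical helper name, and it takes a `std::u16string` so that it stays portable. On Windows, where `wchar_t` is 16 bits wide, the same loop should apply to a `std::wstring`.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the index of the first unpaired surrogate
// in a UTF-16 sequence, or std::u16string::npos if every surrogate is
// part of a valid high/low pair.
std::size_t findUnpairedSurrogate(const std::u16string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t c = s[i];
        if (c >= 0xD800 && c <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
                ++i;  // skip the second half of the valid pair
            } else {
                return i;
            }
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            // Low surrogate that was not preceded by a high surrogate.
            return i;
        }
    }
    return std::u16string::npos;
}
```

Running a check like this on the file name before the conversion would report the position of the lone 0xDF80, which is the element that later fails to round-trip.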

65533 (0xFFFD) is the Unicode replacement character, a special error character that the converter substitutes because the other half of the surrogate pair is missing.
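
To illustrate where the replacement character comes from, here is a small portable sketch of a lossy UTF-16 to UTF-8 encoder. It is not how `WideCharToMultiByte` is implemented; it only mimics the observable behavior described above, under the assumption that a lone surrogate is replaced with U+FFFD. The names `appendUtf8` and `utf16ToUtf8Lossy` are made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Appends one Unicode code point to a UTF-8 byte string.
void appendUtf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Converts UTF-16 to UTF-8, replacing every unpaired surrogate with
// U+FFFD (65533). The original code unit is lost at that point, so
// decoding the result cannot reproduce the input.
std::string utf16ToUtf8Lossy(const std::u16string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        std::uint32_t cp = s[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < s.size() &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            // Valid pair: combine both halves into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (s[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            // Lone surrogate: substitute the replacement character.
            cp = 0xFFFD;
        }
        appendUtf8(out, cp);
    }
    return out;
}
```

In this sketch a lone 0xDF80 is emitted as the bytes EF BF BD, the UTF-8 form of U+FFFD, so decoding yields 65533 instead of 57216. On Windows Vista and later, passing `WC_ERR_INVALID_CHARS` to `WideCharToMultiByte` with `CP_UTF8` makes the call fail with `ERROR_NO_UNICODE_TRANSLATION` instead of silently substituting, which is one way to detect such names up front.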

To put it another way: your original string is not a valid UTF-16 string.

More info on Wikipedia.
