Why does a file name have different bytes after converting UTF-16 -> UTF-8 -> UTF-16 in WinAPI?
Question
I have a file:
I use
Next, I want to convert this to UTF-8 and back:
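#include <windows.h>
#include <cassert>
#include <string>
#include <vector>

// Convert a UTF-16 wide string to UTF-8.
std::string wstringToUtf8(const std::wstring& source) {
    // First call: ask for the required buffer size in bytes.
    const int size = WideCharToMultiByte(CP_UTF8, 0, source.data(), static_cast<int>(source.size()), NULL, 0, NULL, NULL);
    std::vector<char> buffer8(size);
    // Second call: perform the conversion into the buffer.
    WideCharToMultiByte(CP_UTF8, 0, source.data(), static_cast<int>(source.size()), buffer8.data(), size, NULL, NULL);
    return std::string(buffer8.data(), buffer8.size());
}

// Convert a UTF-8 string back to UTF-16.
std::wstring utf8ToWstring(const std::string& source) {
    // First call: ask for the required buffer size in wide characters.
    const int size = MultiByteToWideChar(CP_UTF8, 0, source.data(), static_cast<int>(source.size()), NULL, 0);
    std::vector<wchar_t> buffer16(size);
    // Second call: perform the conversion into the buffer.
    MultiByteToWideChar(CP_UTF8, 0, source.data(), static_cast<int>(source.size()), buffer16.data(), size);
    return std::wstring(buffer16.data(), buffer16.size());
}

int main() {
    // Some code with ReadDirectoryChangesW and
    // ...
    // std::wstring fileName = L"TEST Ӡ⬨☐.ipt";
    // ...
    std::string filenameUTF8 = wstringToUtf8(fileName);
    std::wstring filename2 = utf8ToWstring(filenameUTF8);
    assert(fileName == filename2); // FAIL!
    return 0;
}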
But the assertion fails.
filename2:
Differing position: [29]
Why?
Recommended answer
57216 (0xDF80) falls into the surrogate range, which UTF-16 uses to encode non-BMP code points. Surrogates must be given in pairs, or decoding won't give you the correct code point.
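As a rough illustration of the surrogate ranges, the sketch below classifies UTF-16 code units and combines a valid pair into the code point it encodes. The helper names (`isHighSurrogate`, `isLowSurrogate`, `combineSurrogates`) are made up for this example, and `char16_t` is used as a portable stand-in for the Windows `wchar_t`; none of it comes from the original question.

```cpp
#include <cassert>

// UTF-16 reserves 0xD800..0xDFFF for surrogates:
// 0xD800..0xDBFF are high (leading), 0xDC00..0xDFFF are low (trailing).
constexpr bool isHighSurrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
constexpr bool isLowSurrogate(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Combine a valid high/low pair into the non-BMP code point it encodes.
// Hypothetical helper: the caller must pass a well-formed pair.
inline char32_t combineSurrogates(char16_t high, char16_t low) {
    assert(isHighSurrogate(high) && isLowSurrogate(low));
    return 0x10000u
         + ((static_cast<char32_t>(high) - 0xD800u) << 10)
         + (static_cast<char32_t>(low) - 0xDC00u);
}

// 57216 == 0xDF80 is a low surrogate, so it only has meaning
// when a high surrogate comes directly before it.
static_assert(isLowSurrogate(57216), "57216 lies in the low-surrogate range");
static_assert(!isHighSurrogate(57216), "57216 is not a high surrogate");
```

For instance, the pair 0xD83D 0xDE00 combines to U+1F600, whereas 0xDF80 on its own has no code point to combine into.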
65533 (U+FFFD, the replacement character) is a special error character that the decoder emits because the other surrogate is missing.
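To see where the 65533 comes from, here is a minimal, portable sketch of a lossy UTF-16 to UTF-8 encoder that substitutes U+FFFD for any unpaired surrogate. It illustrates the general technique only and is not the actual implementation of `WideCharToMultiByte`; the function names `appendUtf8` and `utf16ToUtf8Lossy` are hypothetical, and `std::u16string` stands in for `std::wstring` on Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Append one Unicode code point to `out` as UTF-8 (1 to 4 bytes).
inline void appendUtf8(std::string& out, char32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Lossy UTF-16 -> UTF-8: well-formed pairs are combined,
// unpaired surrogates are replaced by U+FFFD (65533).
inline std::string utf16ToUtf8Lossy(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        const bool high = cp >= 0xD800 && cp <= 0xDBFF;
        const bool low  = cp >= 0xDC00 && cp <= 0xDFFF;
        if (high && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Valid pair: decode it and skip the low surrogate.
            cp = 0x10000u + ((cp - 0xD800u) << 10)
               + (static_cast<char32_t>(in[i + 1]) - 0xDC00u);
            ++i;
        } else if (high || low) {
            // Unpaired surrogate: the original value is lost here.
            cp = 0xFFFD;
        }
        appendUtf8(out, cp);
    }
    return out;
}
```

Once the lone 0xDF80 has been written out as the three bytes of U+FFFD, decoding the UTF-8 can only yield 65533, so the round trip cannot reproduce the original code unit.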
To put it another way: your original string is not a valid UTF-16 string.
More info on Wikipedia.
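One way to detect such a string before converting it is to scan for unpaired surrogates. The sketch below assumes a hypothetical helper, `firstUnpairedSurrogate`, and again uses `std::u16string` as a portable stand-in for `std::wstring`; it is not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Return the index of the first unpaired surrogate in `s`,
// or std::u16string::npos if the string is well-formed UTF-16.
inline std::size_t firstUnpairedSurrogate(const std::u16string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t u = s[i];
        if (u >= 0xD800 && u <= 0xDBFF) {
            // A high surrogate must be followed by a low surrogate.
            if (i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
                ++i;  // skip the low half of the valid pair
                continue;
            }
            return i;
        }
        if (u >= 0xDC00 && u <= 0xDFFF) {
            // A low surrogate with no high surrogate before it.
            return i;
        }
    }
    return std::u16string::npos;
}
```

A file name for which this returns an index rather than `npos` cannot survive a UTF-8 round trip unchanged. On Windows, the documentation for `WideCharToMultiByte` also describes a `WC_ERR_INVALID_CHARS` flag that should make the call fail on such input instead of substituting U+FFFD; it is worth checking that behavior against the documentation for the Windows versions you target.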