Problem description
I need to convert between UTF-8, UTF-16 and UTF-32 for different APIs/modules, and since I now have the option to use C++11, I am looking at the new string types.
It looks like I can use string, u16string and u32string for UTF-8, UTF-16 and UTF-32. I also found codecvt_utf8 and codecvt_utf16, which look able to convert between char or char16_t and char32_t, and what looks like a higher-level wstring_convert, but that only appears to work with bytes/std::string, and there is not a great deal of documentation.
Am I meant to use a wstring_convert somehow for the UTF-16 ↔ UTF-32 and UTF-8 ↔ UTF-32 cases? I only really found examples for UTF-8 to UTF-16, which I am not even sure will be correct on Linux, where wchar_t is normally considered UTF-32... Or should I do something more complex with those codecvt things directly?
Or is this just still not really in a usable state, and should I stick with my own existing small routines using 8-, 16- and 32-bit unsigned integers?
Solution
If you read the documentation at CppReference.com for wstring_convert, codecvt_utf8, codecvt_utf16, and codecvt_utf8_utf16, the pages include a table that tells you exactly what you can use for the various UTF conversions.
And yes, you would use std::wstring_convert to facilitate the conversion between the various UTFs. Despite its name, it is not limited to just std::wstring; it actually operates with any std::basic_string type (which std::string, std::wstring, and std::uXXstring are all based on).
For example:
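    #include <codecvt>  // std::codecvt_utf8, std::codecvt_utf16, std::codecvt_utf8_utf16
    #include <locale>   // std::wstring_convert
    #include <string>

    typedef std::string u8string;

    // UTF-16 -> UTF-8
    u8string To_UTF8(const std::u16string &s)
    {
        std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
        return conv.to_bytes(s);
    }

    // UTF-32 -> UTF-8
    u8string To_UTF8(const std::u32string &s)
    {
        std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
        return conv.to_bytes(s);
    }

    // UTF-8 -> UTF-16
    std::u16string To_UTF16(const u8string &s)
    {
        std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
        return conv.from_bytes(s);
    }

    // UTF-32 -> UTF-16
    std::u16string To_UTF16(const std::u32string &s)
    {
        // codecvt_utf16 produces UTF-16 as a byte stream, big-endian by default.
        std::wstring_convert<std::codecvt_utf16<char32_t>, char32_t> conv;
        std::string bytes = conv.to_bytes(s);
        // Rebuild code units from byte pairs so the result does not depend on host endianness.
        std::u16string result;
        result.reserve(bytes.length() / 2);
        for (std::size_t i = 0; i + 1 < bytes.length(); i += 2)
            result.push_back(static_cast<char16_t>(
                (static_cast<unsigned char>(bytes[i]) << 8) | static_cast<unsigned char>(bytes[i + 1])));
        return result;
    }

    // UTF-8 -> UTF-32
    std::u32string To_UTF32(const u8string &s)
    {
        std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
        return conv.from_bytes(s);
    }

    // UTF-16 -> UTF-32
    std::u32string To_UTF32(const std::u16string &s)
    {
        // Serialize the code units as big-endian bytes, which is what codecvt_utf16 reads by default.
        std::string bytes;
        bytes.reserve(s.length() * 2);
        for (char16_t c : s) {
            bytes.push_back(static_cast<char>(c >> 8));
            bytes.push_back(static_cast<char>(c & 0xFF));
        }
        std::wstring_convert<std::codecvt_utf16<char32_t>, char32_t> conv;
        return conv.from_bytes(bytes);
    }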
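As a quick sanity check of the std::wstring_convert approach described above, the following minimal sketch round-trips a UTF-8 string through UTF-32 and also converts it to UTF-16. The helper names (utf8_to_utf32, utf32_to_utf8, utf8_to_utf16, round_trips) are hypothetical and chosen only for illustration. It assumes a toolchain that still ships <codecvt> and std::wstring_convert, both of which were deprecated in C++17.

```cpp
#include <codecvt>  // std::codecvt_utf8, std::codecvt_utf8_utf16
#include <locale>   // std::wstring_convert
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-32 code points.
std::u32string utf8_to_utf32(const std::string &s)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.from_bytes(s);
}

// Hypothetical helper: encode UTF-32 code points as UTF-8 bytes.
std::string utf32_to_utf8(const std::u32string &s)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.to_bytes(s);
}

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units
// (code points above U+FFFF become surrogate pairs).
std::u16string utf8_to_utf16(const std::string &s)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(s);
}

// True if the input survives UTF-8 -> UTF-32 -> UTF-8 unchanged.
bool round_trips(const std::string &utf8)
{
    return utf32_to_utf8(utf8_to_utf32(utf8)) == utf8;
}
```

For instance, the UTF-8 bytes F0 9F 98 80 (U+1F600) should come back as a single char32_t from utf8_to_utf32 and as the surrogate pair D83D DE00 from utf8_to_utf16. If no error string is passed to the wstring_convert constructor, from_bytes and to_bytes throw std::range_error on malformed input, so callers may want to wrap these helpers in a try block.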