Problem description
C++20 added char8_t and std::u8string for UTF-8. However, there is no UTF-8 version of std::cout, and OS APIs mostly expect char and the execution character set. So we still need a way to convert between UTF-8 and the execution character set.
I was rereading the char8_t paper, and it looks like the only way to convert between UTF-8 and the execution character set (ECS) is to use the std::c8rtomb and std::mbrtoc8 functions. However, their API is extremely confusing. Can someone provide example code?
Recommended answer
UTF-8 "support" in C++20 seems to be a bad joke.
The only UTF functionality in the STL is support for strings and string views (std::u8string, std::u8string_view, std::u16string, ...). That is all. There is no STL support for UTF encodings in regular expressions, formatting, file I/O, and so on.
In C++17 you can at least treat any UTF-8 data as char data, which makes it possible to use std::regex, std::fstream, std::cout, etc. without loss of performance.
In C++20 things change. For example, you can no longer write std::string text = u8"...";
because a u8 literal is now an array of const char8_t rather than const char. It will also be impossible to write something like
std::u8fstream file; std::u8string line; ... file << line;
since there is no std::u8fstream.
Even the new C++20 std::format does not support char8_t strings at all, because the necessary overloads are simply missing. You cannot write
std::u8string text = std::format(u8"...{}...", 42);
To make matters worse, there is no simple cast or conversion between std::string and std::u8string (or even between const char* and const char8_t*). So if you want to format (using std::format) or input/output (std::cin, std::cout, std::fstream, ...) UTF-8 data, you have to copy all strings internally. That is an unnecessary performance killer.
Finally, what use will UTF have without input, output, and formatting?