Problem description

C++20 added char8_t and std::u8string for UTF-8. However, there is no UTF-8 version of std::cout, and OS APIs mostly expect char and the execution character set. So we still need a way to convert between UTF-8 and the execution character set.

I was rereading a char8_t paper, and it looks like the only way to convert between UTF-8 and the execution character set (ECS) is to use the std::c8rtomb and std::mbrtoc8 functions. However, their API is extremely confusing. Can someone provide example code?

Recommended answer

UTF-8 "support" in C++20 seems to be a bad joke.

The only UTF functionality in the STL is support for strings and string_views (std::u8string, std::u8string_view, std::u16string, ...). That is all. There is no STL support for UTF encodings in regular expressions, formatting, file I/O and so on.

In C++17 you can, at least, easily treat any UTF-8 data as 'char' data, which makes it possible to use std::regex, std::fstream, std::cout, etc. without loss of performance.
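As a rough illustration of that point, the sketch below stores UTF-8 bytes in a plain std::string and passes them through a char-based stream unchanged. The helper names utf8_greeting and round_trip are hypothetical and not part of the original answer; the bytes are written as hex escapes so the example does not depend on the source file encoding.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: the UTF-8 bytes of "Grüße" (7 bytes for 5 characters).
// The literal is split before "e" so that 'e' is not read as a hex digit.
std::string utf8_greeting() {
    return "Gr\xC3\xBC\xC3\x9F" "e";
}

// Hypothetical helper: writes the bytes to a char-based stream and reads
// them back. The stream does not interpret the encoding; it copies bytes.
std::string round_trip(const std::string& utf8) {
    std::ostringstream out;
    out << utf8;
    return out.str();
}
```

Whether a terminal then displays such bytes correctly depends on the platform and its configuration, which this sketch does not address.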

In C++20 things will change. For example, you can no longer write std::string text = u8"...";, and it will be impossible to write something like

std::u8fstream file; std::u8string line; ... file << line;

since there is no std::u8fstream.

Even the new C++20 std::format does not support UTF at all, because all necessary overloads are simply missing. You cannot write

std::u8string text = std::format(u8"...{}...", 42);

To make matters worse, there is no simple cast (or conversion) between std::string and std::u8string (or even between const char* and const char8_t*). So if you want to format (using std::format) or input/output (std::cin, std::cout, std::fstream, ...) UTF-8 data, you have to internally copy all strings. That is an unnecessary performance killer.

Finally, what use will UTF have without input, output, and formatting?
