Why does `std::basic_ifstream<char16_t>` not work in C++11?

Question

The following code works as expected. The source file, "file.txt", and "out.txt" are all encoded in UTF-8. However, it stops working when I change wchar_t to char16_t in the first line of main(). I have tried both gcc 5.4 and clang 8.0 with -std=c++11. My goal is to replace wchar_t with char16_t, because wchar_t takes twice as much space in RAM. I thought these two types were equally well supported in C++11 and later standards. What am I missing here?

#include<iostream>
#include<fstream>
#include<locale>
#include<codecvt>
#include<string>

int main(){
  typedef wchar_t my_char;

  std::locale::global(std::locale("en_US.UTF-8"));

  std::ofstream out("file.txt");
  out << "123正则表达式abc" << std::endl;
  out.close();

  std::basic_ifstream<my_char> win("file.txt");
  std::basic_string<my_char> wstr;
  win >> wstr;
  win.close();

  std::ifstream in("file.txt");
  std::string str;
  in >> str;
  in.close();

  std::wstring_convert<std::codecvt_utf8<my_char>, my_char> my_char_conv;
  std::basic_string<my_char> conv = my_char_conv.from_bytes(str);

  std::cout << (wstr == conv ? "true" : "false") << std::endl;

  std::basic_ofstream<my_char> wout("out.txt");
  wout << wstr << std::endl << conv << std::endl;
  wout.close();

  return 0;
}






EDIT

The modified code does not compile with clang 8.0. It compiles with gcc 5.4 but crashes at run time, as shown by @Brian.

Recommended answer

The various stream classes need a set of definitions to be operational. The standard library requires the relevant definitions and objects only for char and wchar_t, not for char16_t or char32_t. Off the top of my head, the following is needed to use std::basic_ifstream<cT> or std::basic_ofstream<cT>:


  1. std::char_traits<cT> to specify how the character type behaves. C++11 does specialize this template for char16_t and char32_t.
  2. The used std::locale needs to contain an instance of the std::num_put<cT> facet to format numeric types. This facet can just be instantiated and a new std::locale containing it can be created, but the standard doesn't mandate that it is present in a std::locale object.
  3. The used std::locale needs to contain an instance of the std::num_get<cT> facet to read numeric types. Again, this facet can be instantiated but isn't required to be present by default.
  4. The facet std::numpunct<cT> needs to be specialized and put into the used std::locale to deal with decimal points, thousands separators, and textual boolean values. Even if it isn't really used, it will be referenced from the numeric formatting and parsing functions. There is no ready specialization for char16_t or char32_t.
  5. The facet std::ctype<cT> needs to be specialized and put into the used std::locale to support widening, narrowing, and classification of the character type. There is no ready specialization for char16_t or char32_t.
  6. The facet std::codecvt<cT, char, std::mbstate_t> needs to be specialized and put into the used std::locale to convert between external byte sequences and internal "character" sequences. C++11 does specify specializations for char16_t and char32_t (converting between UTF-8 and UTF-16 or UTF-32), though library support for them has varied.
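The list above can be probed at run time. The following is a minimal sketch, not part of the original answer: the helper names u16_length and has_numeric_facets are made up for illustration. It only queries std::num_put and std::num_get, because those are class templates that can be named for any character type; asking the same question about std::ctype<char16_t> or std::numpunct<char16_t> may not even compile on some implementations.

```cpp
#include <cstddef>
#include <locale>
#include <string>
#include <type_traits>

// Item 1: the string type for char16_t is built on the standard traits.
static_assert(std::is_same<std::u16string,
                           std::basic_string<char16_t> >::value,
              "std::u16string is std::basic_string<char16_t>");

// Hypothetical helper: std::char_traits<char16_t> is usable as-is.
std::size_t u16_length(const char16_t* s) {
    return std::char_traits<char16_t>::length(s);
}

// Hypothetical helper: does `loc` carry the numeric formatting and
// parsing facets (items 2 and 3) for the character type CharT?
template <typename CharT>
bool has_numeric_facets(const std::locale& loc) {
    return std::has_facet<std::num_put<CharT> >(loc)
        && std::has_facet<std::num_get<CharT> >(loc);
}
```

Calling has_numeric_facets<char>(std::locale::classic()) or the wchar_t variant should report the facets as present, since the standard requires them. For char16_t the standard makes no such promise, so a stream that looks them up has nothing to find unless the program installs them itself.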


Most of the facets are reasonably easy to implement: they just need to forward a simple conversion or do table look-ups. However, the std::codecvt facet tends to be rather tricky, especially because std::mbstate_t is an opaque type from the point of view of the standard C++ library.
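To make the shape of that conversion concrete, here is a minimal sketch of driving the wchar_t conversion facet, which every locale is required to carry, through its public in() member. The helper name widen_with_codecvt is hypothetical, and the output buffer is sized on the assumption that no input byte produces more than one wide character. Note that std::mbstate_t is only created and passed along, never inspected, which is exactly what makes writing a new facet around it awkward.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert external bytes to wide characters using
// the codecvt facet stored in `loc`.
std::wstring widen_with_codecvt(const std::string& bytes,
                                const std::locale& loc) {
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    const facet_type& cvt = std::use_facet<facet_type>(loc);

    if (bytes.empty()) return std::wstring();

    // Assumption: at most one wide character per input byte.
    std::wstring out(bytes.size(), L'\0');

    std::mbstate_t state = std::mbstate_t();  // initial conversion state
    const char* from_next = 0;
    wchar_t* to_next = 0;
    const facet_type::result r =
        cvt.in(state,
               bytes.data(), bytes.data() + bytes.size(), from_next,
               &out[0], &out[0] + out.size(), to_next);

    if (r != facet_type::ok)
        throw std::runtime_error("byte sequence could not be converted");

    // Keep only the characters the facet actually produced.
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

A file stream performs essentially this call inside its buffer each time it refills, which is why the facet has to be present in the locale before the stream can read anything.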

All of that can be done. It has been a while since I last did a proof-of-concept implementation for a character type; it took me about a day's worth of work. Of course, I knew what I needed to do when I embarked on it, having implemented the locales and IOStreams library before. Adding a reasonable amount of tests, rather than merely having a simple demo, would probably take me a week or so (assuming I could actually concentrate on this work).
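If the goal is only to hold the text as char16_t in memory, as stated in the question, a less involved route than implementing the missing facets may be to read the file as plain bytes and convert afterwards. The sketch below is an alternative suggested here, not something the answer proposes. It assumes the file is valid UTF-8, the function names are made up, and it relies on std::wstring_convert and std::codecvt_utf8_utf16 from <codecvt>, which are part of C++11 but deprecated as of C++17.

```cpp
#include <codecvt>
#include <fstream>
#include <iterator>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a UTF-8 byte string to UTF-16.
// from_bytes throws std::range_error if the input is not valid UTF-8.
std::u16string utf8_to_utf16(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(utf8);
}

// Hypothetical helper: read a whole file with an ordinary narrow stream,
// then convert the bytes in memory.
std::u16string read_utf8_file_as_utf16(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);

    std::string bytes((std::istreambuf_iterator<char>(in)),
                      std::istreambuf_iterator<char>());
    return utf8_to_utf16(bytes);
}
```

This keeps the stream itself on char, for which every required facet exists, and confines the char16_t handling to a single conversion step.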


08-22 21:58