How does file encoding affect C++11 string literals?

Problem description

You can write UTF-8, UTF-16 and UTF-32 string literals in C++11 by prefixing the literal with u8, u or U respectively. How must the compiler interpret a UTF-8 source file that contains non-ASCII characters inside these new kinds of string literals? I understand the standard does not specify source file encodings, and that fact alone would make the interpretation of non-ASCII characters in source code completely undefined behavior, making the feature just a tad less useful.
I understand you can still escape individual Unicode characters with \uNNNN, but that is not very readable for, say, a full Russian or French sentence, which typically contains several non-ASCII characters.
From various sources I understand that u literals should be equivalent to L literals on current Windows implementations, and U literals equivalent to L literals on, e.g., Linux. With that in mind, I'm also wondering what behavior is required for the old string literal prefixes (no prefix and L)...
For the code-sample monkeys:
std::string    a = u8"L'hôtel de ville doit être là-bas. Ça c'est un fait!";  // UTF-8 (C++20: use std::u8string)
std::u16string b = u"L'hôtel de ville doit être là-bas. Ça c'est un fait!";   // UTF-16
std::u32string c = U"L'hôtel de ville doit être là-bas. Ça c'est un fait!";   // UTF-32
In an ideal world, all of these strings would produce the same content (as in: the same characters after conversion), but my experience with C++ has taught me that this is most definitely implementation-defined and that probably only the first will do what I want.
Solution
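In GCC, use -finput-charset=charset to set the input character set, i.e. the encoding GCC assumes when converting the source file to the source character set it uses internally. If the locale does not specify one, or GCC cannot get this information from the locale, the default is UTF-8; charset can be any encoding supported by the system's iconv library routine.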
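Also check out the options -fexec-charset=charset, which sets the execution character set used for string and character constants (default UTF-8), and -fwide-exec-charset=charset, which sets the wide execution character set used for wide string and character constants (by default UTF-32 or UTF-16, whichever matches the width of wchar_t).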
Finally, about string literals:
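char     a[] = "Hello";   // execution character set (implementation-defined)
wchar_t  b[] = L"Hello";  // wide execution character set (implementation-defined)
char16_t c[] = u"Hello";  // UTF-16
char32_t d[] = U"Hello";  // UTF-32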
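The encoding prefix of the string literal (L, u, U, and likewise u8) determines the element type of the literal. For u8 it also mandates UTF-8, and u and U literals are UTF-16 and UTF-32 in practice, so none of the three is affected by -fexec-charset or -fwide-exec-charset. Plain and L literals use the implementation-defined execution and wide execution character sets, respectively.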