
Problem description

You can write UTF-8/16/32 string literals in C++11 by prefixing the string literal with u8/u/U respectively. How must the compiler interpret a UTF-8 file that has non-ASCII characters inside of these new types of string literals? I understand the standard does not specify file encodings, and that fact alone would make the interpretation of non-ASCII characters inside source code completely undefined behavior, making the feature just a tad less useful.

I understand you can still escape single Unicode characters with \uNNNN, but that is not very readable for, say, a full Russian or French sentence, which typically contains more than one non-ASCII character.

What I understand from various sources is that u should become equivalent to L on current Windows implementations and U on e.g. Linux implementations. So with that in mind, I'm also wondering what the required behavior is for the old string literal modifiers...

For the code-sample monkeys:

std::string    a = u8"L'hôtel de ville doit être là-bas. Ça c'est un fait!";
std::u16string b = u"L'hôtel de ville doit être là-bas. Ça c'est un fait!";
std::u32string c = U"L'hôtel de ville doit être là-bas. Ça c'est un fait!";

In an ideal world, all of these strings produce the same content (as in: the same characters after conversion), but my experience with C++ has taught me that this is most definitely implementation-defined, and probably only the first will do what I want.
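To make "the same content" concrete, here is a minimal sketch that counts the code units each prefix produces for one character. It writes the character as a universal-character-name (\u00e9 for é), so the result does not depend on how the source file is encoded. The helper names `code_units` and `check_utf8_bytes` are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Number of code units in a string literal, not counting the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) {
    return N - 1;
}

// U+00E9 (é): same character, different number of code units per encoding.
static_assert(code_units(u8"\u00e9") == 2, "UTF-8 uses two bytes for U+00E9");
static_assert(code_units(u"\u00e9") == 1, "UTF-16 uses one code unit");
static_assert(code_units(U"\u00e9") == 1, "UTF-32 uses one code unit");

// U+1F600 lies outside the Basic Multilingual Plane.
static_assert(code_units(u8"\U0001F600") == 4, "four UTF-8 bytes");
static_assert(code_units(u"\U0001F600") == 2, "a UTF-16 surrogate pair");
static_assert(code_units(U"\U0001F600") == 1, "one UTF-32 code unit");

// Runtime look at the actual UTF-8 bytes: U+00E9 is encoded as 0xC3 0xA9.
// The casts keep this valid in C++20 as well, where u8 literals use char8_t.
inline void check_utf8_bytes() {
    const auto& s = u8"\u00e9";
    assert(static_cast<unsigned char>(s[0]) == 0xC3);
    assert(static_cast<unsigned char>(s[1]) == 0xA9);
}
```

If the literal contained a raw é instead of the escape, the same counts would only hold when the compiler decodes the source file with the right encoding.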

Solution

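In GCC, use -finput-charset=charset to tell the compiler which encoding the source file uses. If the option is not given and the locale does not specify one, GCC assumes UTF-8.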

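Also check out the options -fexec-charset and -fwide-exec-charset, which set the encodings used for narrow and wide string and character literals in the compiled program.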
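As a rough sketch of what these options change, the snippet below measures an ordinary and a wide literal containing é. The file name demo.cpp and the helper names are assumptions for illustration. The values in the comments are what I would expect from GCC with the flags shown, and other compilers may differ.

```cpp
#include <cassert>
#include <cstddef>

// Code units in a literal, excluding the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t literal_length(const CharT (&)[N]) {
    return N - 1;
}

// Ordinary literal: encoded in the execution character set (-fexec-charset).
//   g++ -std=c++11 demo.cpp                            expected 2 (UTF-8: C3 A9)
//   g++ -std=c++11 -fexec-charset=ISO-8859-1 demo.cpp  expected 1 (Latin-1: E9)
inline std::size_t narrow_length() {
    return literal_length("\u00e9");
}

// Wide literal: encoded in the wide execution character set
// (-fwide-exec-charset). U+00E9 fits in one wchar_t whether wchar_t is
// 16 or 32 bits wide.
inline std::size_t wide_length() {
    return literal_length(L"\u00e9");
}

inline void check_wide() {
    assert(wide_length() == 1);
}
```

The u8, u and U literals are not affected by -fexec-charset, so they are the safer choice when a fixed encoding is needed.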

Finally, about string literals:

char     a[] = "Hello";
wchar_t  b[] = L"Hello";
char16_t c[] = u"Hello";
char32_t d[] = U"Hello";

The size modifier of the string literal (L, u, U) merely determines the type of the literal.
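The types can be checked at compile time. This is a small sketch, and `unit_size` is a helper name invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// A string literal is an lvalue array, so decltype yields a reference to
// an array of const characters (five letters plus the terminating null).
static_assert(std::is_same<decltype("Hello"), const char (&)[6]>::value,
              "ordinary literal: array of const char");
static_assert(std::is_same<decltype(L"Hello"), const wchar_t (&)[6]>::value,
              "L prefix: array of const wchar_t");
static_assert(std::is_same<decltype(u"Hello"), const char16_t (&)[6]>::value,
              "u prefix: array of const char16_t");
static_assert(std::is_same<decltype(U"Hello"), const char32_t (&)[6]>::value,
              "U prefix: array of const char32_t");

// Size in bytes of one code unit of the given literal.
template <typename CharT, std::size_t N>
constexpr std::size_t unit_size(const CharT (&)[N]) {
    return sizeof(CharT);
}

inline void check_unit_sizes() {
    assert(unit_size("Hello") == 1);
    assert(unit_size(u"Hello") == sizeof(char16_t));
    assert(unit_size(U"Hello") == sizeof(char32_t));
}
```

The size of wchar_t is left out of the runtime check on purpose, since it is 16 bits on Windows and 32 bits on typical Linux systems.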
