UTF-8 compatibility in C++

Question

I am writing a program that needs to work with text in all languages. My understanding is that UTF-8 will do the job, but I am running into a few problems with it.

Am I right to say that UTF-8 can be stored in a plain char in C++? If so, why do I get the following warning when my program uses char, string and stringstream: warning C4566: character represented by universal-character-name '\uFFFD' cannot be represented in the current code page (1252). (I do not get that warning when I use wchar_t, wstring and wstringstream.)

Additionally, I know that UTF-8 is variable length. If I use the at or substr string methods, will I get the wrong answer?

Recommended answer

To use UTF-8 string literals, you need to prefix them with u8; otherwise you get the implementation's character set (in your case, it seems to be Windows-1252). u8"\uFFFD" is a null-terminated sequence of bytes holding the UTF-8 representation of the replacement character (U+FFFD). It has type char const[4].
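As a rough illustration of the point about u8 literals, the sketch below checks the size of u8"\uFFFD" at compile time and copies its bytes into a std::string. The helper names utf8_bytes and replacement_character are made up for this example and are not part of any library. One caveat beyond the answer: since C++20 the element type of a u8 literal is char8_t rather than char, so the helper is templated on the element type to compile under either rule.

```cpp
#include <cstddef>
#include <string>

// sizeof counts the three UTF-8 code units (EF BF BD) plus the terminating null.
static_assert(sizeof(u8"\uFFFD") == 4,
              "U+FFFD takes three bytes in UTF-8, plus the terminator");

// Hypothetical helper: copies the bytes of a u8 literal into a std::string.
// Templated on the element type so it compiles whether u8 literals are
// arrays of char (C++11 to C++17) or of char8_t (C++20 and later).
template <typename CharT, std::size_t N>
std::string utf8_bytes(const CharT (&literal)[N]) {
    std::string out;
    for (std::size_t i = 0; i + 1 < N; ++i) {  // i + 1 < N skips the trailing null
        out.push_back(static_cast<char>(literal[i]));
    }
    return out;
}

// Hypothetical helper: the replacement character as a UTF-8 byte string.
inline std::string replacement_character() {
    return utf8_bytes(u8"\uFFFD");
}
```

Because the literal is written with the u8 prefix, the compiler encodes it as UTF-8 regardless of the current code page, which is why a literal like this should not trigger the C4566 warning quoted in the question.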

Since UTF-8 is a variable-length encoding, every kind of indexing works in code units, not code points. For the same reason, constant-time random access to code points in a UTF-8 sequence is not possible. If you want random access, you need a fixed-length encoding such as UTF-32; for that, you can use the U prefix on string literals.
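The difference between code units and code points can be seen in a small sketch. The function names below (count_code_points, sample_utf8, sample_utf32) are invented for this illustration, and the counting routine assumes its input is already valid UTF-8; it does no validation.

```cpp
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 string by skipping continuation bytes
// (bytes of the form 10xxxxxx). Assumes the input is valid UTF-8.
inline std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;  // lead byte or ASCII byte: starts a new code point
        }
    }
    return count;
}

// U+00E9 (e with acute accent) followed by "a", written as explicit UTF-8
// bytes so the example does not depend on the compiler's character sets.
inline std::string sample_utf8() {
    return "\xC3\xA9" "a";
}

// The same text as UTF-32: one code unit per code point.
inline std::u32string sample_utf32() {
    return U"\u00E9" U"a";
}
```

With these definitions, sample_utf8().size() is 3 while count_code_points(sample_utf8()) is 2, and sample_utf8().at(1) yields the byte 0xA9 (the second half of the accented letter) rather than 'a'. By contrast, sample_utf32().at(1) is U'a', since each element of the UTF-32 string is a whole code point. Likewise, substr on the UTF-8 string can cut a multi-byte sequence in half unless the offsets are chosen on code point boundaries.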

08-22 21:58