UTF-8 encoding and decoding problem

Problem description

I'm having a problem converting text to and from UTF-8 encoding. Here I have a byte array:

byte[] c = new byte[] { 1, 2, 200 };

I'm converting it to a UTF-8 string and back to a byte array:

Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(c));

As I understand it, I should get back an array with 3 bytes. Right? But here's what I'm getting:

byte[5] { 1, 2, 239, 191, 189 }

What's the reason for this? I understand that the 239, 191, 189 combination is the UTF-8 encoding of the REPLACEMENT CHARACTER (U+FFFD) from the Unicode Specials block.

This is also part of a bigger problem.

Recommended answer

Not all sequences of bytes are valid UTF-8. Your array (1, 2, 200) is not valid UTF-8, which is why the decoder inserts this special error character in place of the offending byte.
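The round trip from the question can be reproduced with a minimal sketch like the one below. The class and method names (`RoundTripDemo`, `RoundTrip`) are hypothetical and only serve the illustration; the calls themselves are the same `Encoding.UTF8.GetString` and `Encoding.UTF8.GetBytes` used in the question.

```csharp
using System;
using System.Text;

static class RoundTripDemo
{
    // Decode the bytes as UTF-8, then encode the resulting string again.
    public static byte[] RoundTrip(byte[] input)
    {
        string text = Encoding.UTF8.GetString(input);
        return Encoding.UTF8.GetBytes(text);
    }

    static void Main()
    {
        byte[] c = { 1, 2, 200 };

        // 200 is not a complete UTF-8 sequence on its own, so the default
        // decoder substitutes U+FFFD (REPLACEMENT CHARACTER) for it.
        string s = Encoding.UTF8.GetString(c);
        Console.WriteLine(s.Length);          // 3
        Console.WriteLine(s[2] == '\uFFFD');  // True

        // Encoding U+FFFD yields the three bytes 239, 191, 189.
        Console.WriteLine(string.Join(", ", RoundTrip(c)));  // 1, 2, 239, 191, 189
    }
}
```

The original byte 200 is lost at the decoding step, so no later call can recover it from the string.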

MSDN says about Encoding.UTF8:

1) There is no BOM (https://en.wikipedia.org/wiki/Byte_order_mark) in your example.

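2) 200 (0xC8, binary 11001000) is a leading byte: the 110xxxxx pattern starts a two-byte sequence. It must be followed by enough continuation bytes (here, one byte of the form 10xxxxxx), but the array ends right after it, so the sequence is incomplete.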
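To detect such input instead of silently getting a replacement character, one option is a `UTF8Encoding` instance constructed with `throwOnInvalidBytes: true`. The sketch below is only an illustration: `StrictDecodeDemo` and `IsValidUtf8` are hypothetical names, and the Base64 part assumes that the larger goal is to carry arbitrary bytes through a string, which the question does not state.

```csharp
using System;
using System.Text;

static class StrictDecodeDemo
{
    // Returns true only if 'data' is well-formed UTF-8.
    public static bool IsValidUtf8(byte[] data)
    {
        // No BOM on encode; throw instead of substituting U+FFFD on decode.
        var strict = new UTF8Encoding(
            encoderShouldEmitUTF8Identifier: false,
            throwOnInvalidBytes: true);
        try
        {
            strict.GetString(data);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    static void Main()
    {
        // Leading byte 200 with no continuation byte: invalid.
        Console.WriteLine(IsValidUtf8(new byte[] { 1, 2, 200 }));       // False

        // 200 followed by continuation byte 128 (10000000): encodes U+0200.
        Console.WriteLine(IsValidUtf8(new byte[] { 1, 2, 200, 128 }));  // True

        // Assumption: if arbitrary bytes must survive a trip through a
        // string, Base64 is a lossless alternative to UTF-8 decoding.
        string b64 = Convert.ToBase64String(new byte[] { 1, 2, 200 });
        byte[] restored = Convert.FromBase64String(b64);
        Console.WriteLine(string.Join(", ", restored));                 // 1, 2, 200
    }
}
```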

