NSString Unicode encoding problem

Question

I'm having problems converting a string into something readable. I'm using

NSString *substring = [NSString stringWithUTF8String:[symbol.data cStringUsingEncoding:NSUTF8StringEncoding]];

but I can't convert \U7ab6\U51b1 into '

It shows as 窶冱, which is not what I want; it should show as an '. Can anyone help me?

Recommended answer

That's character U+2019 RIGHT SINGLE QUOTATION MARK.

What has happened is that the character sequence ’s was submitted to you in the UTF-8 encoding, which comes out as the bytes:

’          s
E2 80 99   73

That byte sequence has then been incorrectly interpreted as if it were encoded in Windows code page 932 (Japanese; more or less Shift-JIS):

E2 80    99 73
窶        冱
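The two byte tables above can be reproduced in a few lines. This is only an illustrative sketch in Python (not the asker's Objective-C), relying on Python's built-in `utf-8` and `cp932` codecs; the variable names are made up for the example.

```python
# Start from the intended text: U+2019 followed by "s".
original = "\u2019s"

# Step 1: encode as UTF-8, giving the bytes E2 80 99 73.
utf8_bytes = original.encode("utf-8")
print(utf8_bytes.hex(" "))

# Step 2: wrongly decode the same bytes as code page 932.
# E2 80 pairs up into one kanji, 99 73 into another.
mangled = utf8_bytes.decode("cp932")
print(mangled)
print([f"U+{ord(c):04X}" for c in mangled])
```

The code points printed at the end should match the \U7ab6\U51b1 seen in the question.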

So in this one particular case, you could recover the ’s string by first encoding the characters into cp932 bytes, and then decoding those bytes back to characters using UTF-8.
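A minimal sketch of that round trip, written in Python for illustration; `undo_cp932_mojibake` is a hypothetical helper name, not an existing API. It only works when the mis-decoding happened to succeed, as it did here.

```python
def undo_cp932_mojibake(mangled: str) -> str:
    """Reverse a UTF-8-read-as-cp932 mix-up (hypothetical helper).

    Re-encoding as cp932 restores the original bytes, which are
    then decoded with the encoding they were really written in.
    """
    return mangled.encode("cp932").decode("utf-8")


# The two kanji from the question turn back into U+2019 plus "s".
recovered = undo_cp932_mojibake("\u7ab6\u51b1")
print(recovered)
```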

However, this will not solve your real problem, which is that the strings were read in incorrectly in the first place. You got 窶冱 in this case because the UTF-8 byte sequence resulting from encoding ’s happened also to be a valid Shift-JIS byte sequence. But that won't be the case for all possible UTF-8 byte sequences you might get. Many other characters will be unrecoverably mangled.
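To see how this can go wrong, consider the same quotation mark followed by a space instead of an s. The sketch below (Python, with an example string chosen for illustration) shows that the byte 99 followed by 20 is not a valid cp932 sequence, so a strict decoder rejects it and a lenient one substitutes U+FFFD, discarding the original byte.

```python
# "users’ data": the quote is followed by a space, so the bytes
# around it are E2 80 99 20 rather than E2 80 99 73.
text = "users\u2019 data"
utf8_bytes = text.encode("utf-8")

# A strict cp932 decode fails: 0x99 is a lead byte, but 0x20
# is not a legal trail byte.
try:
    utf8_bytes.decode("cp932")
    strict_ok = True
except UnicodeDecodeError as exc:
    strict_ok = False
    print("strict decode failed:", exc.reason)

# A lenient decode inserts U+FFFD in place of the bad byte,
# so the original byte value can no longer be reconstructed.
lossy = utf8_bytes.decode("cp932", errors="replace")
print("\ufffd" in lossy)
```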

You need to find where bytes are being read into the system and decoded as Shift-JIS, and fix that to use UTF-8 instead.
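As a rough illustration of what fixing it at the source means, the Python sketch below writes the UTF-8 bytes to a temporary file and reads them back twice: once with the wrong encoding and once with the right one. The temporary file is just a stand-in for wherever your bytes actually enter the system, which the question does not show.

```python
import os
import tempfile

# Stand-in for the real input: a file holding UTF-8 bytes.
raw = "\u2019s".encode("utf-8")
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(raw)
    path = f.name

try:
    # Wrong: bytes decoded as code page 932 produce mojibake.
    with open(path, encoding="cp932") as f:
        wrong = f.read()
    # Right: name the encoding the bytes were written in.
    with open(path, encoding="utf-8") as f:
        right = f.read()
finally:
    os.remove(path)

print(wrong)
print(right)
```

In Cocoa the corresponding step would presumably be wherever an NSString is created from raw bytes, for example passing NSUTF8StringEncoding to `initWithData:encoding:` rather than a Shift-JIS encoding.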

