Problem description
I'm having problems converting a string to something readable. I'm using
NSString *substring = [NSString stringWithUTF8String:[symbol.data cStringUsingEncoding:NSUTF8StringEncoding]];
but I can't convert \U7ab6\U51b1 into '
It shows as 窶冱, which is not what I want; it should show as an '. Can anyone help me?
Recommended answer
That's character U+2019 RIGHT SINGLE QUOTATION MARK.
What has happened is that the character sequence ’s was submitted to you in the UTF-8 encoding, which comes out as these bytes:
’ s
E2 80 99 73
That byte sequence has then, incorrectly, been interpreted as if it were encoded in Windows code page 932 (Japanese; more or less Shift-JIS):
E2 80 99 73
窶 冱
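The mix-up described above can be reproduced in a few lines. This is only a sketch: it assumes Foundation's `NSShiftJISStringEncoding` behaves like code page 932 for these four bytes, and the variable names are made up for illustration.

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        // U+2019 RIGHT SINGLE QUOTATION MARK followed by "s".
        NSString *original = @"\u2019s";

        // Encode as UTF-8: the four bytes E2 80 99 73.
        NSData *utf8Bytes = [original dataUsingEncoding:NSUTF8StringEncoding];

        // Wrongly decode the very same bytes as Shift-JIS (assumed to match cp932 here).
        NSString *garbled = [[NSString alloc] initWithData:utf8Bytes
                                                  encoding:NSShiftJISStringEncoding];

        NSLog(@"bytes: %@", utf8Bytes);
        NSLog(@"misread as Shift-JIS: %@", garbled);
    }
    return 0;
}
```

If the assumption holds, the second log line should show the same two characters as in the question, \U7ab6\U51b1.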
So in this one particular case, you could recover the ’s string by first encoding the characters into cp932 bytes, and then decoding those bytes back to characters using UTF-8.
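A minimal sketch of that round trip, again assuming `NSShiftJISStringEncoding` is close enough to cp932 for this input. `RecoverMisdecodedString` is a hypothetical helper name, not an existing API.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: undoes a "UTF-8 bytes read as Shift-JIS" mix-up.
// Returns nil when the text cannot be round-tripped.
static NSString *RecoverMisdecodedString(NSString *garbled) {
    // Step 1: turn the wrong characters back into the bytes they came from.
    NSData *bytes = [garbled dataUsingEncoding:NSShiftJISStringEncoding];
    if (bytes == nil) {
        return nil; // contains characters Shift-JIS cannot represent
    }
    // Step 2: decode those bytes as UTF-8, which is what they were all along.
    // initWithData:encoding: returns nil if the bytes are not valid UTF-8.
    return [[NSString alloc] initWithData:bytes encoding:NSUTF8StringEncoding];
}

int main(void) {
    @autoreleasepool {
        NSString *garbled = @"\u7ab6\u51b1"; // the two characters from the question
        NSLog(@"recovered: %@", RecoverMisdecodedString(garbled));
    }
    return 0;
}
```

The nil checks matter: either step can fail on other input, which is why this is a patch for one string rather than a general fix.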
However, this will not solve your real problem, which is that the strings were read in incorrectly in the first place. You got 窶冱 in this case because the UTF-8 byte sequence resulting from encoding ’s happened also to be a valid Shift-JIS byte sequence. But that won't be the case for all possible UTF-8 byte sequences you might get. Many other characters will be unrecoverably mangled.
You need to find where bytes are being read into the system and decoded as Shift-JIS, and fix that to use UTF-8 instead.
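Where exactly that happens depends on your code, so the following is only an illustration of the intended end state: raw bytes decoded explicitly as UTF-8. The `raw` array is a stand-in for whatever payload actually enters your app.

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        // Stand-in for the raw payload; in a real app these bytes would come
        // from wherever the data enters the system.
        const unsigned char raw[] = { 0xE2, 0x80, 0x99, 0x73 };
        NSData *payload = [NSData dataWithBytes:raw length:sizeof raw];

        // Decode explicitly as UTF-8 instead of relying on a guessed encoding.
        NSString *text = [[NSString alloc] initWithData:payload
                                               encoding:NSUTF8StringEncoding];
        if (text == nil) {
            NSLog(@"payload is not valid UTF-8");
        } else {
            NSLog(@"decoded: %@", text);
        }
    }
    return 0;
}
```

Working from the bytes rather than from an already-decoded string is the point here: once the wrong decoding has produced an `NSString`, information may already be lost.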