Problem description
I am new to Objective-C and am trying to convert a malformed UTF-8 encoded NSString to a well-formed one, using the example from Apple's docs.
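NSString *theString = @"Lügen"; // "ü" should be "ü"
// Lossy conversion to ASCII
NSData *asciiData = [theString dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
NSString *asciiString = [[NSString alloc] initWithData:asciiData encoding:NSASCIIStringEncoding];
NSLog(@"Original: %@ (length %lu)", theString, (unsigned long)[theString length]);
NSLog(@"Converted: %@ (length %lu)", asciiString, (unsigned long)[asciiString length]);
Result:
Original: Lügen (length 6)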
Converted: LA1/4gen (length 8)
This does nothing:
NSString* str = [NSString stringWithUTF8String:
[theString cStringUsingEncoding:NSASCIIStringEncoding]];
This crashes my app:
NSString* str = [NSString stringWithUTF8String:
[theString cStringUsingEncoding:NSUTF8StringEncoding]];
Does anyone have any idea what I am doing wrong?
Recommended answer
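NSString *string = @"ü"; // the UTF-8 bytes C3 BC, misread as Latin-1
// Convert to Latin-1 to get the original bytes C3 BC back.
const char *c = [string cStringUsingEncoding:NSISOLatin1StringEncoding];
// Parse those bytes as UTF-8, the encoding they were written in.
NSString *newString = [[NSString alloc] initWithCString:c encoding:NSUTF8StringEncoding];
NSLog(@"%@", newString); // ü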
"Malformed UTF-8 sequence" means a sequence of bytes that is invalid in UTF-8. Your problem is a different one: you get unexpected results because the string was parsed with a different encoding than the one used by its original author.
The hexadecimal data C3 BC parsed with UTF-8 encoding is the character ü. Instead you used Latin-1 encoding, which results in ü. Then you created an NSString from the Latin-1 parsed string, which means you converted the Latin-1 string to a UTF-16 string (UTF-16 being the native format of NSString).
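To make the two readings of the same bytes concrete, here is a minimal sketch in plain C (which also compiles inside an Objective-C file). The function names decode_as_latin1 and decode_as_utf8 are made up for this illustration and are not Foundation APIs; the UTF-8 reader deliberately handles only 1- and 2-byte sequences, which is all this example needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read bytes as Latin-1.
   Every byte is exactly one code point with the same value. */
size_t decode_as_latin1(const uint8_t *in, size_t n, uint32_t *out) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i];
    }
    return n;
}

/* Hypothetical helper: read bytes as UTF-8, supporting only 1- and 2-byte
   sequences (code points up to U+07FF). Returns the number of code points,
   or (size_t)-1 if the input is malformed or needs longer sequences. */
size_t decode_as_utf8(const uint8_t *in, size_t n, uint32_t *out) {
    assert(in != NULL && out != NULL);
    size_t count = 0;
    for (size_t i = 0; i < n; ) {
        if (in[i] < 0x80) {
            /* ASCII byte: one byte, one code point. */
            out[count++] = in[i++];
        } else if ((in[i] & 0xE0) == 0xC0 && i + 1 < n &&
                   (in[i + 1] & 0xC0) == 0x80) {
            /* Lead byte 110xxxxx followed by continuation byte 10xxxxxx. */
            uint32_t cp = ((uint32_t)(in[i] & 0x1F) << 6) | (in[i + 1] & 0x3F);
            if (cp < 0x80) {
                return (size_t)-1; /* overlong encoding is malformed */
            }
            out[count++] = cp;
            i += 2;
        } else {
            return (size_t)-1;
        }
    }
    return count;
}
```

Given the bytes C3 BC, decode_as_utf8 yields one code point, U+00FC (ü), while decode_as_latin1 yields two, U+00C3 (Ã) and U+00BC (¼). The bytes are identical in both cases; only the interpretation differs.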
Representing given data in different encodings shows up as different characters, but doesn't change the data. Converting to a different encoding does change the data, in an attempt to reproduce the same characters. Example: the characters ü are C3 83 C2 BC in UTF-8, but C3 BC in Latin-1. So I converted to the same characters in Latin-1 to get the original data back, and then I parsed that data as UTF-8.
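The byte values in that example can be checked with a small sketch in plain C. The helpers latin1_to_utf8 and utf8_to_latin1 are hypothetical names written for this illustration, not part of Foundation, and they only cover the Latin-1 range (U+0000 to U+00FF).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert Latin-1 bytes to UTF-8.
   out must have room for 2 * n bytes. Returns the number of bytes written. */
size_t latin1_to_utf8(const uint8_t *in, size_t n, uint8_t *out) {
    assert(in != NULL && out != NULL);
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] < 0x80) {
            out[j++] = in[i];                           /* ASCII stays one byte */
        } else {
            out[j++] = (uint8_t)(0xC0 | (in[i] >> 6));   /* lead byte 110000xx */
            out[j++] = (uint8_t)(0x80 | (in[i] & 0x3F)); /* continuation 10xxxxxx */
        }
    }
    return j;
}

/* Hypothetical helper: convert UTF-8 back to Latin-1.
   out must have room for n bytes. Returns the number of bytes written,
   or (size_t)-1 if the input is malformed or contains a code point
   above U+00FF, which Latin-1 cannot represent. */
size_t utf8_to_latin1(const uint8_t *in, size_t n, uint8_t *out) {
    assert(in != NULL && out != NULL);
    size_t j = 0;
    for (size_t i = 0; i < n; ) {
        if (in[i] < 0x80) {
            out[j++] = in[i++];
        } else if ((in[i] == 0xC2 || in[i] == 0xC3) && i + 1 < n &&
                   (in[i + 1] & 0xC0) == 0x80) {
            /* Only lead bytes C2 and C3 map to U+0080..U+00FF. */
            out[j++] = (uint8_t)(((in[i] & 0x03) << 6) | (in[i + 1] & 0x3F));
            i += 2;
        } else {
            return (size_t)-1;
        }
    }
    return j;
}
```

Feeding the bytes C3 BC to latin1_to_utf8 produces C3 83 C2 BC, which is the double-encoded form. Passing those four bytes to utf8_to_latin1 gives C3 BC again, and parsing that as UTF-8 yields ü. This mirrors the two calls in the answer: cStringUsingEncoding:NSISOLatin1StringEncoding recovers the original bytes, and initWithCString:encoding: with NSUTF8StringEncoding parses them correctly.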