Which 8-bit encodings use the C1 range for characters? (x80-x9F or 128-159)

Problem description

Wikipedia has a listing of the x80-x9F "C1" range under Latin-1 Supplement for Unicode. This range is also reserved in the ISO-8859-1 code page.

I'm looking at a file of strings, all of which are within the 7-bit ASCII range except for a few instances of \x96 where it looks like a dash would be, such as the middle of a street address.

I don't know if other characters in the C1 range might eventually show up in the data, so I'd like to know if there's a correct way to read the file. Are there any 8-bit encodings which use x80 through x9F for character data instead of terminal control characters?

Solution

There is a large number (potentially an infinite number) of 8-bit encodings that assign graphic characters to some or all bytes in the range 0x80 to 0x9F. Several encodings defined by Microsoft have U+2013 EN DASH "–" at byte position 0x96, and this character could conceivably appear in a street address, especially between numbers.

On the other hand, MacRoman, for example, has the letter "ñ" at position 0x96, and it could well appear within a street name in Spanish.

For a rational analysis of the situation, you should inspect the data as a whole, possibly using a filter that finds all bytes outside the ASCII range 0x00 to 0x7F, look at the contexts in which the characters appear, and try to find technical information about the origin of the data.
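The kind of filter suggested in the answer could be sketched in Python roughly as follows. This is only an illustration, not a definitive tool: the function names `non_ascii_report` and `decode_candidates`, the sample bytes, and the list of candidate encodings (`cp1252`, `mac_roman`, `latin-1`) are assumptions made for the example, and the real origin of the data may point to a different encoding.

```python
from collections import Counter

# Candidate encodings are an assumption for illustration only;
# the actual source of the data may use something else entirely.
CANDIDATES = ["cp1252", "mac_roman", "latin-1"]


def non_ascii_report(data: bytes, context: int = 10):
    """Count bytes outside 0x00-0x7F and collect each one with nearby bytes."""
    counts = Counter(b for b in data if b > 0x7F)
    samples = []
    for i, b in enumerate(data):
        if b > 0x7F:
            # Keep a few bytes on either side so the context can be judged.
            window = data[max(0, i - context): i + context + 1]
            samples.append((i, b, window))
    return counts, samples


def decode_candidates(window: bytes):
    """Decode the same bytes under each candidate encoding for comparison."""
    # errors="replace" avoids exceptions for bytes a code page leaves undefined.
    return {enc: window.decode(enc, errors="replace") for enc in CANDIDATES}


# Hypothetical sample resembling the street-address case from the question.
sample = b"123\x96125 Main St"
counts, samples = non_ascii_report(sample)
for offset, byte, window in samples:
    print(f"offset {offset}, byte {byte:#04x}")
    for enc, text in decode_candidates(window).items():
        print(f"  {enc}: {text!r}")
```

Under `cp1252` the byte 0x96 decodes to U+2013 EN DASH, under `mac_roman` to "ñ", and under `latin-1` to the C1 control character U+0096, so comparing the decoded contexts side by side is one way to judge which reading fits the data.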