RE error: illegal byte sequence on Mac OS X

Question

I'm trying to replace a string in a Makefile on Mac OS X for cross-compiling to iOS. The string has embedded double quotes. The command is:

    sed -i "" 's|"iphoneos-cross","llvm-gcc:-O3|"iphoneos-cross","clang:-Os|g' Configure

And the error is:

    sed: RE error: illegal byte sequence

I've tried escaping the double quotes, commas, dashes, and colons with no joy. For example:

    sed -i "" 's|\"iphoneos-cross\"\,\"llvm-gcc\:\-O3|\"iphoneos-cross\"\,\"clang\:\-Os|g' Configure

Does anyone know how to get sed to print the position of the illegal byte sequence? Or does anyone know what the illegal byte sequence is?

Recommended answer

Using the formerly accepted answer is an option if you don't mind losing support for your true locale (if you're on a US system and never need to deal with foreign characters, that may be fine).

However, the same effect can be had ad hoc, for a single command only:

    LC_ALL=C sed -i "" 's|"iphoneos-cross","llvm-gcc:-O3|"iphoneos-cross","clang:-Os|g' Configure

Note: What matters is an effective LC_CTYPE setting of C, so LC_CTYPE=C sed ... would normally also work, but if LC_ALL happens to be set (to something other than C), it overrides the individual LC_* category variables such as LC_CTYPE.
Thus, the most robust approach is to set LC_ALL.

However, (effectively) setting LC_CTYPE to C treats strings as if each byte were its own character (no interpretation based on encoding rules is performed), with no regard for the multibyte-on-demand UTF-8 encoding that OS X employs by default, in which foreign characters have multibyte encodings.

In a nutshell: setting LC_CTYPE to C causes the shell and utilities to recognize only basic English letters as letters (those in the 7-bit ASCII range), so foreign characters are not treated as letters, causing, for instance, upper-/lowercase conversions to fail.

Again, this may be fine if you needn't match multibyte-encoded characters such as é and simply want to pass such characters through.

If this is insufficient and/or you want to understand the cause of the original error (including determining which input bytes caused the problem) and perform encoding conversions on demand, read on below.

The problem is that the input file's encoding does not match the shell's. More specifically, the input file contains characters encoded in a way that is not valid in UTF-8 (as @Klas Lindbäck stated in a comment); that is what the sed error message is trying to say with "illegal byte sequence".

Most likely, your input file uses a single-byte 8-bit encoding such as ISO-8859-1, frequently used to encode "Western European" languages.
Example:

The accented letter à has Unicode code point 0xE0 (224), the same as in ISO-8859-1. However, due to the nature of UTF-8 encoding, this single code point is represented as 2 bytes, 0xC3 0xA0, whereas trying to pass the single byte 0xE0 is invalid under UTF-8.

Here's a demonstration of the problem using the string voilà encoded as ISO-8859-1, with the à represented as one byte (via an ANSI C-quoted bash string ($'...') that uses \x{e0} to create the byte).

Note that the sed command is effectively a no-op that simply passes the input through, but we need it to provoke the error:

    # -> 'illegal byte sequence': byte 0xE0 is not a valid char.
    sed 's/.*/&/' <<<$'voil\x{e0}'

To simply ignore the problem, the above LC_CTYPE=C approach can be used:

    # No error, bytes are passed through ('à' will render as '?', though).
    LC_CTYPE=C sed 's/.*/&/' <<<$'voil\x{e0}'

If you want to determine which parts of the input cause the problem, try the following:

    # Convert bytes in the 8-bit range (high bit set) to hex. representation.
    # -> 'voil\x{e0}'
    iconv -f ASCII --byte-subst='\x{%02x}' <<<$'voil\x{e0}'

The output shows all bytes that have the high bit set (bytes that exceed the 7-bit ASCII range) in hexadecimal form. (Note, however, that this also includes correctly encoded UTF-8 multibyte sequences; a more sophisticated approach would be needed to specifically identify invalid-in-UTF-8 bytes.)
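The high-bit listing above cannot tell valid UTF-8 sequences from invalid ones. As a rough sketch of one way to narrow this down (not part of the original answer), the hypothetical helper below feeds each line of a file through iconv as a UTF-8 to UTF-8 conversion and prints the numbers of the lines that iconv rejects. The function name find_invalid_utf8_lines is made up for illustration, and the approach assumes that your iconv exits with a non-zero status on invalid input, which is the usual behavior. It starts one iconv process per line, so it is slow on large files.

```shell
# Hypothetical helper (name is illustrative): print the line numbers of all
# lines in file $1 that are not valid UTF-8.
# Assumption: iconv exits non-zero when it meets an invalid byte sequence.
find_invalid_utf8_lines() {
  n=0
  # '|| [ -n "$line" ]' also processes a final line that lacks a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    # A UTF-8 -> UTF-8 "conversion" only succeeds if the input is valid UTF-8.
    if ! printf '%s\n' "$line" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
      printf '%s\n' "$n"
    fi
  done < "$1"
}
```

For the file from the question, this would be called as find_invalid_utf8_lines Configure; the reported lines can then be inspected with the iconv --byte-subst command shown earlier.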
Performing encoding conversions on demand

The standard utility iconv can be used to convert to (-t) and/or from (-f) encodings; iconv -l lists all supported ones.

Examples:

Convert from ISO-8859-1 to the encoding in effect in the shell (based on LC_CTYPE, which is UTF-8-based by default), building on the above example:

    # Converts to UTF-8; output renders correctly as 'voilà'
    sed 's/.*/&/' <<<"$(iconv -f ISO-8859-1 <<<$'voil\x{e0}')"

Note that this conversion allows you to properly match foreign characters:

    # Correctly matches 'à' and replaces it with 'ü': -> 'voilü'
    sed 's/à/ü/' <<<"$(iconv -f ISO-8859-1 <<<$'voil\x{e0}')"

To convert the input back to ISO-8859-1 after processing, simply pipe the result to another iconv command:

    sed 's/à/ü/' <<<"$(iconv -f ISO-8859-1 <<<$'voil\x{e0}')" | iconv -t ISO-8859-1
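The same round trip can be applied to a whole file instead of a here-string. The sketch below is only an illustration: the function name sed_via_utf8 is hypothetical, and the default of ISO-8859-1 is an assumption about the file's encoding that you would need to confirm for your own input. It converts the file to UTF-8, runs the sed script, converts the result back, and writes it to standard output rather than editing in place.

```shell
# Hypothetical helper (name is illustrative): run a sed script on a file that
# uses a single-byte encoding by converting to UTF-8 and back.
#   $1 = sed script, $2 = input file, $3 = file encoding (assumed ISO-8859-1)
sed_via_utf8() {
  script=$1
  file=$2
  enc=${3:-ISO-8859-1}
  iconv -f "$enc" -t UTF-8 "$file" | sed "$script" | iconv -f UTF-8 -t "$enc"
}
```

Assuming the Configure file from the question really is ISO-8859-1, a call could look like sed_via_utf8 's|"iphoneos-cross","llvm-gcc:-O3|"iphoneos-cross","clang:-Os|g' Configure > Configure.new, after which you can compare Configure.new with the original before replacing it.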
08-01 16:47