RE error: illegal byte sequence on Mac OS X

Problem description

I'm trying to replace a string in a Makefile on Mac OS X for cross-compiling to iOS. The string has embedded double quotes. The command is:

    sed -i "" 's|"iphoneos-cross","llvm-gcc:-O3|"iphoneos-cross","clang:-Os|g' Configure

And the error is:

    sed: RE error: illegal byte sequence

I've tried escaping the double quotes, commas, dashes, and colons, with no luck. For example:

    sed -i "" 's|\"iphoneos-cross\"\,\"llvm-gcc\:\-O3|\"iphoneos-cross\"\,\"clang\:\-Os|g' Configure

Does anyone know how to get sed to print the position of the illegal byte sequence? Or does anyone know what the illegal byte sequence is?

Recommended answer

Using the formerly accepted answer is an option if you don't mind losing support for your true locale (if you're on a US system and never need to deal with foreign characters, that may be fine).

However, the same effect can be had ad hoc, for a single command only:

    LC_ALL=C sed -i "" 's|"iphoneos-cross","llvm-gcc:-O3|"iphoneos-cross","clang:-Os|g' Configure

Note: What matters is an effective LC_CTYPE setting of C, so LC_CTYPE=C sed ... would normally also work, but if LC_ALL happens to be set (to something other than C), it overrides the individual LC_* category variables such as LC_CTYPE.
Thus, the most robust approach is to set LC_ALL.

However, (effectively) setting LC_CTYPE to C treats strings as if each byte were its own character (no interpretation based on encoding rules is performed), with no regard for the multibyte-on-demand UTF-8 encoding that OS X employs by default, where foreign characters have multibyte encodings.

In a nutshell: setting LC_CTYPE to C causes the shell and utilities to recognize only basic English letters as letters (those in the 7-bit ASCII range), so foreign characters are not treated as letters, causing, for instance, upper-/lowercase conversions to fail.

Again, this may be fine if you needn't match multibyte-encoded characters such as é, and simply want to pass such characters through.

If this is insufficient and/or you want to understand the cause of the original error (including determining which input bytes caused the problem) and perform encoding conversions on demand, read on below.

The problem is that the input file's encoding does not match the shell's. More specifically, the input file contains characters encoded in a way that is not valid in UTF-8 (as @Klas Lindbäck stated in a comment); that's what the sed error message is trying to say with "illegal byte sequence".

Most likely, your input file uses a single-byte 8-bit encoding such as ISO-8859-1, frequently used to encode "Western European" languages.
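To confirm this diagnosis for a given file, one option is to ask iconv to decode it as UTF-8 and look only at the exit status. The following is a minimal sketch, not part of the original answer: the function name is_valid_utf8 is made up for illustration, and it assumes that the shell's encoding is UTF-8 (check with `locale charmap`) and that iconv exits with a non-zero status on malformed input.

```shell
# Hypothetical helper: succeed if the file decodes cleanly as UTF-8.
# Assumption: iconv returns a non-zero exit status when it meets a byte
# sequence that is not valid in the source encoding.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}
```

Typical use would be something like `is_valid_utf8 Configure || echo "Configure is not valid UTF-8"`. A failure here only says the file is not UTF-8; it does not say which encoding it actually uses.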
Example:

The accented letter à has Unicode codepoint 0xE0 (224), the same as in ISO-8859-1. However, due to the nature of UTF-8 encoding, this single codepoint is represented as 2 bytes, 0xC3 0xA0, whereas trying to pass the single byte 0xE0 is invalid under UTF-8.

Here's a demonstration of the problem using the string voilà encoded as ISO-8859-1, with the à represented as one byte (via an ANSI-C-quoted bash string ($'...') that uses \x{e0} to create the byte).

Note that the sed command is effectively a no-op that simply passes the input through, but we need it to provoke the error:

    # -> 'illegal byte sequence': byte 0xE0 is not a valid char.
    sed 's/.*/&/' <<<$'voil\x{e0}'

To simply ignore the problem, the above LC_CTYPE=C approach can be used:

    # No error, bytes are passed through ('à' will render as '?', though).
    LC_CTYPE=C sed 's/.*/&/' <<<$'voil\x{e0}'

If you want to determine which parts of the input cause the problem, try the following:

    # Convert bytes in the 8-bit range (high bit set) to hex. representation.
    # -> 'voil\x{e0}'
    iconv -f ASCII --byte-subst='\x{%02x}' <<<$'voil\x{e0}'

The output will show you all bytes that have the high bit set (bytes that exceed the 7-bit ASCII range) in hexadecimal form. (Note, however, that this also includes correctly encoded UTF-8 multibyte sequences; a more sophisticated approach would be needed to specifically identify invalid-in-UTF-8 bytes.)
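One way around that caveat is to let iconv validate each line as UTF-8 and report only the lines that fail, so that correctly encoded multibyte characters are not flagged. The following is a rough sketch rather than part of the original answer: the function name invalid_utf8_lines is invented for illustration, it assumes iconv exits non-zero on malformed UTF-8, and it starts one iconv process per line, so it is slow on large files.

```shell
# Hypothetical helper: print the numbers of the lines that are not valid UTF-8.
# The body runs in a subshell, so the LC_ALL change does not leak to the caller.
invalid_utf8_lines() (
  LC_ALL=C            # make the shell itself treat the input byte by byte
  export LC_ALL
  n=0
  # The '|| [ -n "$line" ]' part also handles a last line without a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    # Assumption: iconv fails when the line contains malformed UTF-8.
    if ! printf '%s\n' "$line" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
      printf '%s\n' "$n"
    fi
  done < "$1"
)
```

Running `invalid_utf8_lines Configure` would then narrow the search down to specific lines, which can be inspected with the iconv --byte-subst command above.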
Performing encoding conversions on demand

Standard utility iconv can be used to convert to (-t) and/or from (-f) encodings; iconv -l lists all supported ones.

Examples:

Convert FROM ISO-8859-1 to the encoding in effect in the shell (based on LC_CTYPE, which is UTF-8-based by default), building on the above example:

    # Converts to UTF-8; output renders correctly as 'voilà'
    sed 's/.*/&/' <<<"$(iconv -f ISO-8859-1 <<<$'voil\x{e0}')"

Note that this conversion allows you to properly match foreign characters:

    # Correctly matches 'à' and replaces it with 'ü': -> 'voilü'
    sed 's/à/ü/' <<<"$(iconv -f ISO-8859-1 <<<$'voil\x{e0}')"

To convert the input BACK to ISO-8859-1 after processing, simply pipe the result to another iconv command:

    sed 's/à/ü/' <<<"$(iconv -f ISO-8859-1 <<<$'voil\x{e0}')" | iconv -t ISO-8859-1
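The same round trip can be applied to a whole file such as the Configure script from the question. The following is only a sketch under stated assumptions: the function name sed_latin1 and its temp-file handling are invented for illustration, the file is assumed to really be ISO-8859-1 (the answer only says this is the most likely case), and sed output is redirected instead of using -i so that the same command line works with both BSD and GNU sed.

```shell
# Hypothetical helper: apply a sed script to an ISO-8859-1 file by
# round-tripping through UTF-8. Usage: sed_latin1 'script' file
# Note: plain sh has no local variables, so these names are global.
sed_latin1() {
  script=$1
  file=$2
  tmp=$(mktemp) || return 1
  # Decode, edit, re-encode; the file is only overwritten if every stage succeeds.
  if iconv -f ISO-8859-1 -t UTF-8 "$file" > "$tmp.utf8" &&
     sed "$script" "$tmp.utf8" > "$tmp.edited" &&
     iconv -f UTF-8 -t ISO-8859-1 "$tmp.edited" > "$tmp"; then
    cat "$tmp" > "$file"   # keep the original file's permissions
    status=0
  else
    status=1
  fi
  rm -f "$tmp" "$tmp.utf8" "$tmp.edited"
  return "$status"
}
```

For the command in the question this would be called as `sed_latin1 's|"iphoneos-cross","llvm-gcc:-O3|"iphoneos-cross","clang:-Os|g' Configure`. The last iconv step fails, and the file is left untouched, if the edit introduces a character that ISO-8859-1 cannot represent.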
