
Problem description

I read a string from a file encoded as UTF-8, and I need to match it against an expression. The first character of the file is #, but the first character of the string is '' (an empty-looking symbol). Converting that character to bytes with the UTF-8 charset gives [-17, -69, -65]. Does anyone know what it is and how to handle it with regular expressions?

Recommended answer

Some editors (such as Notepad) add a BOM (byte order mark) signature when saving UTF-8 text. The signature is the byte sequence 0xEF, 0xBB, 0xBF; the values [-17, -69, -65] in the question are the same three bytes shown as signed numbers. Before reading a string from such a file, check for these bytes and skip them if they exist.
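The check described above can be sketched in Python as follows. This is only an illustration, not the original poster's code: the helper names `decode_utf8_skip_bom` and `strip_bom_with_regex` and the sample data are assumptions made for the example. The first helper skips the signature at the byte level; the second shows the regular-expression route the question asks about, applied to a string that has already been decoded (where the BOM appears as the single character U+FEFF).

```python
import codecs
import re

# codecs.BOM_UTF8 is the three-byte signature 0xEF, 0xBB, 0xBF.
# The signed values from the question map to the same bytes.
signed_bytes = [-17, -69, -65]
assert bytes(b & 0xFF for b in signed_bytes) == codecs.BOM_UTF8


def decode_utf8_skip_bom(raw: bytes) -> str:
    """Decode UTF-8 bytes, skipping a leading BOM if one is present."""
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    return raw.decode("utf-8")


# After decoding, a BOM that was not skipped shows up as U+FEFF.
LEADING_BOM = re.compile("^\ufeff")


def strip_bom_with_regex(text: str) -> str:
    """Remove one leading U+FEFF character from an already decoded string."""
    return LEADING_BOM.sub("", text)


# Hypothetical file content: a BOM followed by a line starting with '#'.
sample = codecs.BOM_UTF8 + b"# first line"

from_bytes = decode_utf8_skip_bom(sample)
from_regex = strip_bom_with_regex(sample.decode("utf-8"))
```

Both helpers leave input without a BOM unchanged. In Python specifically, decoding with the built-in `utf-8-sig` codec (for example `open(path, encoding="utf-8-sig")`) gives the same result as the byte-level helper.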

Another option is to stop using Notepad for editing UTF-8 text and switch to a program such as Notepad++, Kate, or any other editor that lets you control whether a BOM is added.


08-05 21:10