Problem description
I'm reading data from a remote source, and occasionally get some characters in another encoding. They're not important.
I'd like to get a "best guess" UTF-8 string, and ignore the invalid data.
The main goal is to get a string I can use without running into errors such as:
- Encoding::UndefinedConversionError: "\xFF" from ASCII-8BIT to UTF-8:
- invalid byte sequence in UTF-8
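For context, here is a minimal sketch of how these two errors can come up. The sample bytes and the helper names (`raised_error_class`, `sample_binary`, `sample_mislabelled`) are hypothetical and only for illustration; the real data from the remote source may fail in other ways.

```ruby
# Returns the class of the exception raised by the block, or nil if none.
def raised_error_class
  yield
  nil
rescue StandardError => e
  e.class
end

# Hypothetical payload: ASCII text with one stray 0xFF byte.
# String#b returns a copy tagged ASCII-8BIT (binary).
def sample_binary
  "abc\xFFdef".b
end

# The same bytes, labelled UTF-8 even though 0xFF is never valid in UTF-8.
def sample_mislabelled
  sample_binary.force_encoding(Encoding::UTF_8)
end

# Transcoding binary data: byte 0xFF has no defined UTF-8 mapping.
p raised_error_class { sample_binary.encode("UTF-8") }

# The mislabelled string is created without complaint...
p sample_mislabelled.valid_encoding?

# ...and usually fails only later, when an operation has to decode it.
p raised_error_class { sample_mislabelled =~ /def/ }
```

The first error comes from transcoding, the second from using a string whose encoding label does not match its bytes, so the two may need to be handled at different points in the script.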
I thought this was it:
string.encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => "?")
This will replace all unknowns with '?'.
To ignore all unknowns, use :replace => '':
string.encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => "")
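One caveat worth checking against the String#encode documentation for the Ruby version in use: some releases describe conversion to the encoding a string already carries as a no-op, so a string tagged UTF-8 could pass through the call above with its invalid bytes intact. The sketch below sidesteps that by round-tripping through UTF-16BE. This is a swapped-in workaround, not something from the question. The helper name `best_guess_utf8` is hypothetical, and it assumes the incoming bytes are mostly UTF-8.

```ruby
# Hypothetical helper: returns a valid UTF-8 copy of `raw`, replacing
# anything that cannot be decoded with `replacement` ("" drops it).
def best_guess_utf8(raw, replacement = "")
  # Assumption: the payload is mostly UTF-8, so label a copy as UTF-8 first.
  candidate = raw.dup.force_encoding(Encoding::UTF_8)

  # Convert to a different encoding and back, so the conversion is never
  # same-encoding and invalid bytes are always examined.
  candidate
    .encode(Encoding::UTF_16BE, invalid: :replace, undef: :replace, replace: replacement)
    .encode(Encoding::UTF_8)
end

# Hypothetical payload: valid UTF-8 ("caf\u00E9") plus one stray 0xFF byte.
payload = "caf\xC3\xA9 \xFF ok".b

cleaned = best_guess_utf8(payload)
p cleaned.valid_encoding?
p cleaned
```

Passing `"?"` as the second argument marks each dropped byte instead of silently removing it, which can make it easier to spot where the remote source sent bad data.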
Edit:
I'm not sure this is reliable. I've gone into paranoid mode, and have been using:
string.encode("UTF-8", ...).force_encoding('UTF-8')
The script seems to be running OK now. But I'm pretty sure I'd gotten errors with this earlier.
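A note on the extra force_encoding call: it only changes the encoding label on a string and leaves the bytes alone, so on its own it cannot remove invalid data, and after a successful encode("UTF-8", ...) the result is already labelled UTF-8. A minimal sketch, with a hypothetical helper name and sample bytes, showing the relabelling and a `valid_encoding?` check that could be logged when the intermittent errors appear:

```ruby
# Hypothetical helper: relabel a copy of the bytes as UTF-8 without
# changing or validating them, which is all force_encoding does.
def relabel_as_utf8(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8)
end

raw = "ok \xFF".b                 # sample bytes with one stray 0xFF
relabelled = relabel_as_utf8(raw)

p relabelled.encoding             # the label is now UTF-8
p relabelled.bytes == raw.bytes   # the bytes themselves are untouched
p relabelled.valid_encoding?      # so the string can still be invalid
```

Checking `valid_encoding?` right after the conversion step is a cheap way to find out whether bad bytes are surviving it or are entering the script somewhere else.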
Edit 2:
Even with this, I continue to get intermittent errors. Not every time, mind you. Just sometimes.