Delete non-UTF characters from a string in Ruby?
Problem description
How do I delete non-UTF-8 characters from a Ruby string? I have a string that contains, for example, "\xC2". I want to remove that byte from the string so that it becomes valid UTF-8.
This:
text.gsub!(/\xC2/, '')
returns an error:
incompatible encoding regexp match (ASCII-8BIT regexp with UTF-8 string)
I also looked at text.unpack('U*') and string.pack, but did not get anywhere.
Recommended answer
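You can use encode for that:
text.encode('UTF-8', :invalid => :replace, :undef => :replace)
Note that this substitutes a replacement character (U+FFFD for Unicode encodings) for each invalid sequence rather than deleting it; passing :replace => '' as well drops the offending bytes instead.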
For more information, see Ruby-Docs.
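As a complementary sketch, and not part of the original answer: on Ruby 2.1 or later, String#scrub can drop invalid byte sequences directly. The helper name strip_invalid_utf8 is hypothetical, and the example assumes the string is already tagged as UTF-8 (the default for literals in a UTF-8 source file).

```ruby
# Hypothetical helper, shown only for illustration.
# Assumes Ruby >= 2.1 (String#scrub) and a string whose encoding is UTF-8.
def strip_invalid_utf8(text)
  # scrub replaces each invalid byte sequence with the given string;
  # passing '' removes the sequence instead of inserting U+FFFD.
  text.scrub('')
end

dirty = "abc\xC2def"           # \xC2 is a lead byte with no continuation byte
clean = strip_invalid_utf8(dirty)

puts dirty.valid_encoding?     # false: the lone \xC2 is not valid UTF-8
puts clean.valid_encoding?     # true
puts clean                     # abcdef
```

Valid multibyte characters are left untouched, so only the malformed bytes disappear. If the string carries a different encoding tag (for example ASCII-8BIT), it would first need to be re-tagged or transcoded, which this sketch does not attempt.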