Question
I scanned the Rust documentation for some way to convert between character encodings but did not find anything. Did I miss something?
Is it supported (directly or indirectly) by the Rust language and its standard library, or is it even planned for the near future?
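For context, here is a minimal sketch of what the standard library's `String` constructors do with Latin-1 input. `String::from_utf8` and `String::from_utf8_lossy` only decode UTF-8, so handing them ISO-8859-1 bytes directly either fails or replaces the non-ASCII characters. The byte array `LATIN1_APFEL` (the word "Äpfel" in ISO-8859-1) is an illustrative assumption, not something from the question.

```rust
/// Hypothetical sample input: "Äpfel" encoded as ISO-8859-1.
/// 'Ä' is the single byte 0xC4 (196); the rest is plain ASCII.
const LATIN1_APFEL: [u8; 5] = [0xC4, b'p', b'f', b'e', b'l'];

fn main() {
    // from_utf8 only accepts valid UTF-8. 0xC4 starts a two-byte UTF-8
    // sequence, but 'p' is not a continuation byte, so this is an Err.
    let strict = String::from_utf8(LATIN1_APFEL.to_vec());
    println!("from_utf8 ok? {}", strict.is_ok()); // false

    // from_utf8_lossy does not transcode either: it substitutes
    // U+FFFD REPLACEMENT CHARACTER for the invalid byte, losing the 'Ä'.
    let lossy = String::from_utf8_lossy(&LATIN1_APFEL);
    println!("lossy: {}", lossy);
}
```

Neither call interprets the bytes as Latin-1; they only validate (or repair) UTF-8.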
As one of the answers suggested, there is an easy solution, because a `u8` can be cast to a (Unicode) `char`. Since Unicode is a superset of the code points in ISO-8859-1, this is a 1:1 mapping. The resulting characters are stored as UTF-8, the internal encoding of `String`s in Rust, so every byte above 127 ends up as two bytes.
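```rust
fn main() {
    println!("{}", 196u8 as char);                // byte 196 as a char: 'Ä' (U+00C4)
    println!("{}", (196u8 as char) as u8);        // and back to the byte 196
    println!("{}", 'Ä' as u8);                    // char -> u8 cast keeps the low 8 bits: 196
    println!("{:?}", 'Ä'.to_string().as_bytes()); // UTF-8 encoding of 'Ä'
    println!("{:?}", "Ä".as_bytes());             // same bytes from a string literal
    println!("{}", 'Ä' == 196u8 as char);         // the cast yields exactly 'Ä'
}
```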
This prints:
```text
Ä
196
196
[195, 132]
[195, 132]
true
```
I hadn't even considered that this would work!
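To make the byte-count claim above concrete, a small sketch: every Latin-1 byte maps to exactly one `char`, and that `char` needs one UTF-8 byte for 0x00 to 0x7F (ASCII) and two for 0x80 to 0xFF, never more. The helper name `utf8_len_of_latin1_byte` is hypothetical; it just wraps the standard `char::len_utf8`.

```rust
/// Hypothetical helper: how many UTF-8 bytes the char that a
/// Latin-1 byte maps to will occupy inside a Rust String.
fn utf8_len_of_latin1_byte(b: u8) -> usize {
    (b as char).len_utf8()
}

fn main() {
    // ASCII bytes (0x00..=0x7F) stay a single byte in UTF-8.
    println!("{}", utf8_len_of_latin1_byte(b'A')); // 1
    // Bytes 0x80..=0xFF become two UTF-8 bytes, e.g. 196 ('Ä') -> [195, 132].
    println!("{}", utf8_len_of_latin1_byte(196)); // 2
    // Scan all 256 byte values: none needs more than two UTF-8 bytes.
    let max = (0u8..=255).map(utf8_len_of_latin1_byte).max().unwrap();
    println!("{}", max); // 2
}
```

So a Latin-1 buffer of `n` bytes grows to at most `2 * n` bytes after conversion.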
Answer
Strings in Rust are Unicode (UTF-8), and the Unicode code points are a superset of the ISO-8859-1 characters: code points U+0000 to U+00FF correspond one-to-one to the ISO-8859-1 byte values. This makes this particular conversion trivial.
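```rust
/// Decode ISO-8859-1 (Latin-1) bytes into a UTF-8 String.
fn latin1_to_string(s: &[u8]) -> String {
    // Each byte value is the Unicode code point of the same character.
    s.iter().map(|&c| c as char).collect()
}
```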
We interpret each byte as a Unicode code point and then build a `String` from these code points.
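A usage sketch of the answer's function, paired with the opposite direction. `string_to_latin1` is a hypothetical helper, not part of the answer or the standard library: it assumes that any `char` above U+00FF simply cannot be represented and returns `None` for the whole string in that case.

```rust
/// The answer's decoder: each Latin-1 byte is its own Unicode code point.
fn latin1_to_string(s: &[u8]) -> String {
    s.iter().map(|&c| c as char).collect()
}

/// Hypothetical inverse: encode a string as ISO-8859-1.
/// Returns None if any char lies above U+00FF, since Latin-1 has no byte for it.
fn string_to_latin1(s: &str) -> Option<Vec<u8>> {
    s.chars()
        .map(|c| if (c as u32) <= 0xFF { Some(c as u8) } else { None })
        .collect() // an Iterator of Option<u8> collects into Option<Vec<u8>>
}

fn main() {
    // "Äpfel" in ISO-8859-1: 'Ä' is the single byte 196.
    let bytes = [196u8, b'p', b'f', b'e', b'l'];
    let s = latin1_to_string(&bytes);
    println!("{}", s); // Äpfel
    println!("{:?}", s.as_bytes()); // [195, 132, 112, 102, 101, 108]

    // Round trip back to the original Latin-1 bytes.
    println!("{}", string_to_latin1(&s) == Some(bytes.to_vec())); // true

    // '€' (U+20AC) is not in ISO-8859-1.
    println!("{:?}", string_to_latin1("5 €")); // None
}
```

One caveat on the decoding side: this mapping turns bytes 0x80 to 0x9F into C1 control characters, whereas text labelled "Latin-1" in practice is often Windows-1252, which assigns printable characters (such as € at 0x80) to that range.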