What are the options to convert ISO-8859-1/Latin-1 to a String (UTF-8)?

Question

I searched the Rust documentation for a way to convert between character encodings but did not find anything. Did I miss something?

Is this supported (directly or indirectly) by the Rust language and its standard library, or is it planned for the near future?

One of the answers points out that there is an easy solution, because a u8 can be cast to a (Unicode) char. Unicode is a superset of the code points in ISO-8859-1, so this is a 1:1 mapping. In UTF-8, which is the internal encoding of String in Rust, the bytes 0x00 to 0x7F stay single bytes, while the bytes 0x80 to 0xFF are encoded as two bytes each.

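fn main() {
    // A u8 cast to char yields the code point with the same numeric value.
    println!("{}", 196u8 as char); // Ä
    println!("{}", (196u8 as char) as u8); // 196
    println!("{}", 'Ä' as u8); // 196
    // In UTF-8, U+00C4 is encoded as two bytes.
    println!("{:?}", 'Ä'.to_string().as_bytes()); // [195, 132]
    println!("{:?}", "Ä".as_bytes()); // [195, 132]
    println!("{}", 'Ä' == 196u8 as char); // true
}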

This prints:

Ä
196
196
[195, 132]
[195, 132]
true

I had not even considered that this would work!
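To make the size difference visible, here is a minimal sketch that reports how many UTF-8 bytes each Latin-1 byte needs. The helper name `utf8_len_of_latin1_byte` is made up for this illustration; `char::len_utf8` is the standard library method it relies on.

```rust
/// Number of UTF-8 bytes needed to encode one ISO-8859-1 byte.
/// (Hypothetical helper, not part of the standard library.)
fn utf8_len_of_latin1_byte(b: u8) -> usize {
    // The cast maps the byte to the code point with the same value.
    (b as char).len_utf8()
}

fn main() {
    // Sample bytes on both sides of the ASCII boundary (0x7F / 0x80).
    for &b in &[0x41u8, 0x7F, 0x80, 0xC4, 0xFF] {
        println!("0x{:02X} -> {} UTF-8 byte(s)", b, utf8_len_of_latin1_byte(b));
    }
}
```

Bytes up to 0x7F are plain ASCII and keep their single-byte form; everything from 0x80 upwards takes two bytes, so a converted String can be up to twice as long as its Latin-1 input.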

Recommended answer

Strings in Rust are Unicode (UTF-8), and Unicode code points are a superset of the ISO-8859-1 characters. This specific conversion is actually trivial.

// Convert ISO-8859-1 bytes to a String by casting each byte to a char.
fn latin1_to_string(s: &[u8]) -> String {
    s.iter().map(|&c| c as char).collect()
}

We interpret each byte as a Unicode code point and then build a String from these code points.
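As a usage sketch, the function above can be called on a small Latin-1 buffer. The reverse direction is not covered by the answer; `string_to_latin1` below is a hypothetical helper added only for illustration, under the assumption that characters outside U+0000 to U+00FF should be rejected rather than replaced.

```rust
/// Decode ISO-8859-1 bytes into a String (same approach as the answer).
fn latin1_to_string(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

/// Hypothetical inverse: encode a &str as ISO-8859-1.
/// Returns None if any character lies outside U+0000..=U+00FF.
fn string_to_latin1(s: &str) -> Option<Vec<u8>> {
    s.chars()
        .map(|c| {
            let cp = c as u32;
            if cp <= 0xFF { Some(cp as u8) } else { None }
        })
        .collect()
}

fn main() {
    // "Grüße" encoded as ISO-8859-1: ü is 0xFC, ß is 0xDF.
    let latin1: [u8; 5] = [0x47, 0x72, 0xFC, 0xDF, 0x65];

    let s = latin1_to_string(&latin1);
    println!("{}", s);
    println!("{:?}", s.as_bytes());

    // Round trip back to Latin-1, then a character that has no Latin-1 form.
    println!("{:?}", string_to_latin1(&s));
    println!("{:?}", string_to_latin1("€"));
}
```

Decoding can never fail, because every byte value is a valid code point. Encoding can fail, which is why the sketch returns an Option instead of silently truncating.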


09-03 19:17