Problem description
I've parsed a file, split the string by lines, and want to keep only the unique elements in each vector. I expect vec.dedup() to work like this:
let mut vec = vec!["a", "b", "a"];
vec.dedup();
assert_eq!(vec, ["a", "b"]);
But it fails:
thread 'main' panicked at 'assertion failed: `(left == right)`
left: `["a", "b", "a"]`,
right: `["a", "b"]`', src/main.rs:4:4
How can I remove the duplicates?
Recommended answer
As documented, Vec::dedup only removes consecutive repeated elements from a vector (which is much cheaper than a full deduplication). It would work fine if the vector were vec!["a", "a", "b"], for example.
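To make the "consecutive only" behavior concrete, here is a minimal sketch. The helper name dedup_adjacent is hypothetical; it only wraps the standard Vec::dedup call so the two cases can be compared side by side.

```rust
/// Hypothetical helper: applies `Vec::dedup`, which collapses only
/// runs of adjacent equal elements.
fn dedup_adjacent(mut v: Vec<&str>) -> Vec<&str> {
    v.dedup();
    v
}
```

Calling dedup_adjacent(vec!["a", "a", "b"]) gives ["a", "b"], while dedup_adjacent(vec!["a", "b", "a"]) returns the input unchanged, because the two "a" values are not next to each other.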
Of course, there are multiple potential solutions.
To obtain a vector with all duplicates removed while retaining the original order of the elements, the itertools crate provides a unique adaptor.
use itertools::Itertools;
let v = vec!["b", "a", "b"];
let v: Vec<_> = v.into_iter().unique().collect();
assert_eq!(v, ["b", "a"]);
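If adding a dependency is not desirable, the same order-preserving result can be sketched with the standard library alone. This is an alternative to the itertools approach rather than what that crate does internally, and the function name dedup_preserving_order is hypothetical. It assumes the elements implement Eq, Hash and Clone.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical helper: removes every repeated element in place,
/// keeping the first occurrence and the original order.
fn dedup_preserving_order<T: Eq + Hash + Clone>(v: &mut Vec<T>) {
    let mut seen = HashSet::new();
    // `HashSet::insert` returns false when the value was already present,
    // so `retain` keeps only the first occurrence of each element.
    v.retain(|item| seen.insert(item.clone()));
}
```

For a vector of &str the clone is only a pointer copy; for owned Strings it duplicates each distinct value once, which may or may not be acceptable depending on the input size.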
If element order is not important, you may sort the elements first and then call dedup.
let mut v = vec!["a", "b", "a"];
v.sort_unstable();
v.dedup();
assert_eq!(v, ["a", "b"]);
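Applied to the situation in the question (text split into lines), the sort-then-dedup approach might look like the sketch below. The function name unique_sorted_lines is hypothetical, and it assumes the file contents are already available as a &str.

```rust
/// Hypothetical helper: splits `text` into lines and returns each
/// distinct line once, in sorted order.
fn unique_sorted_lines(text: &str) -> Vec<&str> {
    let mut lines: Vec<&str> = text.lines().collect();
    // Sorting places equal lines next to each other, which is exactly
    // the precondition `dedup` needs to remove all duplicates.
    lines.sort_unstable();
    lines.dedup();
    lines
}
```

Note that the original line order is lost; the result is ordered by the usual string comparison.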
If fast element lookup is important, you may also consider using a set type instead, such as HashSet.
use std::collections::HashSet;
let v: HashSet<_> = ["a", "b", "a"].iter().cloned().collect();
let v2: HashSet<_> = ["b", "a"].iter().cloned().collect();
assert_eq!(v, v2);
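The lookup side of that trade-off can be sketched as follows. The helper name to_set is hypothetical; it simply collects a slice of string slices into a HashSet so that membership can be checked with contains.

```rust
use std::collections::HashSet;

/// Hypothetical helper: collects the items into a set, dropping duplicates.
/// The iteration order of the resulting set is unspecified.
fn to_set<'a>(items: &[&'a str]) -> HashSet<&'a str> {
    items.iter().copied().collect()
}
```

With let set = to_set(&["a", "b", "a"]), set.len() is 2 and set.contains("a") is true. A HashSet does not remember insertion order, so this fits best when only membership matters.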