Problem description
My Rust program is intended to read a very large (up to several GB), simple text file line by line. The problem is that this file is too large to be read all at once, or to collect all of its lines into a Vec<String>.
What would be an idiomatic way to handle this in Rust?
Recommended answer
You want to use the buffered reader, BufReader, together with the BufRead trait, and specifically its lines() method:
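use std::fs::File;
use std::io::{self, prelude::*, BufReader};

fn main() -> io::Result<()> {
    let file = File::open("foo.txt")?;
    // BufReader reads the file in chunks, so only a small buffer is held in memory.
    let reader = BufReader::new(file);

    // lines() comes from the BufRead trait (imported via the prelude)
    // and yields one io::Result<String> per line.
    for line in reader.lines() {
        println!("{}", line?);
    }

    Ok(())
}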
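Because lines() hands out one String at a time and each one is dropped before the next is read, memory use is bounded by the longest line rather than by the file size. A minimal sketch of that pattern, written against any BufRead so it can be tried on an in-memory std::io::Cursor; the helper name longest_line is hypothetical:

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: returns the length in bytes of the longest line,
/// keeping only the current line in memory.
fn longest_line<R: BufRead>(reader: R) -> io::Result<usize> {
    let mut longest: usize = 0;
    for line in reader.lines() {
        // `line` is dropped at the end of each iteration, so earlier
        // lines are never accumulated.
        longest = longest.max(line?.len());
    }
    Ok(longest)
}
```

With a real file you would pass BufReader::new(File::open(path)?), or BufReader::with_capacity(n, file) if you want a larger internal buffer than the default (documented as currently 8 KiB); whether that helps depends on the workload.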
Note that, as the documentation says, the returned lines do not include the line terminator (\n or \r\n).
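The difference matters when switching between lines() and read_line(): the former strips the terminator, the latter keeps it in the buffer. A small sketch contrasting the two on an in-memory Cursor (first_line_two_ways is a hypothetical name used only for illustration):

```rust
use std::io::{self, BufRead, Cursor};

/// Hypothetical helper: returns the first line of `text` as produced
/// by `lines()` and by `read_line()`.
fn first_line_two_ways(text: &str) -> io::Result<(String, String)> {
    // lines() strips a trailing "\n" or "\r\n" from each line.
    let stripped = Cursor::new(text).lines().next().transpose()?.unwrap_or_default();

    // read_line() appends the line *including* its terminator to the buffer.
    let mut kept = String::new();
    Cursor::new(text).read_line(&mut kept)?;

    Ok((stripped, kept))
}
```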
If you do not want to allocate a new String for each line, here is an example that reuses the same buffer:
fn main() -> std::io::Result<()> {
    let mut reader = my_reader::BufReader::open("Cargo.toml")?;
    // One String is allocated up front and reused for every line.
    let mut buffer = String::new();

    while let Some(line) = reader.read_line(&mut buffer) {
        // read_line keeps the trailing newline; trim() removes it
        // (along with any other surrounding whitespace).
        println!("{}", line?.trim());
    }

    Ok(())
}

mod my_reader {
    use std::{
        fs::File,
        io::{self, prelude::*},
    };

    pub struct BufReader {
        reader: io::BufReader<File>,
    }

    impl BufReader {
        pub fn open(path: impl AsRef<std::path::Path>) -> io::Result<Self> {
            let file = File::open(path)?;
            let reader = io::BufReader::new(file);

            Ok(Self { reader })
        }

        /// Reads the next line into `buffer`, returning None at end of file.
        pub fn read_line<'buf>(
            &mut self,
            buffer: &'buf mut String,
        ) -> Option<io::Result<&'buf mut String>> {
            // io::BufRead::read_line appends, so clear the previous line first.
            buffer.clear();

            self.reader
                .read_line(buffer)
                // Reading 0 bytes means end of file.
                .map(|u| if u == 0 { None } else { Some(buffer) })
                .transpose()
        }
    }
}
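The wrapper type is optional: the same buffer-reuse pattern can be written directly against BufRead::read_line, which appends to the buffer and returns Ok(0) at end of input. A minimal sketch, assuming any BufRead source; counting non-blank lines is just a hypothetical example task:

```rust
use std::io::{self, BufRead};

/// Hypothetical example: counts lines that contain something other than
/// whitespace, reusing a single String for every line.
fn count_non_blank_lines<R: BufRead>(mut reader: R) -> io::Result<usize> {
    let mut buffer = String::new();
    let mut count = 0;
    loop {
        // read_line appends, so clear the previous line first.
        buffer.clear();
        if reader.read_line(&mut buffer)? == 0 {
            break; // 0 bytes read: end of input
        }
        // The buffer still holds the "\n" or "\r\n" terminator, which trim() ignores.
        if !buffer.trim().is_empty() {
            count += 1;
        }
    }
    Ok(count)
}
```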
Or, if you prefer a standard iterator, you can use this Rc trick, which I shamelessly took from Reddit:
fn main() -> std::io::Result<()> {
    for line in my_reader::BufReader::open("Cargo.toml")? {
        println!("{}", line?.trim());
    }

    Ok(())
}

mod my_reader {
    use std::{
        fs::File,
        io::{self, prelude::*},
        rc::Rc,
    };

    pub struct BufReader {
        reader: io::BufReader<File>,
        buf: Rc<String>,
    }

    fn new_buf() -> Rc<String> {
        Rc::new(String::with_capacity(1024)) // Tweakable capacity
    }

    impl BufReader {
        pub fn open(path: impl AsRef<std::path::Path>) -> io::Result<Self> {
            let file = File::open(path)?;
            let reader = io::BufReader::new(file);
            let buf = new_buf();

            Ok(Self { reader, buf })
        }
    }

    impl Iterator for BufReader {
        type Item = io::Result<Rc<String>>;

        fn next(&mut self) -> Option<Self::Item> {
            let buf = match Rc::get_mut(&mut self.buf) {
                // The caller dropped the previous line: reuse its allocation.
                Some(buf) => {
                    buf.clear();
                    buf
                }
                // The caller still holds the previous line: start a new buffer.
                None => {
                    self.buf = new_buf();
                    Rc::make_mut(&mut self.buf)
                }
            };

            self.reader
                .read_line(buf)
                .map(|u| if u == 0 { None } else { Some(Rc::clone(&self.buf)) })
                .transpose()
        }
    }
}
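To see when the Rc trick actually saves an allocation, here is a generic sketch of the same idea over any BufRead. The type name RcLines is hypothetical, and it starts from an empty String instead of with_capacity(1024). The buffer is reused only if the caller has dropped the previous line; if the caller keeps it, Rc::get_mut returns None and a fresh buffer is allocated, so lines already handed out are never overwritten.

```rust
use std::io::{self, BufRead};
use std::rc::Rc;

/// Hypothetical generic variant of the Rc trick: works with any BufRead.
pub struct RcLines<R> {
    reader: R,
    buf: Rc<String>,
}

impl<R: BufRead> RcLines<R> {
    pub fn new(reader: R) -> Self {
        Self { reader, buf: Rc::new(String::new()) }
    }
}

impl<R: BufRead> Iterator for RcLines<R> {
    type Item = io::Result<Rc<String>>;

    fn next(&mut self) -> Option<Self::Item> {
        // If the caller still holds the previous line, allocate a new buffer
        // rather than overwriting the caller's copy.
        if Rc::get_mut(&mut self.buf).is_none() {
            self.buf = Rc::new(String::new());
        }
        let buf = Rc::get_mut(&mut self.buf).expect("buffer is uniquely owned here");
        buf.clear();

        match self.reader.read_line(buf) {
            Ok(0) => None, // end of input
            Ok(_) => Some(Ok(Rc::clone(&self.buf))),
            Err(e) => Some(Err(e)),
        }
    }
}
```

In a plain for loop each line is dropped before the next iteration, so the same allocation is reused throughout; collecting the lines into a Vec would instead allocate one buffer per line.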