Problem description
I have some large JSON files I want to parse, and I want to avoid loading all of the data into memory at once. I'd like a function or loop that returns the characters one at a time.
I found this example for iterating over words in a string, and the ScanRunes function in the bufio package looks like it could return a character at a time. I also had the ReadRune function from bufio mostly working, but that felt like a pretty heavy approach.
EDIT
I compared 3 approaches. All used a loop to pull content from either a bufio.Reader or a bufio.Scanner.
1. Read runes in a loop using .ReadRune on a bufio.Reader. Checked for errors from the call to .ReadRune.
2. Read bytes from a bufio.Scanner after calling .Split(bufio.ScanRunes) on the scanner. Called .Scan and .Bytes on each iteration, checking the .Scan call for errors.
3. Same as #2, but read text from a bufio.Scanner instead of bytes, using .Text. Instead of joining a slice of runes with string([]runes), I joined a slice of strings with strings.Join([]strings, "") to form the final blobs of text.
The timing for 10 runs of each on a 23 MB JSON file was:
1. 0.65 s
2. 2.40 s
3. 0.97 s
So it looks like ReadRune is not too bad after all. It also results in a shorter, less verbose call, because each rune is fetched in 1 operation (.ReadRune) instead of 2 (.Scan and .Bytes).
Solution
Just read each rune one by one in the loop... See example