Question
I'm trying to iterate line-by-line over a 1.2GB file using Apache Commons FileUtils.lineIterator. However, as soon as the LineIterator calls hasNext(), I get a java.lang.OutOfMemoryError: Java heap space. I've already allocated 1G to the Java heap.
What am I doing wrong here? After reading some docs, isn't LineIterator supposed to stream the file from the file system rather than load it into memory?
Note the code is in Scala:
val file = new java.io.File("data_export.dat")
// lineIterator streams the file; it should not load the whole file into memory
val it = org.apache.commons.io.FileUtils.lineIterator(file, "UTF-8")
var successCount = 0L
var totalCount = 0L
try {
  while (it.hasNext()) {
    try {
      // parse[LegacyEvent] and BehaviorEvent are application-specific helpers
      val legacy = parse[LegacyEvent](it.nextLine())
      BehaviorEvent(legacy)
      successCount += 1L
    } catch {
      case e: Exception => println("Parse error")
    }
    totalCount += 1L
  }
} finally {
  it.close()
}
Thanks for your help!
Recommended Answer
The code looks fine. Most likely the iterator never finds an end-of-line in the file and reads a single very long line, larger than 1 GB, into memory.
Try wc -l in Unix and see how many lines you get.
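If wc -l reports far fewer lines than expected, that diagnosis is confirmed. As a rough cross-check from Scala itself, the sketch below (a minimal, hypothetical helper, not part of the original question) scans the file as a raw byte stream, counting LF terminators and tracking the longest line length without ever buffering a whole line. The file name data_export.dat is taken from the question, and Unix-style \n line endings are assumed.

import java.io.{BufferedInputStream, FileInputStream}

object LineLengthCheck {
  def main(args: Array[String]): Unit = {
    // Assumption: data_export.dat is the file from the question, delimited by '\n'
    val in = new BufferedInputStream(new FileInputStream("data_export.dat"))
    var lineCount = 0L   // number of LF terminators seen
    var currentLen = 0L  // bytes since the last LF
    var maxLen = 0L      // longest line observed, in bytes
    try {
      var b = in.read()
      while (b != -1) {
        if (b == '\n') {
          lineCount += 1
          if (currentLen > maxLen) maxLen = currentLen
          currentLen = 0
        } else {
          currentLen += 1
        }
        b = in.read()
      }
      // a trailing line without a final newline still counts toward maxLen
      if (currentLen > maxLen) maxLen = currentLen
    } finally {
      in.close()
    }
    println(s"lines: $lineCount, longest line: $maxLen bytes")
  }
}

If the reported longest line approaches the size of the file, the fix is to repair the export's line terminators (or split on whatever delimiter it actually uses) rather than to raise the heap.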