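I'm running into an OOM when reading a large number of objects from an ObjectInputStream with readUnshared. MAT points at its internal handle table as the culprit, as does the OOM stack trace (at the end of this post). By all accounts, this shouldn't be happening. Furthermore, whether or not the OOM occurs appears to depend on how the objects were previously written.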
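According to this write-up on the topic, readUnshared (as opposed to readObject) should solve the issue by not creating handle table entries during the read. That write-up is how I discovered writeUnshared and readUnshared, which I had not noticed before.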
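For readers unfamiliar with the shared/unshared distinction, here is a minimal sketch of what it means on the writing side. It is not part of the test program in this post; the class name SharingDemo and the payload type Item are hypothetical and exist only for illustration. It writes the same instance twice and checks whether reading it back yields one shared instance or two independent copies.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SharingDemo {
    // Hypothetical payload type, used only for this illustration.
    static class Item implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    // Writes the same instance twice, reads both entries back with readObject(),
    // and reports whether the two results are the very same instance.
    static boolean sameInstanceAfterRoundTrip(boolean unsharedWrite) {
        try {
            Item item = new Item();
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                for (int i = 0; i < 2; i++) {
                    if (unsharedWrite) {
                        out.writeUnshared(item); // always written as a fresh object
                    } else {
                        out.writeObject(item);   // second write becomes a back-reference
                    }
                }
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                Object first = in.readObject();
                Object second = in.readObject();
                return first == second;
            }
        } catch (IOException | ClassNotFoundException e) {
            // In-memory streams are not expected to fail here.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("writeObject x2   -> same instance: " + sameInstanceAfterRoundTrip(false));
        System.out.println("writeUnshared x2 -> same instance: " + sameInstanceAfterRoundTrip(true));
    }
}
```

With writeObject the second read resolves to the first instance; with writeUnshared the two reads produce distinct objects. Note that this only shows the identity semantics, not how much handle table space either side uses.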
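However, from my own observations, readObject and readUnshared behave identically, and whether or not the OOM occurs depends on whether the objects were written with a reset() after each write. Contrary to what I previously thought, it does not matter whether writeObject or writeUnshared was used; I was just tired when I first ran the tests. That is: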

                    writeObject   writeObject+reset   writeUnshared   writeUnshared+reset
    readObject      OOM           OK                  OOM             OK
    readUnshared    OOM           OK                  OOM             OK

So whether or not readUnshared has any effect actually seems to depend entirely on how the object was written. This is surprising and unexpected to me. I did spend some time tracing through the readUnshared code path but, granted it was late and I was tired, it wasn't apparent to me why it would still be using handle space and why it would depend on how the object was written (I now have an initial suspect, described below, although I have yet to confirm it).

From all of my research on the topic so far, it appears that writeObject with readUnshared should work.
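One way to see what actually differs between the write modes is to compare the bytes each one produces for a stream of freshly created objects. The following is a small in-memory sketch under that assumption; StreamBytesDemo and Item are hypothetical names, and it uses far fewer objects than the real test.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;

class StreamBytesDemo {
    // Hypothetical payload type, used only for this illustration.
    static class Item implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    // Serializes `count` new Item instances into memory using the chosen write mode.
    static byte[] write(int count, boolean unshared, boolean resetEachTime) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (int i = 0; i < count; i++) {
                if (unshared) {
                    out.writeUnshared(new Item());
                } else {
                    out.writeObject(new Item());
                }
                if (resetEachTime) {
                    out.reset(); // emits a reset marker; class info is re-sent afterwards
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] normal = write(1000, false, false);
        byte[] unshared = write(1000, true, false);
        byte[] normalReset = write(1000, false, true);

        System.out.println("writeObject == writeUnshared bytes: " + Arrays.equals(normal, unshared));
        System.out.println("writeObject size:         " + normal.length);
        System.out.println("writeObject + reset size: " + normalReset.length);
    }
}
```

When every written object is a new instance, the writeObject and writeUnshared outputs come out byte-for-byte identical, so a reader has no way to tell them apart. The reset variants are larger because the class description is written again after each reset.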
Here is the program I've been testing with:

    import java.io.BufferedInputStream;
    import java.io.BufferedOutputStream;
    import java.io.EOFException;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;

    public class OOMTest {

        // This is the object we'll be reading and writing.
        static class TestObject implements Serializable {
            private static final long serialVersionUID = 1L;
        }

        static enum WriteMode {
            NORMAL,         // writeObject
            RESET,          // writeObject + reset each time
            UNSHARED,       // writeUnshared
            UNSHARED_RESET  // writeUnshared + reset each time
        }

        // Write a bunch of objects.
        static void testWrite (WriteMode mode, String filename, int count) throws IOException {
            ObjectOutputStream out = new ObjectOutputStream(new BufferedOutputStream(new FileOutputStream(filename)));
            out.reset();
            for (int n = 0; n < count; ++ n) {
                if (mode == WriteMode.NORMAL || mode == WriteMode.RESET)
                    out.writeObject(new TestObject());
                if (mode == WriteMode.UNSHARED || mode == WriteMode.UNSHARED_RESET)
                    out.writeUnshared(new TestObject());
                if (mode == WriteMode.RESET || mode == WriteMode.UNSHARED_RESET)
                    out.reset();
                if (n % 1000 == 0)
                    System.out.println(mode.toString() + ": " + n + " of " + count);
            }
            out.close();
        }

        static enum ReadMode {
            NORMAL,   // readObject
            UNSHARED  // readUnshared
        }

        // Read all the objects.
        @SuppressWarnings("unused")
        static void testRead (ReadMode mode, String filename) throws Exception {
            ObjectInputStream in = new ObjectInputStream(new BufferedInputStream(new FileInputStream(filename)));
            int count = 0;
            while (true) {
                try {
                    TestObject o;
                    if (mode == ReadMode.NORMAL)
                        o = (TestObject)in.readObject();
                    if (mode == ReadMode.UNSHARED)
                        o = (TestObject)in.readUnshared();
                    //
                    if ((++ count) % 1000 == 0)
                        System.out.println(mode + " (read): " + count);
                } catch (EOFException eof) {
                    break;
                }
            }
            in.close();
        }

        // Do the test. Comment/uncomment as appropriate.
        public static void main (String[] args) throws Exception {
            /* Note: For writes to succeed, VM heap size must be increased.
            testWrite(WriteMode.NORMAL, "test-writeObject.dat", 30_000_000);
            testWrite(WriteMode.RESET, "test-writeObject-with-reset.dat", 30_000_000);
            testWrite(WriteMode.UNSHARED, "test-writeUnshared.dat", 30_000_000);
            testWrite(WriteMode.UNSHARED_RESET, "test-writeUnshared-with-reset.dat", 30_000_000);
            */
            /* Note: For read demonstration of OOM, use default heap size. */
            testRead(ReadMode.UNSHARED, "test-writeObject.dat"); // Edit this line for different tests.
        }
    }

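Steps to recreate the problem with that program:

1. Run the test program with the testWrite calls uncommented (and testRead not called), with the heap size set high so that writeObject does not cause an OOM.
2. Run the test program a second time with testRead uncommented (and testWrite not called), using the default heap size.

To be clear: I am not doing the writing and reading in the same JVM instance. My writes happen in a separate program run from my reads. The test program above may be misleading at first glance because I crammed both the write and read tests into the same source file.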
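Unfortunately, the real situation I am in is that I have a file containing a lot of objects that were written with writeObject (without reset). It would take quite some time to regenerate (on the order of days), and reset also makes the output file much larger, so I'd like to avoid regenerating it if at all possible. On the other hand, I am currently unable to read the file with readObject, even with the heap size cranked up to the maximum available on my system. It is worth noting that in my real situation I do not need the caching provided by the object stream handle tables.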
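If regenerating the file ever does become an option, the file-size concern might be eased by calling reset() periodically rather than after every object. The sketch below illustrates that trade-off under stated assumptions: PeriodicResetDemo, Item, and the resetInterval parameter are hypothetical and not part of the original test program, and whether a given interval keeps the reader's memory acceptable would have to be measured on the real data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class PeriodicResetDemo {
    // Hypothetical payload type, used only for this illustration.
    static class Item implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    // Writes `count` objects, calling reset() after every `resetInterval`
    // objects. An interval of 0 means reset() is never called.
    static byte[] write(int count, int resetInterval) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (int i = 1; i <= count; i++) {
                out.writeUnshared(new Item());
                if (resetInterval > 0 && i % resetInterval == 0) {
                    out.reset();
                }
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    // Reads objects with readUnshared() until end of stream and counts them.
    static int countObjects(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            int n = 0;
            while (true) {
                try {
                    in.readUnshared();
                    n++;
                } catch (EOFException endOfStream) {
                    return n;
                }
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int count = 10_000;
        System.out.println("never reset:      " + write(count, 0).length + " bytes");
        System.out.println("reset every 1000: " + write(count, 1000).length + " bytes");
        System.out.println("reset every time: " + write(count, 1).length + " bytes");
        System.out.println("objects read back: " + countObjects(write(count, 1000)));
    }
}
```

The periodic variant lands between the two extremes in size, since the class description is re-sent only once per interval. On the reading side, each reset marker in the stream clears the reader's handle table, so its growth between resets should be limited to roughly one interval's worth of objects.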
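So my questions are:

1. All of my research so far suggests there is no connection between readUnshared's behavior and how the objects were written. What is going on here?
2. Given that the data was written with writeObject and without reset, is there any way I can avoid the OOM on read?

I'm not entirely sure why readUnshared is failing to solve the issue here. I hope this is clear. I'm running on empty here, so I may have typed weird words.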
From comments on an answer below:

"If you aren't calling writeObject() in the current instance of the JVM, you should not be consuming memory by calling readUnshared()."

All of my research shows the same, yet confusingly, here is the OOM stack trace, pointing at readUnshared:

    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.io.ObjectInputStream$HandleTable.grow(ObjectInputStream.java:3464)
        at java.io.ObjectInputStream$HandleTable.assign(ObjectInputStream.java:3271)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1789)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readUnshared(ObjectInputStream.java:460)
        at OOMTest.testRead(OOMTest.java:40)
        at OOMTest.main(OOMTest.java:54)

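This corresponds to ReadMode.UNSHARED and WriteMode.NORMAL in the test program above.

The test data files were generated with the various combinations of writeObject / writeUnshared and reset. Read behavior depends only on how the data was written and is independent of readObject vs. readUnshared. Note that the writeObject and writeUnshared data files are byte-for-byte identical; I can't decide whether that is surprising or not.

I've been staring at the ObjectInputStream code from here. My current suspect is this line, present in both 1.7 and 1.8:

    ObjectStreamClass desc = readClassDesc(false);
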
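where that boolean parameter is true for unshared and false for normal. In all other cases the "unshared" flag is propagated to other calls, but in that case it is hardcoded to false, so a handle is added to the handle table when reading the class description of a serialized object, even when readUnshared is used. AFAICT, this is the only place where the unshared flag is not passed on to other methods, which is why I'm focusing on it. This is in contrast to, for example, this line, where the unshared flag is passed to readClassDesc. (If anybody wants to dig in, you can trace the call path from readUnshared to both of those lines.)

However, I have not yet confirmed that any of this is significant, or worked out why false is hardcoded there. This is just the path I'm currently looking into; it may prove meaningless.

Also, ObjectInputStream does have a private method, clear, that clears the handle table. As an experiment I called it via reflection after every read, but it just broke everything, so that's a no-go.

Best answer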
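"However, it appears that if the objects were written using writeObject() rather than writeUnshared(), then readUnshared() does not decrease handle table usage."

That is correct. readUnshared() only reduces handle table usage attributable to readObject(). If you are in the same JVM that is using writeObject() rather than writeUnshared(), the handle table usage attributable to writeObject() is not reduced by readUnshared().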
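One documented behavior of readUnshared() is easy to observe directly: once an object has been read unshared, a later back-reference to it in the stream is rejected with an ObjectStreamException instead of being resolved. The sketch below demonstrates only that behavior; BackReferenceDemo and Item are hypothetical names, and the sketch makes no claim about how much memory the reader uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;

class BackReferenceDemo {
    // Hypothetical payload type, used only for this illustration.
    static class Item implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    // Writes one instance twice with writeObject() (full object, then a
    // back-reference), reads the first entry with readUnshared(), and reports
    // whether reading the back-reference afterwards is rejected.
    static boolean backReferenceRejected() {
        try {
            Item item = new Item();
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(item); // full object
                out.writeObject(item); // back-reference to the first entry
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readUnshared();     // consume the full object as unshared
                try {
                    in.readObject();   // attempt to resolve the back-reference
                    return false;
                } catch (ObjectStreamException rejected) {
                    return true;
                }
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("back-reference rejected after readUnshared: " + backReferenceRejected());
    }
}
```

This shows that the reader still has to keep track of the object's position in the stream after an unshared read, since it recognizes the later reference to it. That is consistent with the observation in the question that readUnshared alone does not stop the handle table from growing, though it is not a proof of the cause.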