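According to the definition at https://en.wikipedia.org/wiki/Java_class_file#General_layout, the constant pool of a Java class file begins 10 bytes into the file.
So far I have been able to parse everything before it (the magic number to check whether it is a class file, the major/minor version, and the constant pool size), but I still don't understand exactly how to parse the constant pool. For instance, are there opcodes for specifying method references and other things?
Is there some way I can look at the hex values that precede the text (itself represented in hex) to find out what the following value is?
Should I go about it by splitting each set of entries at a NOP (0x00) and then parsing each byte that isn't a text value?
For example, how exactly can I find out what each of these values represents?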
Best answer
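The only relevant documentation for class files that you need is The Java® Virtual Machine Specification, especially Chapter 4. The class File Format and, if you are going to parse more than the constant pool, Chapter 6. The Java Virtual Machine Instruction Set.
The constant pool consists of variable-length items. The first byte of each item determines its type, which in turn dictates its size. Most items consist of one or two indices pointing to other items. Simple parsing code that does not need any third-party library may look like this: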
import java.io.IOException;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// The class name is arbitrary; any enclosing class works.
public class ConstantPoolParser {
    public static final int HEAD=0xcafebabe;
    // Constant pool types
    public static final byte CONSTANT_Utf8 = 1;
    public static final byte CONSTANT_Integer = 3;
    public static final byte CONSTANT_Float = 4;
    public static final byte CONSTANT_Long = 5;
    public static final byte CONSTANT_Double = 6;
    public static final byte CONSTANT_Class = 7;
    public static final byte CONSTANT_String = 8;
    public static final byte CONSTANT_FieldRef = 9;
    public static final byte CONSTANT_MethodRef =10;
    public static final byte CONSTANT_InterfaceMethodRef =11;
    public static final byte CONSTANT_NameAndType =12;
    public static final byte CONSTANT_MethodHandle =15;
    public static final byte CONSTANT_MethodType =16;
    public static final byte CONSTANT_Dynamic =17; // Java 11+, same layout as InvokeDynamic
    public static final byte CONSTANT_InvokeDynamic =18;
    public static final byte CONSTANT_Module =19;
    public static final byte CONSTANT_Package =20;

    // Parses the class file of a class that is loaded in the running JVM.
    static void parseRtClass(Class<?> clazz) throws IOException, URISyntaxException {
        // getSimpleName() would drop the "Outer$" prefix of nested classes
        String name = clazz.getName();
        URL url = clazz.getResource(name.substring(name.lastIndexOf('.')+1)+".class");
        if(url==null) throw new IOException("can't access bytecode of "+clazz);
        Path p = Paths.get(url.toURI());
        // retry below /modules, where the jrt file system keeps runtime classes
        if(!Files.exists(p))
            p = p.resolve("/modules").resolve(p.getRoot().relativize(p));
        parse(ByteBuffer.wrap(Files.readAllBytes(p)));
    }

    // Parses a class file stored on disk.
    static void parseClassFile(Path path) throws IOException {
        ByteBuffer bb;
        try(FileChannel ch=FileChannel.open(path, StandardOpenOption.READ)) {
            bb=ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
        parse(bb);
    }

    static void parse(ByteBuffer buf) {
        if(buf.order(ByteOrder.BIG_ENDIAN).getInt()!=HEAD) {
            System.out.println("not a valid class file");
            return;
        }
        int minor=buf.getChar(), ver=buf.getChar();
        System.out.println("version "+ver+'.'+minor);
        // pool indices start at 1; num is the highest index plus one
        for(int ix=1, num=buf.getChar(); ix<num; ix++) {
            String s; int index1=-1, index2=-1;
            byte tag = buf.get();
            switch(tag) {
                default:
                    System.out.println("unknown pool item type "+tag);
                    return;
                case CONSTANT_Utf8: decodeString(ix, buf); continue;
                case CONSTANT_Class: case CONSTANT_String: case CONSTANT_MethodType:
                case CONSTANT_Module: case CONSTANT_Package:
                    s="%d:\t%s ref=%d%n"; index1=buf.getChar();
                    break;
                case CONSTANT_FieldRef: case CONSTANT_MethodRef:
                case CONSTANT_InterfaceMethodRef: case CONSTANT_NameAndType:
                    s="%d:\t%s ref1=%d, ref2=%d%n";
                    index1=buf.getChar(); index2=buf.getChar();
                    break;
                case CONSTANT_Integer: s="%d:\t%s value="+buf.getInt()+"%n"; break;
                case CONSTANT_Float: s="%d:\t%s value="+buf.getFloat()+"%n"; break;
                // long and double items occupy two pool indices, hence the extra ix++
                case CONSTANT_Double: s="%d:\t%s value="+buf.getDouble()+"%n"; ix++; break;
                case CONSTANT_Long: s="%d:\t%s value="+buf.getLong()+"%n"; ix++; break;
                case CONSTANT_MethodHandle:
                    s="%d:\t%s kind=%d, ref=%d%n"; index1=buf.get(); index2=buf.getChar();
                    break;
                case CONSTANT_Dynamic: case CONSTANT_InvokeDynamic:
                    s="%d:\t%s bootstrap_method_attr_index=%d, ref=%d%n";
                    index1=buf.getChar(); index2=buf.getChar();
                    break;
            }
            System.out.printf(s, ix, FMT[tag], index1, index2);
        }
    }

    // Display names, indexed by tag value
    private static final String[] FMT= {
        null, "Utf8", null, "Integer", "Float", "Long", "Double", "Class",
        "String", "Field", "Method", "Interface Method", "Name and Type",
        null, null, "MethodHandle", "MethodType", "Dynamic", "InvokeDynamic",
        "Module", "Package"
    };

    // Decodes the modified UTF-8 payload of a CONSTANT_Utf8 item and prints it.
    private static void decodeString(int poolIndex, ByteBuffer buf) {
        int size=buf.getChar(), oldLimit=buf.limit();
        buf.limit(buf.position()+size);
        StringBuilder sb=new StringBuilder(size+(size>>1)+16)
            .append(poolIndex).append(":\tUtf8 ");
        while(buf.hasRemaining()) {
            byte b=buf.get();
            if(b>0) sb.append((char)b); // single-byte (ASCII) character
            else
            {
                int b2 = buf.get();
                if((b&0xf0)!=0xe0) // two-byte sequence
                    sb.append((char)((b&0x1F)<<6 | b2&0x3F));
                else // three-byte sequence
                {
                    int b3 = buf.get();
                    sb.append((char)((b&0x0F)<<12 | (b2&0x3F)<<6 | b3&0x3F));
                }
            }
        }
        buf.limit(oldLimit);
        System.out.println(sb);
    }
}
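Don't get confused by the getChar() calls; I used them as a convenient way to read an unsigned short, instead of getShort()&0xffff. The code above simply prints the indices of references to other pool items. To decode the items, you may first store the data of all items in a random access data structure, i.e. an array or List, because items may refer to items with a higher index number. And mind that the indices start at 1...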
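The store-first, resolve-later idea can be sketched as follows. This is only a minimal sketch under stated assumptions, not part of the parser above: the names PoolResolver, Item, read, describe, utf8 and sample are hypothetical, the buffer is assumed to be positioned at constant_pool_count (offset 8 in a real class file), and Utf8 items are decoded as standard UTF-8, which agrees with the class file's modified UTF-8 only for text without U+0000 or supplementary characters. The sample pool is hand-built for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: pass 1 stores every pool item, pass 2 resolves references.
class PoolResolver {
    /** Raw item: tag, up to two operands, and text for Utf8 and numeric items. */
    static final class Item {
        final int tag, ref1, ref2;
        final String text;
        Item(int tag, int ref1, int ref2, String text) {
            this.tag = tag; this.ref1 = ref1; this.ref2 = ref2; this.text = text;
        }
    }

    /** Pass 1: read all items into an array indexed like the pool (slot 0 unused). */
    static Item[] read(ByteBuffer buf) {
        int count = buf.getChar(); // unsigned 16 bit, same as getShort() & 0xffff
        Item[] pool = new Item[count];
        for (int ix = 1; ix < count; ix++) {
            int tag = buf.get();
            switch (tag) {
                case 1: { // Utf8: length followed by the encoded bytes
                    byte[] raw = new byte[buf.getChar()];
                    buf.get(raw);
                    // Assumption: standard UTF-8 is close enough for this sketch
                    pool[ix] = new Item(tag, 0, 0, new String(raw, StandardCharsets.UTF_8));
                    break;
                }
                case 7: case 8: case 16: case 19: case 20: // one index
                    pool[ix] = new Item(tag, buf.getChar(), 0, null);
                    break;
                case 9: case 10: case 11: case 12: case 17: case 18: // two 16 bit operands
                    pool[ix] = new Item(tag, buf.getChar(), buf.getChar(), null);
                    break;
                case 15: // MethodHandle: one byte kind, one index
                    pool[ix] = new Item(tag, buf.get(), buf.getChar(), null);
                    break;
                case 3: pool[ix] = new Item(tag, 0, 0, String.valueOf(buf.getInt())); break;
                case 4: pool[ix] = new Item(tag, 0, 0, String.valueOf(buf.getFloat())); break;
                case 5: // Long takes two slots, the second one stays empty
                    pool[ix++] = new Item(tag, 0, 0, String.valueOf(buf.getLong()));
                    break;
                case 6: // Double takes two slots as well
                    pool[ix++] = new Item(tag, 0, 0, String.valueOf(buf.getDouble()));
                    break;
                default:
                    throw new IllegalArgumentException("unknown pool item type " + tag);
            }
        }
        return pool;
    }

    /** Pass 2: follow references, which may point to higher indices. */
    static String describe(Item[] pool, int ix) {
        Item it = pool[ix];
        switch (it.tag) {
            case 7: case 8: case 16: case 19: case 20:
                return describe(pool, it.ref1);
            case 12: // NameAndType: name and descriptor
                return describe(pool, it.ref1) + ":" + describe(pool, it.ref2);
            case 9: case 10: case 11: // class and NameAndType
                return describe(pool, it.ref1) + "." + describe(pool, it.ref2);
            default: // Utf8 and numbers carry text; other kinds are left unresolved here
                return it.text != null ? it.text : "#" + ix;
        }
    }

    /** Appends a Utf8 item to the hand-built sample pool. */
    static void utf8(ByteBuffer b, String s) {
        byte[] raw = s.getBytes(StandardCharsets.UTF_8);
        b.put((byte) 1).putShort((short) raw.length).put(raw);
    }

    /** Hand-built pool whose first item refers only to later items. */
    static ByteBuffer sample() {
        ByteBuffer b = ByteBuffer.allocate(128); // big-endian by default
        b.putShort((short) 9);                                    // count = highest index + 1
        b.put((byte) 10).putShort((short) 2).putShort((short) 4); // #1 Methodref -> #2, #4
        b.put((byte) 7).putShort((short) 3);                      // #2 Class -> #3
        utf8(b, "java/lang/Object");                              // #3
        b.put((byte) 12).putShort((short) 5).putShort((short) 6); // #4 NameAndType -> #5, #6
        utf8(b, "<init>");                                        // #5
        utf8(b, "()V");                                           // #6
        b.put((byte) 5).putLong(42L);                             // #7 Long, also occupies #8
        b.flip();
        return b;
    }

    public static void main(String[] args) {
        Item[] pool = read(sample());
        System.out.println(describe(pool, 1)); // prints java/lang/Object.<init>:()V
    }
}
```

Leaving slot 0 of the array empty keeps array indices identical to pool indices, and the extra increment after Long and Double leaves the following slot unused, just as the loop in the parser above skips it.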