I am using Tika to parse large PDF and Word documents, but I get the following error message:
Your document contained more than 100000 characters, and so your requested limit has been reached. To receive the full text of the document, increase your limit. (Text up to the limit is however available).
How do I increase the limit?
Best answer
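Assuming you are basically following the Tika example for extracting to plain text, all you need to do is create your BodyContentHandler with a write limit of -1 to disable the write limit, as explained in the javadocs.
Your code would then look something like this (inspired by the example):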
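import java.io.InputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

// A write limit of -1 disables the default 100000-character cap.
BodyContentHandler handler = new BodyContentHandler(-1);
AutoDetectParser parser = new AutoDetectParser();
Metadata metadata = new Metadata();
// try-with-resources closes the stream even if parsing throws.
try (InputStream stream = ContentHandlerExample.class.getResourceAsStream("test.doc")) {
    parser.parse(stream, handler, metadata);
    return handler.toString();
}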
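
To see why a limit of -1 returns the full text while a positive limit cuts it off, it may help to look at a simplified model of a write-limited SAX handler. The sketch below uses only the JDK's SAX classes. `WriteLimitSketch`, `LimitedTextHandler`, and `extract` are hypothetical names made up for illustration, and this is not Tika's actual implementation; it only assumes that the limit is enforced when character data is written.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WriteLimitSketch {

    // Hypothetical handler: collects character data up to writeLimit.
    // A writeLimit of -1 means "no limit", mirroring the convention in the answer.
    static class LimitedTextHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private final int writeLimit;

        LimitedTextHandler(int writeLimit) {
            this.writeLimit = writeLimit;
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (writeLimit == -1 || text.length() + length <= writeLimit) {
                text.append(ch, start, length);
                return;
            }
            // Keep the part that still fits, then signal that the limit was hit.
            text.append(ch, start, writeLimit - text.length());
            throw new SAXException("write limit of " + writeLimit + " characters reached");
        }

        @Override
        public String toString() {
            return text.toString();
        }
    }

    // Feeds the input to the handler in 4-character chunks, roughly as a parser would,
    // and returns whatever text was collected.
    static String extract(String input, int writeLimit) {
        LimitedTextHandler handler = new LimitedTextHandler(writeLimit);
        char[] chars = input.toCharArray();
        try {
            for (int i = 0; i < chars.length; i += 4) {
                handler.characters(chars, i, Math.min(4, chars.length - i));
            }
        } catch (SAXException e) {
            // Limit reached: the text up to the limit is still available.
        }
        return handler.toString();
    }

    public static void main(String[] args) {
        System.out.println(extract("0123456789", 6));   // truncated at 6 characters
        System.out.println(extract("0123456789", -1));  // full text
    }
}
```

In this model, `extract("0123456789", 6)` returns `012345` and `extract("0123456789", -1)` returns the whole string, which matches the behaviour described in the error message: text up to the limit stays available, and removing the limit yields the full document.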