How to get text content from files using Tika 1.6?
Question
Hi, I am trying to get the text content from files of any of these types: pdf, txt, doc, docx and odt. The implementation with Tika previously worked fine, but now it is broken. The code is:
```
public void uploadFile(FileUploadEvent event) throws Exception {
    UploadedFile file = event.getUploadedFile();
    byte[] data = file.getData();
    Tika tika = new Tika();
    string = tika.parseToString(new ByteArrayInputStream(data));
    ...
}
```
Any ideas? Is this a bad implementation?
Recommended answer
You need to add the tika-parsers artifact. tika-core alone does not include the format-specific parsers for PDF, Office or OpenDocument files.
For example, with Maven, add this dependency to your pom.xml:
```
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>1.7</version>
</dependency>
```
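If it is unclear whether the parsers actually reach the runtime classpath (for example inside a deployed web application), a small diagnostic along these lines may help. It is only a sketch and does not use any Tika API: `ParserClasspathCheck`, `isPresent` and `check` are hypothetical names, and the listed class names are the ones shipped in the Tika 1.x tika-parsers artifact, assumed to be unchanged in your version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Diagnostic sketch: reports which parser classes are visible on the classpath.
final class ParserClasspathCheck {

    // Parser classes from the Tika 1.x tika-parsers artifact (assumption:
    // the names are the same in the version you deploy).
    static final String[] PARSER_CLASSES = {
        "org.apache.tika.parser.pdf.PDFParser",
        "org.apache.tika.parser.microsoft.OfficeParser",
        "org.apache.tika.parser.microsoft.ooxml.OOXMLParser",
        "org.apache.tika.parser.odf.OpenDocumentParser",
        "org.apache.tika.parser.txt.TXTParser"
    };

    // Returns true if the named class can be located, without initializing it.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ParserClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Maps each class name to whether it was found, preserving input order.
    static Map<String, Boolean> check(String... classNames) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : classNames) {
            result.put(name, isPresent(name));
        }
        return result;
    }

    public static void main(String[] args) {
        check(PARSER_CLASSES).forEach((name, present) ->
            System.out.println((present ? "found   " : "MISSING ") + name));
    }
}
```

Running `main` from within the same application prints one line per parser class. Any line marked MISSING suggests that the dependency is not being packaged or deployed.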
You can also use the auto-detecting parser, AutoDetectParser:
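```
// `is` is the InputStream to parse, e.g. new ByteArrayInputStream(data);
// `text` is a String declared elsewhere.
// Note: the no-arg BodyContentHandler stops after 100,000 characters;
// new BodyContentHandler(-1) removes that limit.
BodyContentHandler handler = new BodyContentHandler();
AutoDetectParser parser = new AutoDetectParser();
Metadata metadata = new Metadata();
try {
    // Detects the document type and delegates to the matching parser
    parser.parse(is, handler, metadata);
    text = handler.toString();
} catch (TikaException te) {
    System.out.println(te.toString());
} finally {
    is.close();
}
```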