Problem description

I created a Java application that uses Tesseract to convert a given image or PDF to a string. When I run it on my machine as a JUnit unit test, it works fine. When I run the full system, a RESTful API served by Tomcat that receives the image and runs Tesseract, it gives me the following error:

23:22:36.511 [http-nio-9999-exec-3] ERROR net.sourceforge.tess4j.Tesseract - null
java.lang.NullPointerException: null
    at net.sourceforge.tess4j.util.PdfUtilities.convertPdf2Png(PdfUtilities.java:107)
    at net.sourceforge.tess4j.util.PdfUtilities.convertPdf2Tiff(PdfUtilities.java:48)
    at net.sourceforge.tess4j.util.ImageIOHelper.getIIOImageList(ImageIOHelper.java:343)
    at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:213)
    at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:197)
    at ocr.OcrUtil.getString(OcrUtil.java:54)
    at com.tapd.server.api.handlers.IRSHandler.uploadIRSImage(IRSHandler.java:65)
    at com.tapd.server.api.WebAPIService.updateParentIrsForm(WebAPIService.java:250)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
    at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:160)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
    at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:309)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:292)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1139)
    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:460)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:386)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:334)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:221)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:230)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:165)
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:192)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:165)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:198)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:108)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:522)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
    at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:620)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:349)
    at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:1110)
    at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
    at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:785)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1425)
    at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Unknown Source)
[2016-09-14 23:22:36,512][ERROR] java.lang.NullPointerException

My guess is that the tessdata folder is not in the right place: once the application is packaged into a JAR and run by Tomcat, the folder ends up misplaced. I couldn't figure out where it should be, and I have double-checked that all JARs are deployed correctly.

Edit: it appears that Tesseract can't handle the path when the file is on a remote server such as AWS S3. Why is that, and how can I make it use a path from S3? (Yes, the file is public.)

Recommended answer

As @Piotr R mentioned, the error arises because ghostscriptException.getCause() is null. The underlying reason is that the path in the File object passed to Tesseract was not valid. Tesseract's notion of a valid path is narrower than you might expect: it accepts only a local path, so pointing it at a file stored on AWS S3 throws an error even if the file is public. The solution was to save the file locally and delete it after Tesseract is done.
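The save-locally-then-delete workaround could look roughly like the sketch below. It is not the code from the original project: the names RemoteOcr, FileAction and withLocalCopy are hypothetical, and the sketch assumes the remote file is reachable through a plain URL (as a public S3 object would be). The OCR step is passed in as a callback so the download and cleanup logic stays independent of Tess4J.

```java
import java.io.File;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: copies a remote file to a local temp file,
// runs some work on the local copy, then deletes the copy.
final class RemoteOcr {

    // Hypothetical callback type for the work done on the local file
    // (for example, an OCR call). It may throw checked exceptions.
    @FunctionalInterface
    public interface FileAction<T> {
        T apply(File localFile) throws Exception;
    }

    private RemoteOcr() {
    }

    public static <T> T withLocalCopy(URL source, String suffix, FileAction<T> action)
            throws Exception {
        // The suffix (e.g. ".pdf" or ".png") keeps the original file type
        // visible in the temp file's name.
        Path tmp = Files.createTempFile("ocr-input-", suffix);
        try {
            // Download the remote content into the local temp file.
            try (InputStream in = source.openStream()) {
                Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            }
            // Hand a real local File to the caller's action.
            return action.apply(tmp.toFile());
        } finally {
            // Remove the local copy whether or not the action succeeded.
            Files.deleteIfExists(tmp);
        }
    }
}
```

With Tess4J on the classpath, the action could be something like file -> new Tesseract().doOCR(file), with the tessdata path configured as in the existing setup. Passing the real extension as the suffix is probably worthwhile, since the stack trace suggests Tess4J takes a separate PDF conversion path for PDF input.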
