Using Tesseract from Java

Question

I'm trying to build a sample application in Java that reads an image file and outputs just the text extracted from the image. I found the Tesseract project, which seems promising; however, it's written in C++. To use it, should I simply run it as a command line from my Java app via Runtime.exec(...)? Or is there a better solution, maybe a JAR? Additionally, although this is just a sample app, would running it as a command-line app be a concern from a scalability perspective?
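For reference, the command-line route mentioned above could look roughly like the sketch below. It assumes the `tesseract` executable is installed and on the PATH, and that the `eng` language data is available. The class and method names (`TesseractCli`, `buildCommand`, `extractText`) are hypothetical, chosen only for illustration. It uses `ProcessBuilder` rather than `Runtime.exec(...)` because the arguments can be passed as a list, and it relies on the CLI convention that `stdout` as the output base prints the recognized text to standard output.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class TesseractCli {

    // Builds the argument list for: tesseract <image> stdout -l <language>
    // (assumes the "tesseract" binary is on the PATH).
    static List<String> buildCommand(Path image, String language) {
        return List.of("tesseract", image.toString(), "stdout", "-l", language);
    }

    // Runs the CLI once and returns whatever it printed to standard output.
    static String extractText(Path image, String language)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(image, language))
                // Discard diagnostics so an unread stderr pipe cannot block the child.
                .redirectError(ProcessBuilder.Redirect.DISCARD)
                .start();
        String text = new String(
                process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        return text;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: java TesseractCli <image-file>");
            return;
        }
        System.out.print(extractText(Paths.get(args[0]), "eng"));
    }
}
```

Each call starts a new OS process, so the engine and its language data are loaded again for every image. That is likely fine for a sample app, but it is the main reason an in-process binding tends to scale better.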

Recommended answer

Tesseract is now provided by the javacv project. This is a far better option than using Tess4J, since all that is required is adding a single dependency to your pom file; the native libs for your platform are then downloaded and linked automatically by the javacv tesseract version.

I've created an example Maven project here - https://github.com/piersy/BasicTesseractExample

and also an example Gradle project here - https://github.com/piersy/BasicTesseractExampleGradle

For this to work on my Ubuntu machine I needed to update my install of libstdc++6.

I achieved this by running the following, although just installing libstdc++6 may work for you.

sudo add-apt-repository ppa:ubuntu-toolchain-r/test 
sudo apt-get update
sudo apt-get install libstdc++6

Note that the Gradle project does not perform the automatic install, but it is still a hell of a lot simpler than using Tess4J.

The javacv project is here - https://github.com/bytedeco/javacpp-presets/tree/master/tesseract

Big props to the javacv guys. I only wish I'd found this earlier, as it would have saved me a week of getting Tess4J to work on multiple platforms!
