Question
I want to perform text recognition on images using Python, and I have installed Anaconda. Now I want to install Tesseract, but I also need to install Leptonica, and I did not find any clear instructions on how to do this on Windows. For Leptonica, I do not want to install Visual Studio. Could anybody provide clear instructions on how to install Leptonica and Tesseract on Windows without Visual Studio, for use in Anaconda? Thanks.
Answer
Here is a simple set of steps to get the Tesseract 3.05 dev version (as of 04/22/2016) working on both Windows 7 and Windows 8 machines:
1- Install Tesseract using the executable installer from the official tesseract-ocr page (version 3.02 for Windows will suffice).
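Since the goal is to call Tesseract from Python in Anaconda, it can help to confirm where tesseract.exe ended up after step 1. The following is a minimal sketch, assuming the installer used the default C:\Program Files (x86)\Tesseract-OCR folder mentioned later in this answer; the function name find_tesseract is hypothetical, and whether the installer adds Tesseract to PATH is not guaranteed.

```python
import shutil
from pathlib import Path
from typing import Optional

# Assumed default install folder of the 3.02 Windows installer (the same
# folder this answer later copies the 3.05 files into). Adjust if needed.
DEFAULT_DIR = r"C:\Program Files (x86)\Tesseract-OCR"


def find_tesseract(default_dir: str = DEFAULT_DIR) -> Optional[str]:
    """Return the path to the tesseract executable, or None if not found."""
    # Prefer whatever is on PATH; the installer may or may not add it there.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise fall back to tesseract.exe inside the install folder.
    candidate = Path(default_dir) / "tesseract.exe"
    return str(candidate) if candidate.is_file() else None


print(find_tesseract() or "tesseract not found")
```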
2- Download the following two files of the Tesseract 3.05 dev version from http://domasofan.spdns.eu/tesseract/
There are 2 exe files:
- tesseract-core-yyyymmdd.exe: Tesseract core application without language data.
- tesseract-langs-yyyymmdd.exe: All the language data available for Tesseract.
(yyyymmdd is a date stamp: a 4-digit year, 2-digit month and 2-digit day.)
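Because both packages carry a yyyymmdd stamp, picking the newest one from a folder of downloads is a simple date parse. A minimal sketch, assuming the files keep exactly the tesseract-core-yyyymmdd.exe / tesseract-langs-yyyymmdd.exe naming shown above; package_date and newest are hypothetical helpers, and the dates used in any example are made up for illustration.

```python
import re
from datetime import date, datetime
from typing import Iterable, Optional

# Matches names like "tesseract-core-20160422.exe" (yyyymmdd date stamp).
PATTERN = re.compile(r"^tesseract-(core|langs)-(\d{8})\.exe$")


def package_date(filename: str) -> Optional[date]:
    """Return the date encoded in a package filename, or None if no match."""
    match = PATTERN.match(filename)
    if not match:
        return None
    return datetime.strptime(match.group(2), "%Y%m%d").date()


def newest(filenames: Iterable[str], kind: str) -> Optional[str]:
    """Pick the most recent 'core' or 'langs' package from a list of names."""
    dated = []
    for name in filenames:
        match = PATTERN.match(name)
        if match and match.group(1) == kind:
            dated.append((package_date(name), name))
    # Tuples compare by date first, so max() returns the latest package.
    return max(dated)[1] if dated else None
```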
The app is portable, so you can install it on a USB stick or in another location.
Sub-steps to install these:
- Download the tesseract-core and tesseract-langs packages.
- Double-click the tesseract-core package and extract it to the directory where you want it (here, a new temporary folder called "Tess_temp").
- Double-click the tesseract-langs package and extract it into a tessdata subfolder of that same "Tess_temp" folder. For example, if you extracted tesseract-core to c:\Tess_temp, tesseract-langs needs to go to c:\Tess_temp\tessdata.
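Before copying anything, a quick check that the two packages landed in the right places can save debugging later. This is a sketch under two assumptions not stated in the answer: that the core package puts tesseract.exe directly in Tess_temp, and that the English data file is named eng.traineddata (the name the -l eng option looks for). check_layout is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def check_layout(temp_dir: str, lang: str = "eng") -> List[str]:
    """Return a list of problems with the extracted folder (empty if OK)."""
    root = Path(temp_dir)
    problems = []
    # Assumption: the core package puts tesseract.exe directly in the folder.
    exe = root / "tesseract.exe"
    if not exe.is_file():
        problems.append(f"missing {exe}")
    # The langs package must end up in the tessdata subfolder.
    traineddata = root / "tessdata" / f"{lang}.traineddata"
    if not traineddata.is_file():
        problems.append(f"missing {traineddata}")
    return problems
```

For example, check_layout(r"c:\Tess_temp") should return an empty list once both packages are extracted as described.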
Now copy whatever you have in "Tess_temp" to the folder where Tesseract 3.02 was installed in step 1 above (usually C:\Program Files (x86)\Tesseract-OCR), replacing the 3.02 files with the 3.05 ones.
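The copy step amounts to merging Tess_temp into the existing install folder and overwriting files that have the same name. A minimal Python equivalent, assuming Python 3.8 or later (for the dirs_exist_ok argument) and a prompt with permission to write under Program Files; overlay is a hypothetical name. Note that 3.02 files with no 3.05 counterpart are left in place, just as with a manual copy.

```python
import shutil


def overlay(src: str, dst: str) -> None:
    """Copy everything in src into dst, overwriting files that already exist."""
    # dirs_exist_ok=True lets copytree merge into an existing folder instead
    # of raising FileExistsError; existing files with the same name are
    # overwritten, which matches "replace 3.02 with 3.05".
    shutil.copytree(src, dst, dirs_exist_ok=True)
```

For example: overlay(r"C:\Tess_temp", r"C:\Program Files (x86)\Tesseract-OCR").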
It should now work with version 3.05 on Windows. Copy a sample image test.png (containing text) into this Tesseract-OCR folder, open a cmd window and type the following commands:
go to the tesseract folder: cd "C:\Program Files (x86)\Tesseract-OCR"
run tesseract on test.png: tesseract -l eng test.png test_text -psm 6
It will display:
Tesseract Open Source OCR Engine v3.05.00dev with Leptonica
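From Python (for example inside an Anaconda environment), the same command can be run with subprocess and the banner checked to confirm that the 3.05 build is the one being picked up. A minimal sketch: build_command, banner_version and run_ocr are hypothetical helpers, exe assumes tesseract is reachable by that name (pass a full path otherwise), and because the banner may be printed to stderr rather than stdout, both streams are searched.

```python
import re
import subprocess
from typing import List, Optional

# Matches the banner shown above, e.g. "... Engine v3.05.00dev with Leptonica".
BANNER = re.compile(r"Tesseract Open Source OCR Engine v(\S+) with Leptonica")


def build_command(image: str, outbase: str, lang: str = "eng", psm: int = 6,
                  exe: str = "tesseract") -> List[str]:
    """Argument list for: tesseract -l eng test.png test_text -psm 6"""
    # 3.05 takes the page segmentation mode as -psm (single dash);
    # 4.x and later expect --psm instead.
    return [exe, "-l", lang, image, outbase, "-psm", str(psm)]


def banner_version(output: str) -> Optional[str]:
    """Extract the version string from tesseract's banner, if present."""
    match = BANNER.search(output)
    return match.group(1) if match else None


def run_ocr(image: str, outbase: str, cwd: Optional[str] = None,
            exe: str = "tesseract") -> Optional[str]:
    """Run tesseract once and return the version reported in its banner."""
    result = subprocess.run(build_command(image, outbase, exe=exe), cwd=cwd,
                            capture_output=True, text=True, check=True)
    # Search both streams because the banner may go to stderr.
    return banner_version(result.stdout + result.stderr)
```

With the setup above, run_ocr("test.png", "test_text", cwd=r"C:\Program Files (x86)\Tesseract-OCR") should return "3.05.00dev".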
Congratulations! (Check test_text.txt for the extracted text.)
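Tesseract appends .txt to the output base name given on the command line, which is why test_text becomes test_text.txt. A small sketch for reading that file back into Python, assuming the text file is UTF-8 encoded; read_result is a hypothetical helper.

```python
from pathlib import Path


def read_result(outbase: str, folder: str = ".") -> str:
    """Read the text tesseract wrote for the given output base name."""
    # tesseract adds ".txt" to the output base: test_text -> test_text.txt
    path = Path(folder) / f"{outbase}.txt"
    return path.read_text(encoding="utf-8").strip()
```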