Problem description
I want to use pytesseract with Arabic. I have ara.traineddata in the /usr/share/tesseract/tessdata/ path on my system, and I have already installed the tesseract package.
This is my code:
import pytesseract
from PIL import Image
pytesseract.image_to_string(Image.open('test_arabic.png'), config='', lang="ara")
And I get this error:
TesseractError Traceback (most recent call last)
in
----> 1 pytesseract.image_to_string(Image.open('test_persian.png'), config='', lang="ara")
~/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py in image_to_string(image, lang, config, nice, output_type, timeout)
368 args = [image, 'txt', lang, config, nice, timeout]
369
--> 370 return {
371 Output.BYTES: lambda: run_and_get_output(*(args + [True])),
372 Output.DICT: lambda: {'text': run_and_get_output(*args)},
~/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py in <lambda>()
371 Output.BYTES: lambda: run_and_get_output(*(args + [True])),
372 Output.DICT: lambda: {'text': run_and_get_output(*args)},
--> 373 Output.STRING: lambda: run_and_get_output(*args),
374 }[output_type]()
375
~/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py in run_and_get_output(image, extension, lang, config, nice, timeout, return_bytes)
280 }
281
--> 282 run_tesseract(**kwargs)
283 filename = kwargs['output_filename_base'] + extsep + extension
284 with open(filename, 'rb') as output_file:
~/.local/lib/python3.8/site-packages/pytesseract/pytesseract.py in run_tesseract(input_filename, output_filename_base, extension, lang, config, nice, timeout)
256 with timeout_manager(proc, timeout) as error_string:
257 if proc.returncode:
--> 258 raise TesseractError(proc.returncode, get_errors(error_string))
259
260
TesseractError: (1, 'read_params_file: parameter not found:')
Thanks for the help.
I suggest using the proper language model and the latest version:
For Windows 10:
Install tesseract-ocr-w64-setup-v5.0.0-alpha.20200328.exe (64-bit).
To validate the installation, run the following in PowerShell or a cmd terminal:
tesseract -v
It will output something like this: tesseract v5.0.0-alpha.20200328
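To run the same version check from Python, a small sketch can parse the banner that `tesseract -v` prints. The helper names below are hypothetical, and the 4.0 cut-off is an assumption based on the LSTM-only models (such as those in the tessdata_best and tessdata_fast repositories) requiring Tesseract 4 or newer:

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Matches banners such as "tesseract 4.1.1" or "tesseract v5.0.0-alpha.20200328".
_VERSION_RE = re.compile(r"tesseract\s+v?(\d+)\.(\d+)\.(\d+)")


def parse_tesseract_version(output: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from `tesseract -v` output, or None."""
    match = _VERSION_RE.search(output)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def installed_tesseract_version() -> Optional[Tuple[int, int, int]]:
    """Return the installed Tesseract version, or None if it is not on PATH."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(["tesseract", "-v"], capture_output=True, text=True)
    # Some releases print the version banner to stderr instead of stdout.
    return parse_tesseract_version(result.stdout + "\n" + result.stderr)


def supports_lstm_models(version: Optional[Tuple[int, int, int]]) -> bool:
    """Assumption: LSTM-only .traineddata files need Tesseract 4.0 or newer."""
    return version is not None and version[0] >= 4
```

For example, `supports_lstm_models(installed_tesseract_version())` returning False would suggest upgrading before trying a newer `ara.traineddata`.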
For macOS:
brew install tesseract
To validate the installation, run the following in a terminal:
tesseract -v
It will output something like this, listing the Tesseract version followed by the image libraries it was built with:
tesseract 4.1.1
 leptonica-1.80.0
  libgif 5.2.1 : libjpeg 9d : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 1.1.0 : libopenjp2 2.3.1
 Found AVX2
 Found AVX
 Found FMA
 Found SSE
If you are not sure about the path, simply copy the ara.traineddata file into the same folder as your Python .py file:
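Before copying files around, it can help to check where the model actually is. The sketch below searches a few candidate folders for `<lang>.traineddata`; the candidate list is an assumption (the script's folder, the current working directory, `$TESSDATA_PREFIX`, and the /usr/share/tesseract/tessdata/ path from the question), and `find_traineddata` / `default_candidates` are hypothetical helpers, not part of pytesseract:

```python
import os
from pathlib import Path
from typing import Iterable, List, Optional


def default_candidates() -> List[Path]:
    """Hypothetical search order for folders that may hold .traineddata files."""
    dirs = [Path.cwd()]
    # __file__ is undefined in interactive sessions such as Jupyter notebooks.
    if "__file__" in globals():
        dirs.insert(0, Path(__file__).resolve().parent)
    prefix = os.environ.get("TESSDATA_PREFIX")
    if prefix:
        # Releases differ on whether TESSDATA_PREFIX names the tessdata folder
        # itself or its parent, so try both.
        dirs.extend([Path(prefix), Path(prefix) / "tessdata"])
    # The system path mentioned in the question.
    dirs.append(Path("/usr/share/tesseract/tessdata"))
    return dirs


def find_traineddata(lang: str, candidates: Iterable[Path]) -> Optional[Path]:
    """Return the first folder containing '<lang>.traineddata', or None."""
    for folder in candidates:
        if (Path(folder) / f"{lang}.traineddata").is_file():
            return Path(folder)
    return None
```

Calling `find_traineddata("ara", default_candidates())` returns the folder to use, or None if none of the assumed locations has the model.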
import pytesseract
from PIL import Image
import os
os.environ["TESSDATA_PREFIX"] = "" # Leaving it empty because file is already copy pasted in the current directory
print(os.getenv("TESSDATA_PREFIX"))
# Copy paste the ara.traineddata file in the same directory as this python code
print(pytesseract.image_to_string(Image.open('cropped.png'), lang="ara"))
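An alternative to changing `TESSDATA_PREFIX` for the whole process is Tesseract's `--tessdata-dir` command-line option, passed through pytesseract's `config` argument (the pytesseract README shows this pattern). A minimal sketch; `tessdata_dir_config` and `ocr_arabic` are hypothetical helper names, and the folder passed in must contain `ara.traineddata`:

```python
from pathlib import Path


def tessdata_dir_config(folder) -> str:
    """Build a pytesseract config string that points Tesseract at `folder`.

    The path is double-quoted so folders containing spaces survive
    pytesseract's shlex-based splitting of the config string.
    """
    return f'--tessdata-dir "{Path(folder).resolve()}"'


def ocr_arabic(image_path: str, folder) -> str:
    """Run Arabic OCR on image_path using the models stored in `folder`."""
    # Imported lazily so the config helper works without these packages installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(
            image, lang="ara", config=tessdata_dir_config(folder)
        )
```

For example, `ocr_arabic('cropped.png', '.')` would look for `ara.traineddata` in the current working directory, without touching any environment variables.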
For Linux/Ubuntu OS:
sudo apt-get install tesseract-ocr
The validation and run code are the same as for macOS.
Also make sure the tessdata path is correct.
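One way to confirm the path is to ask Tesseract itself: `tesseract --list-langs` prints the models it can load, and recent releases include the tessdata folder in the header line. A minimal sketch, assuming the `tesseract` binary is on PATH; `parse_list_langs` and `available_languages` are hypothetical helper names:

```python
import shutil
import subprocess
from typing import List


def parse_list_langs(output: str) -> List[str]:
    """Parse the text printed by `tesseract --list-langs` into language codes.

    The first line is a header such as
    'List of available languages in "/usr/share/tesseract/tessdata/" (2):'
    and each following non-empty line is one language code.
    """
    langs = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("List of available languages"):
            continue
        langs.append(line)
    return langs


def available_languages() -> List[str]:
    """Run `tesseract --list-langs`; return [] if tesseract is not on PATH."""
    if shutil.which("tesseract") is None:
        return []
    result = subprocess.run(
        ["tesseract", "--list-langs"], capture_output=True, text=True
    )
    # Some older releases write this list to stderr rather than stdout.
    text = result.stdout if result.stdout.strip() else result.stderr
    return parse_list_langs(text)
```

If `"ara" in available_languages()` is False, Tesseract is not reading the folder that holds `ara.traineddata`, and `lang="ara"` will fail regardless of the Python code.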
This code works fine if the ara.traineddata file is downloaded successfully:
import pytesseract
from PIL import Image
print(pytesseract.image_to_string(Image.open('cropped.png'), lang="ara"))
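If that call still fails, one thing worth ruling out is a bad download: saving the GitHub web page for a model instead of the raw file leaves a small HTML document with a `.traineddata` name. The sketch below is a heuristic under that assumption; the size threshold and the helper name are hypothetical, not values defined by Tesseract:

```python
from pathlib import Path

# Hypothetical lower bound: real language models are far larger than an HTML
# page, but this is a heuristic, not a limit defined by Tesseract.
MIN_MODEL_BYTES = 100_000


def looks_like_valid_model(path) -> bool:
    """Heuristic check that a .traineddata file is not an HTML page or truncated."""
    path = Path(path)
    if not path.is_file() or path.stat().st_size < MIN_MODEL_BYTES:
        return False
    with path.open("rb") as handle:
        head = handle.read(512).lstrip().lower()
    # A page saved from a browser or a GitHub "blob" URL starts with HTML markup.
    return not (head.startswith(b"<!doctype html") or head.startswith(b"<html"))
```

A False result means the file should be downloaded again from the raw-file link before debugging anything else.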
You can follow this tutorial for details. Here is the demo output of this tutorial which uses Arabic language as well.