I'm trying to read text from an image, using OpenCV and Pytesseract, but with poor results.
The image whose text I'm interested in reading is: https://www.lubecreostorepratolapeligna.it/gb/img/logo.png
This is the code I am using:
import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r'D:\Program Files\pytesseract\tesseract.exe'
path_to_image = "logo.png"
image = cv2.imread(path_to_image)
# converting image into gray scale image
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow('grey image', gray_image)
cv2.waitKey(0)
# converting it to binary image by Thresholding
# this step is required if you have a colored image, because if you skip it
# Tesseract won't be able to detect the text correctly and will give incorrect results
threshold_img = cv2.threshold(gray_image, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
# display image
cv2.imshow('threshold image', threshold_img)
# Maintain output window until user presses a key
cv2.waitKey(0)
# Destroying present windows on screen
cv2.destroyAllWindows()
# now feeding image to tesseract
text = pytesseract.image_to_string(threshold_img)
print(text)
The result of the execution is: ["cu", " ", "LUBE", " ", "STORE", "PRATOLA PELIGNA"]
But the result should be these 7 words: ["cucine", "LUBE", "CREO", "kitchens", "STORE", "PRATOLA", "PELIGNA"]
Is there anyone who could help me solve this problem?
Edit, 17.12.2020: With preprocessing it now recognizes everything but the "O" in CREO. See the stages in ocr8.py. Then ocr9.py demonstrates (not automated yet) finding the lines of text from the coordinates returned by pytesseract.image_to_boxes(), the approximate size of the letters and the inter-symbol distance, then extrapolating one step ahead and searching for a single character (--psm 8).
It turned out that Tesseract had actually recognized the "O" in CREO, but it read it as ♀, probably confused by the small "k" below it, etc.
Since this is a rare and "strange"/unexpected symbol, it can be corrected - replaced automatically (see the function Correct()).
One technical detail: Tesseract returns the ASCII control character 12 (0x0C), while my editor's source was in Unicode/UTF-8, where ♀ is code point 9792. So inside the code I wrote it as chr(12).
The latest version: ocr9.py
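A minimal sketch of that correction step, based on the description above (the actual Correct() in ocr9.py may differ in its details):

```python
# Tesseract recognized the "O" in CREO but returned the rare control
# character 12 (0x0C), displayed as ♀ (U+2640) in a UTF-8 editor.
# Since such a symbol is unexpected in this logo, it can simply be
# replaced in post-processing.
def Correct(text):
    return text.replace(chr(12), "O")

print(Correct("CRE" + chr(12)))  # CREO

# The missed letter can also be re-OCRed directly: crop a rectangle one
# extrapolated letter-step past "CRE" and run Tesseract in
# single-character mode:
#   letter = pytesseract.image_to_string(crop, config="--psm 8")
```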
You mentioned that PRATOLA and PELIGNA have to be given separately - just split by " ":
splitted = text.split(" ")
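For instance:

```python
text = "PRATOLA PELIGNA"
splitted = text.split(" ")
print(splitted)  # ['PRATOLA', 'PELIGNA']
```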
RECOGNIZED
CUCINE
LUBE
STORE
PRATOLA PELIGNA
CRE [+O with correction and extrapolation of the line]
KITCHENS
...
C 39 211 47 221 0
U 62 211 69 221 0
C 84 211 92 221 0
I 107 211 108 221 0
N 123 211 131 221 0
E 146 211 153 221 0
L 39 108 59 166 0
U 63 107 93 166 0
B 98 108 128 166 0
E 133 108 152 166 0
S 440 134 468 173 0
T 470 135 499 173 0
O 500 134 539 174 0
R 544 135 575 173 0
E 580 135 608 173 0
P 287 76 315 114 0
R 319 76 350 114 0
A 352 76 390 114 0
T 387 76 417 114 0
O 417 75 456 115 0
L 461 76 487 114 0
A 489 76 526 114 0
P 543 76 572 114 0
E 576 76 604 114 0
L 609 76 634 114 0
I 639 76 643 114 0
G 649 75 683 115 0
N 690 76 722 114 0
A 726 76 764 114 0
C 21 30 55 65 0
R 62 31 93 64 0
E 99 31 127 64 0
K 47 19 52 25 0
I 61 19 62 25 0
T 71 19 76 25 0
C 84 19 89 25 0
H 96 19 109 25 0
E 113 19 117 25 0
N 127 19 132 25 0
S 141 19 145 22 0
These box coordinates come from pytesseract.image_to_boxes().
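Each row has the format `char x1 y1 x2 y2 page`, with y measured from the bottom of the image, as image_to_boxes() returns it. As a sketch (not the actual ocr9.py code), the lines of text can be recovered by grouping characters with nearly equal bottom coordinates; the same data yields the letter widths and inter-symbol distances used for the extrapolation:

```python
# A few of the boxes above (char x1 y1 x2 y2 page, y from the bottom).
boxes = """C 39 211 47 221 0
U 62 211 69 221 0
C 84 211 92 221 0
I 107 211 108 221 0
N 123 211 131 221 0
E 146 211 153 221 0
L 39 108 59 166 0
U 63 107 93 166 0
B 98 108 128 166 0
E 133 108 152 166 0"""

# Group characters into text lines by their bottom y coordinate
# (characters on one line have nearly equal y1; round to absorb jitter).
lines = {}
for row in boxes.splitlines():
    ch, x1, y1, x2, y2, page = row.split()
    key = round(int(y1) / 10) * 10
    lines.setdefault(key, []).append((int(x1), ch))

# Read each line left to right, topmost line (largest y) first.
for y in sorted(lines, reverse=True):
    print(y, "".join(ch for _, ch in sorted(lines[y])))
# 210 CUCINE
# 110 LUBE
```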
Initial message:
I guess that for the area where "cucine" is, an adaptive threshold may segment it better or maybe applying some edge detection first.
"kitchens" seems very small; what about trying to enlarge that area/distance?
For CREO, I guess Tesseract is confused by the different sizes of the adjacent captions. For the "O" in CREO, you may apply dilate in order to close the gap in the "O".
Edit: I played with it a bit, but without Tesseract, and it needs more work. My goal was to make the letters more contrasting. Some of these processings may need to be applied selectively, only on "cucine", perhaps running the recognition in two passes: when a partial word like "Cu" is returned, apply adaptive threshold etc. (below) and OCR a top rectangle around the "CU...".
Binary Threshold:
Adaptive Threshold, Median blur (to clean noise) and invert:
Dilate connects small gaps, but it also destroys detail.
import cv2
import numpy as np
import pytesseract
#pytesseract.pytesseract.tesseract_cmd = r'D:\Program Files\pytesseract\tesseract.exe'
path_to_image = "logo.png"
#path_to_image = "logo1.png"
image = cv2.imread(path_to_image)
h, w, _ = image.shape
w, h = w * 3, h * 3
image = cv2.resize(image, (w, h), interpolation=cv2.INTER_AREA)  # resize 3x
# converting image into gray scale image
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow('grey image', gray_image)
cv2.waitKey(0)
# converting it to binary image by Thresholding
# this step is required if you have a colored image, because if you skip it
# Tesseract won't be able to detect the text correctly and will give incorrect results
#threshold_img = cv2.threshold(gray_image, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
# display image
threshold_img = cv2.adaptiveThreshold(gray_image, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                      cv2.THRESH_BINARY, 13, 3)  # alternative params: 11, 2
cv2.imshow('threshold image', threshold_img)
cv2.waitKey(0)
#threshold_img = cv2.GaussianBlur(threshold_img,(3,3),0)
threshold_img = cv2.medianBlur(threshold_img,5)
cv2.imshow('medianBlur', threshold_img)
cv2.waitKey(0)
threshold_img = cv2.bitwise_not(threshold_img)
cv2.imshow('Invert', threshold_img)
cv2.waitKey(0)
#kernel = np.ones((1, 1), np.uint8)
#threshold_img = cv2.dilate(threshold_img, kernel)
#cv2.imshow('Dilate', threshold_img)
#cv2.waitKey(0)
cv2.imshow('final image', threshold_img)
# Maintain output window until user presses a key
cv2.waitKey(0)
# Destroying present windows on screen
cv2.destroyAllWindows()
# now feeding image to tesseract
text = pytesseract.image_to_string(threshold_img)
print(text)