Problem description
I am using Tesseract in iOS 8 for an OCR-based app, but it incorrectly converts the division symbol "÷" in the image to a plus sign "+".
For example, an image of the expression "8+4÷4" always converts to the text string "8+4+4". It should be "8+4÷4".
I've tried using different trained data language files ("eng+equ", "ita"), adding "÷" to the whitelist, setting the ocr_engine variable to cube, converting the image to grayscale or black & white, and upsizing the image by 2 and 4 times.
Everything I've tried always returns a plus "+" sign instead of a division "÷" symbol.
I tried using only the "equ" trained data file, and that DOES return the division symbol correctly, but all other characters then come out as garbage.
I've been looking into this (Google, Stack Overflow) for several days and cannot figure it out.
How do I get Tesseract to include and recognize the division "÷" symbol?
Update:
The best I have been able to do is to set the AVCaptureSession preset to high:
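```objc
#import <AVFoundation/AVFoundation.h>
// Ask the capture session for its high-quality output preset
AVCaptureSession *session = [[AVCaptureSession alloc] init];
session.sessionPreset = AVCaptureSessionPresetHigh;
```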
The captured image then has dimensions of 676 × 405 pixels. I use the Tesseract OCR UIImage category (the image is named 'source') to binarize the image:
```objc
// Binarize the source image to improve contrast (using the UIImage category provided by TesseractOCR)
UIImage *blackAndWhiteImage = [source blackAndWhite];
[self.tesseract setImage:blackAndWhiteImage];
```
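The question does not show how TesseractOCR's blackAndWhite category method works internally, so its exact algorithm is unknown here. To illustrate what "binarize" means, here is a minimal sketch of one common technique, Otsu's global threshold, written in plain C (which also compiles unchanged inside an Objective-C .m file). It assumes the image has already been reduced to an 8-bit grayscale buffer; the function names otsu_threshold and binarize are hypothetical, not part of TesseractOCR.

```c
#include <stddef.h>
#include <stdint.h>

/* Otsu's method: pick the gray level t that maximizes the between-class
   variance when pixels are split into "dark" (<= t) and "light" (> t).
   Counts are used instead of fractions; that scales the variance by a
   constant, so the best t is unchanged. */
static int otsu_threshold(const uint8_t *gray, size_t count) {
    size_t hist[256] = {0};
    for (size_t i = 0; i < count; i++) {
        hist[gray[i]]++;
    }

    double total_sum = 0.0; /* sum of all pixel intensities */
    for (int v = 0; v < 256; v++) {
        total_sum += (double)v * (double)hist[v];
    }

    double dark_sum = 0.0;   /* sum of intensities in the dark class */
    size_t dark_count = 0;   /* number of pixels in the dark class */
    double best_variance = -1.0;
    int best_t = 0;

    for (int t = 0; t < 256; t++) {
        dark_count += hist[t];
        if (dark_count == 0) continue;         /* no dark pixels yet */
        size_t light_count = count - dark_count;
        if (light_count == 0) break;           /* nothing left to split */

        dark_sum += (double)t * (double)hist[t];
        double dark_mean = dark_sum / (double)dark_count;
        double light_mean = (total_sum - dark_sum) / (double)light_count;
        double diff = dark_mean - light_mean;
        double variance = (double)dark_count * (double)light_count * diff * diff;

        if (variance > best_variance) {        /* keep the first maximum */
            best_variance = variance;
            best_t = t;
        }
    }
    return best_t;
}

/* Map each pixel to 0 (black) if it is <= t, otherwise 255 (white). */
static void binarize(const uint8_t *gray, uint8_t *out, size_t count, int t) {
    for (size_t i = 0; i < count; i++) {
        out[i] = (gray[i] > t) ? 255 : 0;
    }
}
```

For example, a patch with dark ink around 30 to 45 and a light background around 200 to 220 gets a threshold of 45, so the ink becomes black and the background white. One way to obtain such a grayscale buffer on iOS is to draw the UIImage into a Core Graphics bitmap context created with a device-gray color space.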
With the binarized image, Tesseract usually converts the division symbol to the text "-1-", but I've also seen "-:-" and other digits or uppercase letters between the minus signs.
I can check for that pattern in the returned text, but then it is impossible to know whether the returned text "8-1-2" is a true subtraction or possibly a division.
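The ambiguity can be made concrete with a small post-processing sketch, again in plain C that compiles inside an Objective-C file. It assumes the misread always has the shape digit, "-", one character, "-", digit, and builds an alternative "division" reading by replacing each such run with "÷". The helper names are hypothetical. It only surfaces the second reading: for "8-1-2" it produces "8÷2", but it cannot decide which reading is correct.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* UTF-8 encoding of U+00F7 DIVISION SIGN. */
#define DIVISION_SIGN_UTF8 "\xC3\xB7"

/* True if s[i] starts a "-X-" run between two digits, e.g. the "-1-" in
   "8-1-2". X is any single byte other than '-' or the terminator
   (assumption: the misread is always exactly one character wide). */
static int is_possible_division_at(const char *s, size_t i) {
    if (i == 0 || !isdigit((unsigned char)s[i - 1])) return 0;
    if (s[i] != '-') return 0;
    if (s[i + 1] == '\0' || s[i + 1] == '-') return 0;
    if (s[i + 2] != '-') return 0;
    return isdigit((unsigned char)s[i + 3]) != 0;
}

/* Writes the "division" reading of s into out, replacing every flagged
   "-X-" run with the division sign. Returns the number of replacements,
   or -1 if out is too small (including room for the terminating NUL). */
static int division_reading(const char *s, char *out, size_t out_size) {
    size_t o = 0;
    int replaced = 0;
    for (size_t i = 0; s[i] != '\0'; ) {
        if (is_possible_division_at(s, i)) {
            if (o + 2 >= out_size) return -1;
            memcpy(out + o, DIVISION_SIGN_UTF8, 2);
            o += 2;
            i += 3;            /* skip '-', X and '-' */
            replaced++;
        } else {
            if (o + 1 >= out_size) return -1;
            out[o++] = s[i++];
        }
    }
    if (o >= out_size) return -1;
    out[o] = '\0';
    return replaced;
}
```

A caller could keep both the raw OCR text and this alternative, and ask the user to confirm whenever division_reading reports at least one replacement. Note that a multi-digit run such as "8-12-3" is deliberately left alone.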
Recommended answer
Train the OCR engine with different fonts.
Here is the tool for training the engine. Have a look at this as well.
Or you can use JTessBoxEditor.