I am working on an OCR system. A challenge I'm facing when recognizing text within the ROI is shakiness or motion blur in the shot, or text that is out of focus because of the camera angle. Please consider the following demo sample:
Notice the text marked in red, for example: in such cases the OCR system can't recognize it properly. The same problem also occurs without an angled shot, when the image is so blurry that the OCR system fails to recognize the text or recognizes it only partially. Sometimes the images are blurry, and sometimes they are very low resolution or pixelated. For example:
Methods we've tried
First, we tried various methods available on SO, but sadly had no luck.
- How to improve image quality to extract text from image using Tesseract
- How to improve image quality? [closed]
- Image quality improvement in Opencv
Next, we tried the following three methods, which looked the most promising.
1. TSRN
A recent research work (TSRN) focuses mainly on such cases. Its main idea is to introduce super-resolution (SR) techniques as a pre-processing step. This implementation looks by far the most promising. However, it fails to work its magic on our custom dataset (for example, the blue text in the second image above). Here are some examples from their demonstration:
2. Neural Enhance
After looking at the illustrations on its page, we believed it might work, but sadly it couldn't address the problem either. I was also a bit confused by their own examples, because I couldn't reproduce those either. I've raised an issue on GitHub where I demonstrate this in more detail. Here are some examples from their demonstration:
3. ISR
The last choice, which we tried with minimal hope. No luck with this implementation either.
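All three methods above plug into the same pipeline shape: super-resolve the ROI crop first, then hand the enhanced crop to the OCR engine. Here is a minimal sketch of that shape, assuming NumPy is available. The names `upscale_nearest` and `recognize_with_sr` are hypothetical, and the nearest-neighbour upscaler is only a placeholder for a trained SR model such as TSRN, not a substitute for one.

```python
from typing import Callable

import numpy as np


def upscale_nearest(image: np.ndarray, scale: int = 2) -> np.ndarray:
    """Placeholder upscaler: repeats every pixel `scale` times along each axis.

    A real pipeline would call a trained SR model here instead.
    """
    return np.repeat(np.repeat(image, scale, axis=0), scale, axis=1)


def recognize_with_sr(
    roi: np.ndarray,
    super_resolve: Callable[[np.ndarray], np.ndarray],
    recognize: Callable[[np.ndarray], str],
) -> str:
    """Run super-resolution on the ROI crop, then pass the result to OCR."""
    enhanced = super_resolve(roi)
    return recognize(enhanced)
```

Keeping the SR step and the recognizer as plain callables makes it easy to swap in each candidate model and compare OCR output on the same crops.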
Update 1
[Method]: Apart from the above, we also tried some traditional approaches such as the Out-of-focus Deblur Filter (Wiener filter and also the unsupervised Wiener filter). We also checked the Richardson-Lucy method, but saw no improvement with this approach either.
[Method]: We've checked out a GAN-based deblurring solution, DeblurGAN. I have tried this network. What attracted me was its blind motion deblurring mechanism.
Lastly, from this discussion we came across this research work, which looks really good. We haven't tried it yet.
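For reference, here is a NumPy-only sketch of the two classical deconvolution methods mentioned above. It is not the code we ran and not the OpenCV or scikit-image implementation. It assumes a circular-convolution blur model, a disk-shaped out-of-focus PSF whose radius is known, and a constant noise-to-signal ratio `nsr`. All function names are hypothetical. Both methods need the PSF as input, which may be one reason they struggle on real photos where the blur is unknown and varies across the image.

```python
import numpy as np


def disk_psf(radius: int) -> np.ndarray:
    """Out-of-focus blur kernel: a filled disk normalized to sum to 1."""
    size = 2 * radius + 1
    y, x = np.mgrid[:size, :size] - radius
    psf = (x**2 + y**2 <= radius**2).astype(np.float64)
    return psf / psf.sum()


def psf_to_otf(psf: np.ndarray, shape: tuple) -> np.ndarray:
    """Zero-pad the PSF to `shape`, move its center to (0, 0), and take the FFT."""
    padded = np.zeros(shape, dtype=np.float64)
    kh, kw = psf.shape
    padded[:kh, :kw] = psf
    # Centering at the origin keeps the filtered image from being translated.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(padded)


def blur(image: np.ndarray, psf: np.ndarray) -> np.ndarray:
    """Circular convolution of `image` with `psf`, computed via the FFT."""
    otf = psf_to_otf(psf, image.shape)
    return np.real(np.fft.ifft2(np.fft.fft2(image) * otf))


def wiener_deconvolve(blurred: np.ndarray, psf: np.ndarray, nsr: float = 0.01) -> np.ndarray:
    """Wiener filter with a constant noise-to-signal ratio `nsr` (an assumption)."""
    otf = psf_to_otf(psf, blurred.shape)
    wiener = np.conj(otf) / (np.abs(otf) ** 2 + nsr)
    return np.real(np.fft.ifft2(np.fft.fft2(blurred) * wiener))


def richardson_lucy(blurred: np.ndarray, psf: np.ndarray,
                    iterations: int = 30, eps: float = 1e-12) -> np.ndarray:
    """Richardson-Lucy iterations for non-negative images."""
    blurred = np.clip(blurred, 0.0, None)
    otf = psf_to_otf(psf, blurred.shape)
    estimate = np.full(blurred.shape, blurred.mean(), dtype=np.float64)
    for _ in range(iterations):
        reblurred = np.real(np.fft.ifft2(np.fft.fft2(estimate) * otf))
        ratio = blurred / np.maximum(reblurred, eps)
        # Correlating with the PSF equals multiplying by conj(OTF) in frequency space.
        correction = np.real(np.fft.ifft2(np.fft.fft2(ratio) * np.conj(otf)))
        estimate = estimate * correction
    return estimate
```

On a synthetic image blurred with `blur(image, disk_psf(3))`, `wiener_deconvolve` with the same PSF recovers much of the detail; on real captures the PSF has to be guessed, and a wrong guess mostly produces ringing.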
Update 2
[Method]: Real-World Super-Resolution via Kernel Estimation and Noise Injection. Tried this method. Promising. However, it didn't work in our case. Code.
[Method]: Photo Restoration. Compared with all the methods above, it surprisingly performs the best at text super-resolution for OCR. It greatly removes noise, blurriness, etc., makes the image much clearer, and improves model generalization. Code.
My Query
Is there any effective workaround to tackle such cases? Are there any methods that could improve such blurry or low-resolution images, whether the text is close up or far away due to the camera angle?
Solution
Currently, one solution is Real-World Super-Resolution via Kernel Estimation and Noise Injection. The authors propose a degradation framework, RealSR, which provides realistic images for super-resolution learning. It is a promising method for super-resolving images affected by shakiness or motion blur.
The method is divided into two stages. The first stage, Realistic Degradation for Super-Resolution, estimates the degradation from real data and generates realistic LR images.
The second stage, Super-Resolution Model, trains the SR model on the constructed data.
You can look at the GitHub repository: https://github.com/jixiaozhong/RealSR
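To make the first stage more concrete, here is a rough NumPy sketch of the degradation idea: blur the HR image with a kernel, downsample it, then inject noise sampled from a real image. This is not the repository's code. The Gaussian kernel stands in for the kernels that RealSR estimates from real images, and the function names, the variance threshold, and the tiling of the noise patch are all assumptions made for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gaussian_kernel(size: int = 13, sigma: float = 1.5) -> np.ndarray:
    """Stand-in blur kernel (assumption); RealSR estimates kernels from real data."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax**2) / (2.0 * sigma**2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()


def collect_noise_patches(image: np.ndarray, patch: int = 16,
                          max_var: float = 0.002) -> list:
    """Keep mean-subtracted patches from flat regions as noise samples."""
    patches = []
    height, width = image.shape
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            crop = image[top:top + patch, left:left + patch]
            # Low variance suggests the patch holds mostly noise, little content.
            if crop.var() < max_var:
                patches.append(crop - crop.mean())
    return patches


def degrade(hr: np.ndarray, kernel: np.ndarray, noise_patch: np.ndarray,
            scale: int = 4) -> np.ndarray:
    """Build an LR image: convolve HR with the kernel, subsample, add noise."""
    kh, kw = kernel.shape
    padded = np.pad(hr, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="reflect")
    windows = sliding_window_view(padded, (kh, kw))
    # Flip the kernel so the windowed sum is a true convolution.
    blurred = np.einsum("ijkl,kl->ij", windows, kernel[::-1, ::-1])
    lr = blurred[::scale, ::scale]
    # Tile the noise patch to cover the LR image (a simplification).
    reps = (-(-lr.shape[0] // noise_patch.shape[0]),
            -(-lr.shape[1] // noise_patch.shape[1]))
    noise = np.tile(noise_patch, reps)[:lr.shape[0], :lr.shape[1]]
    return np.clip(lr + noise, 0.0, 1.0)
```

Pairs of `(degrade(hr, ...), hr)` built this way would then serve as training data for the second-stage SR model. Images are assumed to be single-channel floats in [0, 1].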