Problem description
I know ICR is mainly used to recognize handwritten (hand-printed) data, but can ICR also be used to extract distorted (poor-quality) machine-printed text?
If not, what is the best way to solve the following problem?
I have an unstructured document that may run to two or more pages, and it contains a few handwritten date fields. I want to convert it to a text file. I have tried some full-page OCR tools (OmniPage, ABBYY, etc.) that have ICR modules. They are good at full-page OCR, but when they encounter a handwritten date they output junk characters instead of using the ICR module there. I don't want to use form-processing tools like Parascript and A2iA, which are position based and work only with structured documents.
Or can we use ICR to convert both the machine-printed text and the handwriting? (It would work for the handwritten dates in this case anyway.)
My aim is to get a text file as output from an unstructured document that contains a small amount of handwritten text (such as dates and numbers).
Recommended answer
The assumption that the tools you tried have ICR modules is incorrect, which explains the poor result. If you tried the retail versions of OmniPage and ABBYY FineReader, those packages are OCR only, without ICR support.
You may have to use form processing in some way, but there are a few variations of the approach. It will have to be a marriage of two technologies, either out of the box or self-created, and it will take more effort than just installing and running a single tool.
Today, it is generally assumed that no ICR software for unstructured text delivers high-quality results. Full-page OCR, or unstructured-text OCR (machine text), produces high-quality results on machine text and garbage on handwriting. You are right that ICR implies zonal recognition, which makes it possible to supply data types and back-end dictionaries that improve recognition of handwriting.
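To illustrate why a data type helps in a zone, here is a minimal sketch in Python. It assumes a recognizer that returns several candidate readings with confidences for one handwritten zone; the candidate list, the confidence values and the accepted date formats are all made up for illustration and do not come from any particular ICR product.

```python
from datetime import datetime

# Hypothetical recognizer output for one handwritten zone:
# (candidate text, confidence between 0 and 1). Not taken from any real SDK.
CANDIDATES = [
    ("l2/O3/2014", 0.62),  # letters mistaken for digits
    ("12/03/2014", 0.58),
    ("12/08/2014", 0.41),
]

# Assumed formats for the date field; adjust to the documents at hand.
DATE_FORMATS = ("%d/%m/%Y", "%d-%m-%Y", "%d.%m.%Y")


def parse_date(text):
    """Return a datetime if text matches one of the assumed formats, else None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def best_valid_date(candidates):
    """Pick the highest-confidence candidate that parses as a date."""
    valid = [(text, conf) for text, conf in candidates if parse_date(text)]
    if not valid:
        return None
    return max(valid, key=lambda item: item[1])[0]


print(best_valid_date(CANDIDATES))
```

The top-scoring raw reading is rejected because it is not a valid date, so the second candidate wins. A full-page engine has no such field-level knowledge, which is one reason it does worse on handwriting.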
For the simplest and fastest approach, which may also be the most economical and least labor intensive, I would use an unstructured form-processing package such as ABBYY FlexiCapture (http://www.wisetrend.com/abbyy_flexicapture.shtml). It requires some non-programming setup to 'locate' zones. Zones may change position and the software still finds them, then uses the appropriate algorithm (OCR/ICR) to read each zone's content. It supports OCR, ICR, OMR (checkmarks) and BCR (barcodes), and it also has built-in full-page OCR. I use this software in-house, resell it, and have over 14 years of experience fine-tuning it.
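The idea of a zone that moves with the text can be sketched roughly as follows. This is not how FlexiCapture works internally, only a simplified illustration: it assumes the OCR pass yields words with pixel bounding boxes and that the handwritten field sits immediately to the right of a printed label. The `Word` type, the `find_zone_after_anchor` helper and the sample coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR word with its bounding box in pixels (hypothetical structure)."""
    text: str
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


def find_zone_after_anchor(words, anchor, zone_width, pad=4):
    """Return (x, y, w, h) of a zone to the right of the first word matching anchor.

    Returns None when the anchor label is not found on the page.
    """
    for word in words:
        if word.text.lower() == anchor.lower():
            return (word.x + word.w + pad, word.y - pad,
                    zone_width, word.h + 2 * pad)
    return None


# Made-up OCR output: the label "Date:" can appear anywhere on the page.
words = [Word("Invoice", 40, 100, 90, 20), Word("Date:", 40, 300, 60, 20)]
print(find_zone_after_anchor(words, "date:", zone_width=200))
```

Because the zone is computed relative to the label rather than from fixed page coordinates, it follows the label when the layout shifts from page to page.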
For a potentially more economical way, but one that may require manually marrying at least two technologies (two purchases instead of one, plus labor, so it may not be the most economical at the end of the day), I would use some kind of OCR SDK for machine text and some kind of ICR-capable SDK for the handwritten zones. Depending on how consistent the location of those zones is, you may be able to just supply coordinates. If the zones shift, you need to do deeper analysis of their location before passing them to ICR. The ICR-recognized text then needs to be returned and inserted into the appropriate places in the OCRed text.
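The last step, putting ICR results back into the OCR text, might look like the sketch below. It assumes both engines return tokens with pixel bounding boxes; the `Box` and `Token` types, the `merge` function, the line tolerance and the sample data are hypothetical and not tied to any specific SDK.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x: int
    y: int
    w: int
    h: int


@dataclass
class Token:
    text: str
    box: Box


def overlaps(a, b):
    """True if two axis-aligned boxes intersect."""
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)


def merge(ocr_tokens, icr_tokens, line_tolerance=10):
    """Replace OCR tokens inside handwritten zones with the ICR text."""
    # Drop whatever the OCR engine produced inside a handwritten zone.
    kept = [t for t in ocr_tokens
            if not any(overlaps(t.box, z.box) for z in icr_tokens)]
    tokens = sorted(kept + list(icr_tokens), key=lambda t: (t.box.y, t.box.x))

    # Group tokens into text lines by similar top coordinate.
    lines = []
    for tok in tokens:
        if lines and abs(tok.box.y - lines[-1][0].box.y) <= line_tolerance:
            lines[-1].append(tok)
        else:
            lines.append([tok])

    # Within each line, order tokens left to right.
    return "\n".join(
        " ".join(t.text for t in sorted(line, key=lambda t: t.box.x))
        for line in lines
    )


# Made-up sample: OCR read the handwritten date as junk.
ocr = [
    Token("Date:", Box(40, 300, 60, 20)),
    Token("#@!~", Box(110, 298, 150, 24)),
    Token("Total", Box(40, 340, 60, 20)),
    Token("due", Box(110, 340, 40, 20)),
]
icr = [Token("12/03/2014", Box(104, 296, 200, 28))]
print(merge(ocr, icr))
```

The junk token is discarded because it overlaps the handwritten zone, and the ICR reading takes its place on the same text line. Real pages with skew or multiple columns would need a more careful reading-order step than this simple top-coordinate grouping.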
In my opinion, now that a number of tools can do this out of the box, I would use one of them instead of writing it myself, because several major challenges need to be solved: zone identification, integration of the two technologies, and workflow. We did this kind of integration ourselves some years ago, when the current tools were not available.