Problem description
I'm looking to process a batch of scanned response postcards that have handwritten contact information on them (i.e. name, address, phone, email, etc.).
I'm curious whether there is a viable open-source library or piece of software to do this (ideally in Java or R). From what I've found, a lot of the information dates from 2009 or earlier and isn't very encouraging.
The language is English.
Any suggestions?
Edit: I've looked at the OCRopus page, but the latest version is from May 2009. Does anyone have experience with it, or is there a more recent version?
Recommended answer
To begin with, as far as I know there are no native open-source Java OCR SDKs. There are Java APIs that wrap calls to native interfaces, such as tesjeract (http://code.google.com/p/tesjeract/) or Tess4J (http://tess4j.sf.net/).
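For a quick feasibility check before committing to one of those wrappers, it may be enough to call the native tesseract executable from Java as a child process. The sketch below is not the tesjeract or Tess4J API; it swaps in plain ProcessBuilder and assumes a tesseract binary is installed and on the PATH, and that it accepts "stdout" as the output base (as recent versions do). The class name TesseractCli and its two methods are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: shells out to the native tesseract binary.
// Assumes "tesseract" is on the PATH; this is not a wrapper library API.
class TesseractCli {

    // Builds the argument list for: tesseract <image> stdout -l <lang>
    static List<String> buildCommand(Path image, String lang) {
        return List.of("tesseract", image.toString(), "stdout", "-l", lang);
    }

    // Runs tesseract on one image and returns whatever text it printed.
    static String recognize(Path image, String lang)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(image, lang))
                // Discard diagnostics so an unread stderr pipe cannot block the child.
                .redirectError(ProcessBuilder.Redirect.DISCARD)
                .start();
        // Read all of stdout before waiting, so the child never stalls on a full pipe.
        String text = new String(process.getInputStream().readAllBytes(),
                StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        return text;
    }
}
```

Calling TesseractCli.recognize(Path.of("card-001.png"), "eng") on a few sample scans (the file name is a placeholder) should show fairly quickly whether the engine copes with the writing on the cards at all.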
Next, you need to specify whether you are dealing with handwritten or handprinted text. If you need handwriting recognition, I don't believe you'll be able to solve your task, for the reasons stated in the other answers.
However, if you need ICR (intelligent character recognition) for handprinted text (the fairly clear letters used in surveys, forms, etc.), there could be a solution. While I believe that Tesseract (despite being considered the best of the open-source engines) won't do the job for you here, you can look for more accurate SDKs.
Maybe this question will help: Handwritten scanned Doc to .txt File?