Problem description
I'm trying to prepare images for OCR, and so far here is what I've done using info from Extracting text OpenCV
From the resulting image I use the contours that have been filtered to make a mask as follows:
//this is the mask of all the text
Mat maskF = Mat::zeros(rgb.rows, rgb.cols, CV_8UC1);
// CV_FILLED fills the connected components found
drawContours(maskF, letters, -1, Scalar(255), CV_FILLED);
cv::imwrite("noise2-Mask.png", maskF);
The resulting image is promising:
considering this was my original image:
Unfortunately, running Tesseract on it yields some issues; I think the levels of gray you see between letters on words confuse Tesseract. So, you're thinking, yeah, let's do a binary transform. Well, that just misses the second half of the page, so I tried applying an Otsu threshold as well, but the text becomes too pixelated and the characters lose their shape.
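For concreteness, Otsu's method picks the threshold that maximizes the between-class variance of the histogram. A minimal self-contained sketch (plain C++; `otsu_threshold` is a hypothetical helper, not OpenCV's actual implementation) looks like this:

```cpp
#include <vector>

// Otsu's method: for each candidate threshold t, split the 256-bin
// grayscale histogram into "below" and "above" classes and keep the t
// that maximizes the between-class variance wB*wF*(mB-mF)^2.
int otsu_threshold(const std::vector<int>& hist) // hist.size() == 256
    {
    long long total=0, sum=0;
    for (int i=0;i<256;i++){ total+=hist[i]; sum+=(long long)i*hist[i]; }
    long long sumB=0, wB=0;
    double maxVar=-1.0; int best=0;
    for (int t=0;t<256;t++)
        {
        wB+=hist[t]; if (wB==0) continue;    // class "below" still empty
        long long wF=total-wB; if (wF==0) break; // class "above" empty
        sumB+=(long long)t*hist[t];
        double mB=(double)sumB/wB;           // mean of class "below"
        double mF=(double)(sum-sumB)/wF;     // mean of class "above"
        double var=(double)wB*wF*(mB-mF)*(mB-mF);
        if (var>maxVar){ maxVar=var; best=t; }
        }
    return best;
    }
```

On a page with uneven illumination, a single global threshold computed this way necessarily sacrifices one side of the page, which matches the behavior described above.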
I tried CalcBlockMeanVariance from OpenCV Adaptive Threshold OCR but could not get it to compile (and I'm not certain I understand it all, tbh); the compile chokes on
res=1.0-res;
res=Img+res;
Anyhow, if anyone has any suggestions I'd appreciate it! Note that the fractions are rarely recognized by Tesseract, but I'm writing a new training set that will hopefully improve the recognition rate.
Enhancing dynamic range and normalizing illumination
The point is to normalize the background to a seamless color first. There are many methods to do this. Here is what I have tried for your image:
Create a paper/ink cell table for the image (in the same manner as in the linked answer). Select a grid cell size big enough to distinguish character features from the background; for your image I chose 8x8 pixels. Divide the image into squares and compute the average color and the absolute color difference for each of them. Then mark the saturated ones (small absolute difference) and set them as paper or ink cells according to how their average color compares to the whole-image average color.
Now just process all lines of the image, and for each pixel obtain the paper cells to its left and right and linearly interpolate between their values. That should give you the actual background color of that pixel, so just subtract it from the image.
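The interpolation-and-subtract step can be sketched in isolation like this (a hypothetical 1-D helper for a single row, assuming the average colors of the left and right paper cells are already known; it is not part of the picture class implementation):

```cpp
#include <vector>

// Subtract a linearly interpolated background from one grayscale row.
// x0/paper0 and x1/paper1 are the position and average color of the
// paper cells left and right of the row span (x1 > x0 assumed).
// The result is re-biased so that clean paper maps to 255 (white).
std::vector<int> remove_background_row(const std::vector<int>& row,
                                       int x0,int paper0,
                                       int x1,int paper1)
    {
    std::vector<int> out(row.size());
    for (int x=0;x<(int)row.size();x++)
        {
        // interpolated background color at x
        int bg = paper0 + ((paper1-paper0)*(x-x0))/(x1-x0);
        int v  = 255 + row[x] - bg;        // subtract background, re-bias to white
        if (v<0) v=0; if (v>255) v=255;    // clamp to valid intensity range
        out[x]=v;
        }
    return out;
    }
```

Background pixels end up at 255 regardless of the illumination gradient, while ink pixels stay dark relative to it.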
My C++ implementation for this looks like this:
color picture::normalize(int sz,bool _recolor,bool _sbstract)
    {
    struct _cell
        {
        color col;
        int a[4],da,_paper;
        _cell(){}; _cell(_cell& x){ *this=x; }; ~_cell(){};
        _cell* operator = (const _cell *x) { *this=*x; return this; };
        /*_cell* operator = (const _cell &x) { ...copy... return this; };*/
        };
    int i,x,y,tx,ty,txs,tys,a0[4],a1[4],n,dmax;
    int x0,x1,y0,y1,q[4][4][2],qx[4],qy[4];
    color c;
    _cell **tab;
    // allocate grid table
    txs=xs/sz; tys=ys/sz; n=sz*sz; c.dd=0;
    if ((txs<2)||(tys<2)) return c;
    tab=new _cell*[tys];
    for (ty=0;ty<tys;ty++) tab[ty]=new _cell[txs];
    // compute grid table
    for (y0=0,y1=sz,ty=0;ty<tys;ty++,y0=y1,y1+=sz)
     for (x0=0,x1=sz,tx=0;tx<txs;tx++,x0=x1,x1+=sz)
        {
        for (i=0;i<4;i++) a0[i]=0;
        for (y=y0;y<y1;y++)
         for (x=x0;x<x1;x++)
            {
            dec_color(a1,p[y][x],pf);
            for (i=0;i<4;i++) a0[i]+=a1[i];
            }
        for (i=0;i<4;i++) tab[ty][tx].a[i]=a0[i]/n;
        enc_color(tab[ty][tx].a,tab[ty][tx].col,pf);
        tab[ty][tx].da=0;
        for (i=0;i<4;i++) a0[i]=tab[ty][tx].a[i];
        for (y=y0;y<y1;y++)
         for (x=x0;x<x1;x++)
            {
            dec_color(a1,p[y][x],pf);
            for (i=0;i<4;i++) tab[ty][tx].da+=abs(a1[i]-a0[i]);
            }
        tab[ty][tx].da/=n;
        }
    // compute max safe delta dmax = avg(delta)
    for (dmax=0,ty=0;ty<tys;ty++)
     for (tx=0;tx<txs;tx++)
      dmax+=tab[ty][tx].da;
    dmax/=(txs*tys);
    // select paper cells and compute avg paper color
    for (i=0;i<4;i++) a0[i]=0; x0=0;
    for (ty=0;ty<tys;ty++)
     for (tx=0;tx<txs;tx++)
      if (tab[ty][tx].da<=dmax)
        {
        tab[ty][tx]._paper=1;
        for (i=0;i<4;i++) a0[i]+=tab[ty][tx].a[i];
        x0++;
        }
      else tab[ty][tx]._paper=0;
    if (x0) for (i=0;i<4;i++) a0[i]/=x0;
    enc_color(a0,c,pf);
    // remove saturated ink cells from paper (small .da but wrong .a[])
    for (ty=1;ty<tys-1;ty++)
     for (tx=1;tx<txs-1;tx++)
      if (tab[ty][tx]._paper==1)
       if ((tab[ty][tx-1]._paper==0)
         ||(tab[ty][tx+1]._paper==0)
         ||(tab[ty-1][tx]._paper==0)
         ||(tab[ty+1][tx]._paper==0))
        {
        x=0; for (i=0;i<4;i++) x+=abs(tab[ty][tx].a[i]-a0[i]);
        if (x>dmax) tab[ty][tx]._paper=2;
        }
    for (ty=0;ty<tys;ty++)
     for (tx=0;tx<txs;tx++)
      if (tab[ty][tx]._paper==2)
       tab[ty][tx]._paper=0;
    // piecewise linear interpolation H-lines
    int ty0,ty1,tx0,tx1,d;
    if (_sbstract) for (i=0;i<4;i++) a0[i]=0;
    for (y=0;y<ys;y++)
        {
        ty=y/sz; if (ty>=tys) ty=tys-1;
        // first paper cell
        for (tx=0;(tx<txs)&&(!tab[ty][tx]._paper);tx++); tx1=tx;
        if (tx>=txs) continue; // no paper cell found
        for (;tx<txs;)
            {
            // next paper cell
            for (tx++;(tx<txs)&&(!tab[ty][tx]._paper);tx++);
            if (tx<txs) { tx0=tx1; x0=tx0*sz; tx1=tx; x1=tx1*sz; d=x1-x0; }
            else x1=xs;
            // interpolate
            for (x=x0;x<x1;x++)
                {
                dec_color(a1,p[y][x],pf);
                for (i=0;i<4;i++) a1[i]-=tab[ty][tx0].a[i]+(((tab[ty][tx1].a[i]-tab[ty][tx0].a[i])*(x-x0))/d)-a0[i];
                if (pf==_pf_s   ) for (i=0;i<1;i++) clamp_s32(a1[i]);
                if (pf==_pf_u   ) for (i=0;i<1;i++) clamp_u32(a1[i]);
                if (pf==_pf_ss  ) for (i=0;i<2;i++) clamp_s16(a1[i]);
                if (pf==_pf_uu  ) for (i=0;i<2;i++) clamp_u16(a1[i]);
                if (pf==_pf_rgba) for (i=0;i<4;i++) clamp_u8 (a1[i]);
                enc_color(a1,p[y][x],pf);
                }
            }
        }
    // recolor paper cells with avg color (remove noise)
    if (_recolor)
     for (y0=0,y1=sz,ty=0;ty<tys;ty++,y0=y1,y1+=sz)
      for (x0=0,x1=sz,tx=0;tx<txs;tx++,x0=x1,x1+=sz)
       if (tab[ty][tx]._paper)
        for (y=y0;y<y1;y++)
         for (x=x0;x<x1;x++)
          p[y][x]=c;
    // free grid table
    for (ty=0;ty<tys;ty++) delete[] tab[ty];
    delete[] tab;
    return c;
    }
See the linked answer for more details. Here is the result for your input image after switching to gray-scale
<0,765>
and using
pic1.normalize(8,false,true);
Binarize
I tried naive simple range thresholding first: if all color channel values (R,G,B) are in the range
<min,max>
the pixel is recolored to c1, else to c0:

void picture::treshold_AND(int min,int max,int c0,int c1) // all channels tresholding: c1 <min,max>, c0 (-inf,min)+(max,+inf)
    {
    int x,y,i,a[4],e;
    for (y=0;y<ys;y++)
     for (x=0;x<xs;x++)
        {
        dec_color(a,p[y][x],pf);
        for (e=1,i=0;i<3;i++)
         if ((a[i]<min)||(a[i]>max)) { e=0; break; }
        if (e) for (i=0;i<4;i++) a[i]=c1;
        else   for (i=0;i<4;i++) a[i]=c0;
        enc_color(a,p[y][x],pf);
        }
    }
after applying
pic1.treshold_AND(0,127,765,0);
and converting back to RGBA I got this result. The gray noise is due to JPEG compression (PNG would be too big). As you can see, the result is more or less acceptable.
In case this is not enough, you can divide your image into segments. Compute a histogram for each segment (it should be bimodal), then find the color between the two maxima; that is your threshold value. The problem is that the background covers much more area, so the ink peak is relatively small and sometimes hard to spot on a linear scale; see the full-image histogram:
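The peak-and-valley search described above can be sketched as follows (a hypothetical helper; the 32-bin minimum peak separation is an arbitrary assumption, not something prescribed by the method):

```cpp
#include <vector>

// Find a threshold between the two modes of a 256-bin histogram:
// locate the two highest peaks, then return the lowest bin between them.
int bimodal_threshold(const std::vector<int>& hist) // hist.size() == 256
    {
    // global maximum = background (paper) peak, since paper dominates the page
    int p0=0;
    for (int i=1;i<256;i++) if (hist[i]>hist[p0]) p0=i;
    // second peak = ink; require some separation (32 bins is an arbitrary guess)
    int p1=-1;
    for (int i=0;i<256;i++)
        {
        int d=i-p0; if (d<0) d=-d;
        if ((d>=32)&&((p1<0)||(hist[i]>hist[p1]))) p1=i;
        }
    if (p1<0) return p0;                  // histogram not bimodal
    if (p1<p0) { int t=p0; p0=p1; p1=t; } // order the peaks
    // the lowest bin between the two peaks is the valley = threshold
    int v=p0;
    for (int i=p0;i<=p1;i++) if (hist[i]<hist[v]) v=i;
    return v;
    }
```

Run per segment, the ink peak is proportionally larger, so this valley is much easier to find than on the full-image histogram.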
When you do this for each segment it will be much better (as there will be much less background/text color bleeding around the threshold), so the gap will be more visible. Also, do not forget to ignore the small gaps (missing vertical lines in the histogram), as they are just artifacts of quantization/encoding/rounding (not all gray shades are present in the image); you should filter out gaps smaller than a few intensities, replacing them with the average of the last and next valid histogram entries.
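The gap-filtering rule at the end can be sketched like this (a hypothetical helper; max_gap is the "few intensities" tolerance mentioned above):

```cpp
#include <vector>

// Fill zero runs in a histogram that are shorter than max_gap bins with
// the average of the nearest non-zero bins on either side, so that only
// the real paper/ink valley survives as a gap.
std::vector<int> fill_small_gaps(std::vector<int> hist,int max_gap)
    {
    int n=(int)hist.size();
    for (int i=1;i<n;i++)
        if (hist[i]==0)
            {
            int j=i; while ((j<n)&&(hist[j]==0)) j++;   // end of the zero run
            if ((j<n)&&(j-i<=max_gap))
                {
                int avg=(hist[i-1]+hist[j])/2;          // avg of last/next valid entries
                for (int k=i;k<j;k++) hist[k]=avg;
                }
            i=j;                                        // skip past this run
            }
    return hist;
    }
```

After this pass, any remaining zero runs are wide enough to be genuine gaps between the background and ink modes.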
This concludes this article on OpenCV for OCR: how to compute the threshold level for gray-scale image OCR. I hope the answers above are helpful.