


I'm curious about how I may be able to more reliably recognise the value and the suit of playing card images. Here are two examples:

There may be some noise in the images, but I have a large dataset of images that I could use for training (roughly 10k pngs, including all values & suits).


I can reliably recognise images that I've manually classified when there is a known exact match, using a hashing method. But because I hash images based on their content, the slightest noise changes the hash and the image is treated as unknown. This is what I'm looking to address reliably with further automation.
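To illustrate why exact-match hashing breaks down, here is a minimal sketch. It assumes SHA-256 over raw pixel bytes, which may not be the hashing method actually in use; the tiny 4x4 "image" and the `known` lookup table are hypothetical stand-ins.

```python
import hashlib


def content_hash(pixels: bytes) -> str:
    """Return the SHA-256 hex digest of raw pixel bytes (assumed hash method)."""
    return hashlib.sha256(pixels).hexdigest()


# Hypothetical 4x4 greyscale image, one byte per pixel.
clean = bytes([200] * 16)

# The same image with a single pixel nudged by one intensity level.
noisy_pixels = bytearray(clean)
noisy_pixels[5] = 201
noisy = bytes(noisy_pixels)

# Lookup table of manually classified images, keyed by content hash.
known = {content_hash(clean): "4c"}

print(known.get(content_hash(clean), "unknown"))  # prints "4c"
print(known.get(content_hash(noisy), "unknown"))  # prints "unknown"
```

A one-level change in one pixel is enough to produce a completely different digest, so the lookup misses even though the two images are visually identical.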

I've been reviewing the 3.05 documentation on training tesseract: https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract#automated-method


Can tesseract only be trained with images found in fonts? Or could I use it to recognise the suits for these cards?


I was hoping that I could say that all images in this folder correspond to 4c (e.g. the example images above), and that tesseract would see the similarity in any future instances of that image (regardless of noise) and also read that as 4c. Is this possible? Does anyone here have experience with this?



This has been my non-tesseract solution, until someone proves there's a better way. I've set up:


Getting these running was the hardest part. Next, I used my dataset to train a new Caffe network. I arranged my dataset in a single-depth folder structure:



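Within DIGITS, I chose:

  1. The "Datasets" tab
  2. New Dataset, Images
  3. Classification
  4. I pointed it to my card folder, e.g.: /path/to/card
  5. I set the validation percentage to 13.0%, based on the discussion here: https://stackoverflow.com/a/13612921/880837
  6. After creating the dataset, I opened the "Models" tab
  7. Chose my new dataset.
  8. Chose GoogLeNet under "Standard Networks", and left it to train.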
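DIGITS performs the train/validation split itself, so nothing below is needed to follow the steps above. Purely to show what a 13% hold-out means, here is a rough sketch; `split_dataset`, the file names, and the per-label shuffling are all hypothetical and are not DIGITS' actual implementation.

```python
import random


def split_dataset(files_by_label, validation_pct=13.0, seed=0):
    """Split each label's files into train and validation lists.

    files_by_label maps a label (e.g. "4c") to a list of file names.
    Roughly validation_pct percent of each label's files are held out.
    """
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, validation = {}, {}
    for label, files in files_by_label.items():
        shuffled = list(files)
        rng.shuffle(shuffled)
        n_val = round(len(shuffled) * validation_pct / 100.0)
        validation[label] = shuffled[:n_val]
        train[label] = shuffled[n_val:]
    return train, validation


# Hypothetical dataset: 100 images for each of two labels.
files = {
    "4c": [f"4c_{i}.png" for i in range(100)],
    "noise": [f"noise_{i}.png" for i in range(100)],
}
train, validation = split_dataset(files)
print(len(train["4c"]), len(validation["4c"]))  # prints "87 13"
```

Splitting per label rather than across the whole dataset keeps every card represented in the validation set, which matters when some labels have far fewer images than others.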

I repeated this several times, whenever I had new images in the dataset. Each training session took 6-10 hours, but at this stage I can use my caffemodel to programmatically estimate what each image is expected to be, using this logic: https://github.com/BVLC/caffe/blob/master/examples/cpp_classification/classification.cpp


The results are either a card (2c, 7h, etc.), noise, or table. Any estimate with a confidence above 90% is most likely correct. The latest run correctly recognised 300 out of 400 images, with only 3 mistakes. I'm adding new images to the dataset and retraining the existing model to further improve the accuracy. Hope this is valuable to others!
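The 90% cut-off can be applied with a few lines of post-processing. This is only a sketch: it assumes the classifier's output has already been turned into a mapping from label to probability, and `accept_prediction` and the sample scores are hypothetical.

```python
def accept_prediction(probabilities, threshold=0.90):
    """Return the top label if its probability exceeds the threshold, else None.

    probabilities maps each label (a card such as "4c", or "noise" or "table")
    to the probability the classifier assigned to it.
    """
    label, score = max(probabilities.items(), key=lambda item: item[1])
    return label if score > threshold else None


# Hypothetical classifier outputs for two images.
confident = {"4c": 0.97, "4s": 0.02, "noise": 0.01}
uncertain = {"7h": 0.55, "7d": 0.40, "table": 0.05}

print(accept_prediction(confident))  # prints "4c"
print(accept_prediction(uncertain))  # prints "None"
```

Images that come back as `None` can be set aside for manual classification and then added to the dataset for the next training run.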

While I only wanted to give the high-level steps here, this was all done with great thanks to David Humphrey and his GitHub post. I really recommend reading it and trying it out if you're interested in learning more: https://github.com/humphd/have-fun-with-machine-learning

