Tesseract, OCR
Sep 29 2015, 11:21 PM
#1
Senior Member
4,547 posts Joined: Dec 2004 From: Metro Prima, Kuala Lumpur, Malaysia, Earth, Sol
I attempted that a while ago but failed. Many OCR engines try to recognize dictionary words, so if you feed them number plates, names, etc., it will be a big failure. Changing the font doesn't really matter much.
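For anyone hitting the same dictionary bias, here is a rough sketch of one thing to try with Tesseract: switching off its word-list lookups. `load_system_dawg` and `load_freq_dawg` are Tesseract config variables; the helper name and the file name `plate.png` are made up for illustration. On some older builds these variables may only take effect from a config file rather than from `-c`, so treat this as a starting point, not a guaranteed fix.

```python
def tesseract_no_dict_cmd(image_path, output_base="stdout"):
    """Build a tesseract command line with the word-list (dawg) lookups off.

    Hypothetical helper: it only assembles the argument list and does not
    run anything.
    """
    return [
        "tesseract", image_path, output_base,
        "-c", "load_system_dawg=0",  # main dictionary word list
        "-c", "load_freq_dawg=0",    # frequent-words list
    ]


# "plate.png" is a placeholder file name.
cmd = tesseract_no_dict_cmd("plate.png")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run` on a machine where the `tesseract` binary is installed.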
Sep 30 2015, 03:55 PM
#2
QUOTE(zeb kew @ Sep 30 2015, 10:21 AM)
I think there is a way to force Tesseract to limit it to only some characters. And IIRC, Tesseract does not use dictionary lookup/spellcheck, which is why its recognition rate for normal text is pretty poor. Can't remember the command line option now.

We used Tesseract a couple of years back in a project to scan receipts. The OCR was only to pick out the receipt number so that the image file could be stored with the appropriate filename/receipt number. With additional code to pick out the receipt number and clean up the area around it, the error rate was less than 10 per 10,000 receipts.

A lot depends on your image quality. And there is an optimal size for the characters (in number of pixels). It's in the documentation. Too small or too large, and the recognition rate drops.

From wiki: https://en.wikipedia.org/wiki/Tesseract_(software)

QUOTE
The initial versions of Tesseract could only recognize English language text. Starting with version 2 Tesseract was able to process English, French, Italian, German, Spanish, Brazilian Portuguese and Dutch. Starting with version 3 it can recognize Arabic, Bulgarian, Catalan, Chinese (Simplified and Traditional), Croatian, Czech, Danish, Dutch, English, German (standard and Fraktur script), Greek, Finnish, French, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak (standard and Fraktur script), Slovenian, Spanish, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian and Vietnamese. Tesseract can be trained to work in other languages too.
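The character-limiting option mentioned in the quote is most likely the `tessedit_char_whitelist` config variable, passed with `-c`. Below is a minimal sketch of the receipt-number idea under that assumption. The function names, the file name `receipt.png`, and the "six or more digits" pattern are all made up for illustration and are not the quoted project's actual code. Whitelist support also varies between Tesseract versions and engine modes, so check it against your build.

```python
import re
import subprocess


def digits_only_cmd(image_path, whitelist="0123456789"):
    """Build a tesseract command that restricts output to `whitelist`."""
    return [
        "tesseract", image_path, "stdout",
        "-c", f"tessedit_char_whitelist={whitelist}",
    ]


def run_ocr(cmd):
    """Run the command and return the recognized text.

    Needs the tesseract binary on PATH; it is not called in this sketch.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def extract_receipt_number(ocr_text, pattern=r"\b\d{6,}\b"):
    """Return the first run of 6+ digits, or None if there is none.

    The pattern is an assumption; adjust it to the real receipt format.
    """
    match = re.search(pattern, ocr_text)
    return match.group(0) if match else None


# "receipt.png" is a placeholder file name.
cmd = digits_only_cmd("receipt.png")

# Stand-in for real OCR output, so the sketch runs without tesseract.
sample_text = "Receipt No: 00123456\nTotal 12.50"
number = extract_receipt_number(sample_text)
print(number)
```

With real images you would call `run_ocr(cmd)` and feed its output to `extract_receipt_number`, then use the result as the filename.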