Tesseract

Tesseract, OCR

narf03	Sep 29 2015, 11:21 PM | Post #1
I attempted that a while ago but failed. Many OCR engines try to recognize dictionary words, so if you feed them number plates, names, etc., the results will be a big failure. Changing the font doesn't really make much difference.
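
For anyone hitting the same dictionary problem with Tesseract specifically: its documentation describes two configuration variables, `load_system_dawg` and `load_freq_dawg`, that switch off its word lists. Below is a minimal sketch that only builds the command line, it does not run it. The function name and the file names are made up for illustration, and how much this helps will depend on the Tesseract version and engine in use.

```python
def args_without_dictionaries(image_path, output_base):
    """Build a Tesseract command line with both word lists switched off.

    Hypothetical helper: it only assembles the argument list, so it works
    even on a machine where Tesseract is not installed.
    """
    return [
        "tesseract", image_path, output_base,
        "-c", "load_system_dawg=0",  # main dictionary word list
        "-c", "load_freq_dawg=0",    # list of frequent words
    ]


# Example: OCR output would be written to plate.txt if this were executed.
print(" ".join(args_without_dictionaries("plate.png", "plate")))
```

To actually run it, the list can be handed to `subprocess.run(...)` on a system with Tesseract on the PATH.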

narf03	Sep 30 2015, 03:55 PM | Post #2
QUOTE(zeb kew @ Sep 30 2015, 10:21 AM)
I think there is a way to force Tesseract to limit recognition to only some characters. And IIRC, Tesseract does not use dictionary lookup/spellcheck, which is why its recognition rate for normal text is pretty poor. Can't remember the command line option now.

We used Tesseract a couple of years back in a project to scan receipts. The OCR was only used to pick out the receipt number, so that the image file could be stored with the appropriate filename/receipt number. With additional code to pick out the receipt number and clean up the area around it, the error rate was less than 10 per 10,000 receipts.

A lot depends on your image quality. There is also an optimal size for the characters (in number of pixels); it's in the documentation. Too small or too large, and the recognition rate drops.

From wiki https://en.wikipedia.org/wiki/Tesseract_(software)
QUOTE
The initial versions of Tesseract could only recognize English language text. Starting with version 2 Tesseract was able to process English, French, Italian, German, Spanish, Brazilian Portuguese and Dutch. Starting with version 3 it can recognize Arabic, Bulgarian, Catalan, Chinese (Simplified and Traditional), Croatian, Czech, Danish, Dutch, English, German (standard and Fraktur script), Greek, Finnish, French, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak (standard and Fraktur script), Slovenian, Spanish, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian and Vietnamese. Tesseract can be trained to work in other languages too.
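
The character-limiting option the quoted post could not recall is most likely the `tessedit_char_whitelist` configuration variable, passed with `-c`. Here is a rough sketch of that idea together with the "pick out the receipt number" step. This is not the code from that project: the helper names, the file names, and the assumed receipt format (a run of 6 to 10 digits) are all invented for illustration, and whitelist support differs between Tesseract versions.

```python
import re


def digits_only_args(image_path, output_base, allowed="0123456789"):
    """Build a Tesseract command line restricted to the given characters.

    Hypothetical helper; it only assembles the arguments and runs nothing.
    """
    return [
        "tesseract", image_path, output_base,
        "-c", f"tessedit_char_whitelist={allowed}",
    ]


# Assumed receipt format: a standalone run of 6 to 10 digits.
RECEIPT_RE = re.compile(r"\b\d{6,10}\b")


def extract_receipt_number(ocr_text):
    """Return the first receipt-like number in the OCR text, or None."""
    match = RECEIPT_RE.search(ocr_text)
    return match.group(0) if match else None


sample = "Receipt No: 00123456\nTotal 12.50"
print(extract_receipt_number(sample))
```

The extracted number could then be used to name the stored image file, as described in the quoted post. Rejecting text that yields no match gives a cheap way to flag scans for manual review.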
