Optical Character Recognition using Python and Google Tesseract OCR

In this article, we will install Tesseract OCR on our system, verify the installation, and try Tesseract on some sample images.

TL;DR

Time needed: 45 minutes.

To run OCR on an image with Tesseract and Python, you will need to perform the following steps.

  1. Install Tesseract OCR on your computer

    macOS users, run brew install tesseract.
    Linux users, run sudo apt-get install tesseract-ocr.
    Windows users, consult the Tesseract documentation to install the binary. For detailed steps, continue reading the blog.

  2. Verify the Installation of Tesseract on your machine

    Run tesseract -v to verify the installation. If the command prints the version properly, then we are good to go!

  3. Create a new file named ocr_main.py

    Create a new file called ocr_main.py and copy the final code from the Detailed Steps section below.

  4. Run the Python script

    Run the script using python ocr_main.py

Detailed Steps

Step One – Installing Tesseract OCR

For macOS users, we’ll be using Homebrew to install Tesseract:

brew install tesseract

If you’re using Ubuntu, simply use apt-get to install Tesseract OCR:

sudo apt-get install tesseract-ocr

For Windows, please consult the Tesseract documentation.

Step Two – Verifying the Installation of Tesseract OCR

To validate that Tesseract has been successfully installed on your machine, execute the following command:

tesseract -v

You should see the Tesseract version printed on your screen, along with a list of image file format libraries Tesseract is compatible with. For example,

tesseract 3.05.01
leptonica-1.74.1
libgif 4.1.6(?) : libjpeg 8d (libjpeg-turbo 1.5.0) : libpng 1.6.20 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.3 : libopenjp2 2.1.0

If the Tesseract version is not displayed, you may instead see a blank window open and close on its own.

If you get errors instead, re-install Tesseract, make sure your PATH variable includes the Tesseract install directory, and try opening the console or IDE you are using with administrative privileges.
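The same check can also be done from Python, which is handy because the script later in this article depends on the tesseract binary being reachable. Here is a minimal sketch using only the standard library. The helper name tesseract_version is made up for this example, and it assumes the binary is called tesseract on your PATH.

```python
import shutil
import subprocess


def tesseract_version():
    """Return the first line of `tesseract -v`, or None if the binary is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    result = subprocess.run([exe, "-v"], capture_output=True, text=True)
    # Depending on the release, the version banner may go to stdout or stderr,
    # so look at whichever one is non-empty.
    banner = (result.stdout or result.stderr).strip()
    return banner.splitlines()[0] if banner else None


if __name__ == "__main__":
    version = tesseract_version()
    if version is None:
        print("tesseract was not found on PATH; check the installation and your PATH variable")
    else:
        print("Found:", version)
```

If this prints the "not found" message while the installer reported success, the PATH variable is the first thing to look at.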

Step Three – Testing out Tesseract OCR

In order to obtain reasonable results, you need to supply images that are cleanly pre-processed and crisp.

Recommendations:

  • Use images with the highest resolution and DPI possible.
  • Make sure that the text is clearly visible, with no pixelation or deformation.
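If you know the physical size of the scanned page, you can estimate how far a low-resolution image is from a usable DPI. The sketch below is only an illustration: the helper name upscale_factor is hypothetical, and the 300 DPI target is an assumption based on a commonly quoted rule of thumb for OCR, not a hard requirement.

```python
def upscale_factor(width_px, width_inches, target_dpi=300):
    """Return the factor by which to enlarge an image to reach target_dpi.

    width_px     -- image width in pixels
    width_inches -- physical width of the scanned area in inches
    target_dpi   -- assumed target resolution (300 is a rule of thumb)
    """
    if width_px <= 0 or width_inches <= 0:
        raise ValueError("width_px and width_inches must be positive")
    current_dpi = width_px / width_inches
    # Never suggest shrinking: an image already above the target keeps its size.
    return max(1.0, target_dpi / current_dpi)


if __name__ == "__main__":
    # An 8.5 inch wide page scanned at 850 pixels wide is 100 DPI.
    print(upscale_factor(850, 8.5))
```

The resulting factor could then be passed to something like cv2.resize(image, None, fx=factor, fy=factor) before running OCR, although upscaling cannot restore detail that was never captured.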

The GitHub repository for this tutorial will be available here.

Let’s start coding now:

Create a file named ocr_main.py (I chose this name; you can name it whatever you want).

1. Import necessary libraries

import cv2
import pytesseract
from PIL import Image

2. Get the path of the image file we are working on. I’m going to store the path to the file in a variable called path

# Ask the user for the image file path
path = input("Enter the file path : ").strip()

3. Load the image data and store it in the variable image

# load the image
image = cv2.imread(path)

4. Convert the image to grayscale for better recognition of text and store the data in gray

# Converting to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

5. Optionally pre-process the image. Depending on the user's choice, apply Otsu thresholding, apply a median blur, or leave the grayscale image as it is.

temp = input("Do you want to pre-process the image ?\nThreshold : 1\nGrey : 2\nNone : 0\nEnter your choice : ").strip()
# If the user enters 1, apply Otsu thresholding; if the user enters 2, apply a median blur. Otherwise, do nothing.
if temp == "1":
    gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
elif temp == "2":
    gray = cv2.medianBlur(gray, 3)

6. Save the pre-processed temporary file as temp.png

filename = "{}.png".format("temp")
cv2.imwrite(filename, gray)

7. Apply OCR and print the output string.

text = pytesseract.image_to_string(Image.open(filename))
print(text)
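To get a feel for what the cv2.THRESH_OTSU flag in step 5 is doing, here is a rough pure-Python sketch of Otsu's method: it picks the threshold that best separates dark and light pixels by maximizing the variance between the two groups. This is an illustration only, not OpenCV's implementation, so tie-breaking and speed will differ. The function names otsu_threshold and binarize are made up for this example, and the input is assumed to be a flat list of integers from 0 to 255.

```python
def otsu_threshold(pixels):
    """Return the threshold t that maximizes the between-class variance
    of the two groups `p <= t` and `p > t`."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_bg = 0  # number of pixels at or below the candidate threshold
    sum_bg = 0     # sum of their intensities
    for t in range(256):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        if weight_bg == 0:
            continue  # no pixels this dark yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break     # every pixel is already in the dark group
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, t):
    """Map pixels above t to 255 and the rest to 0, like THRESH_BINARY."""
    return [255 if p > t else 0 for p in pixels]


if __name__ == "__main__":
    # Dark text pixels (10) on a light background (200).
    sample = [10] * 50 + [200] * 50
    t = otsu_threshold(sample)
    print(t, binarize(sample, t)[:3], binarize(sample, t)[-3:])
```

Because the threshold is computed from the image itself, this kind of pre-processing tends to suit images with two clearly separated brightness levels, such as dark text on a plain light background.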

And the final code will be:

import cv2
import pytesseract
from PIL import Image


def main():
    # Ask the user for the image file path
    path = input("Enter the file path : ").strip()

    # Load the image; cv2.imread returns None if the file cannot be read
    image = cv2.imread(path)
    if image is None:
        raise FileNotFoundError("Could not read image: " + path)

    # Convert the image to grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    temp = input("Do you want to pre-process the image ?\nThreshold : 1\nGrey : 2\nNone : 0\nEnter your choice : ").strip()

    # If the user enters 1, apply Otsu thresholding; if 2, apply a median blur
    if temp == "1":
        gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
    elif temp == "2":
        gray = cv2.medianBlur(gray, 3)

    # Store the grayscale image as a temporary file to apply OCR
    filename = "{}.png".format("temp")
    cv2.imwrite(filename, gray)

    # Load the temporary file as a PIL/Pillow image and apply OCR
    text = pytesseract.image_to_string(Image.open(filename))
    print(text)


try:
    main()
except Exception as e:
    print(e.args)
    print(e.__cause__)
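Note that the script writes temp.png into the working directory and leaves it there after the run. If you would rather clean up, one option is a small context manager built on the standard tempfile module. This is a sketch under my own assumptions, not part of the original script, and the helper name temporary_png is hypothetical.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_png():
    """Yield a unique .png path and remove the file when the block exits."""
    fd, path = tempfile.mkstemp(suffix=".png")
    os.close(fd)  # only the path is needed; the caller writes the file itself
    try:
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)


# Hypothetical usage inside main(), replacing the fixed "temp.png":
#
#     with temporary_png() as filename:
#         cv2.imwrite(filename, gray)
#         text = pytesseract.image_to_string(Image.open(filename))
#     print(text)
```

Using a unique path also avoids two runs in the same folder overwriting each other's temp.png.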

Step Four – Putting our code to the test

Here are some of the sample pictures to test Tesseract.

Before testing out Tesseract, I recommend downloading the GitHub repository from here.

Text in bold represents output and the italic text indicates input.

Let’s try it on the first sample.

Sample 1

python ocr_main.py
Enter the file path: sample1.png
Do you want to pre-process the image?
Threshold: 1
Grey: 2
None : 0
Enter your choice: 1
You are awesome.

It works well on Sample Image 1. Let’s try it on Sample Image 2.

Sample 2

python ocr_main.py
Enter the file path: sample1.png
Do you want to pre-process the image?
Threshold: 1
Grey: 2
None : 0
Enter your choice: 1
Some italic text.

And finally on the last sample.

Sample 3

python ocr_main.py
Enter the file path: sample1.png
Do you want to pre-process the image?
Threshold: 1
Grey: 2
None : 0
Enter your choice: 1
Hawdwriting

Thanks for taking the time to read this article. A big thumbs up to you all.

If you have any queries regarding this article, I would be glad to help you out. Please let me know in the comments section below 🙂

Anirudh Mergu
https://anirudhmergu.com
A designer by heart. An engineer by profession.

20 comments

  1. Anonymous

    cool stuff nice job

  2. Ditiya Mukherjee

    getting error as ("module 'cv2' has no attribute 'imread'",)

  3. Luís Cunha

    Hello, the program starts smoothly, but after selecting the pre-process option the following error appears:

    ("module 'pytesseract' has no attribute 'image_to_string'",)

    None

    can you help ?

  4. Karan Davda

    If I want to convert the image to any other colour, or if I don't want to convert the image to gray at all, what should I do?

  5. Nitin Kshatriya

    Just a quick question, how can I use the above model for mobile? Apart from using an API, is there a way to use it on iOS/Android devices?

  6. abraham

    Nice information bro. i saw few posts…….keep rocking.

  7. Ajeet

    In my image I have got a value like 60-70 mg but OCR converts this as 607is70 mg. Is there a fix for this kind of issue?

    • The results depend on the quality of the image. Kindly use an image with a better resolution and use the pre-processing methods to clean the clutter from the image. Hope this solves the issue.

  8. Adi Sankuri

    I would like to retrieve data from a structured form into an Excel sheet which has 2 columns. The 1st column indicates the name of the field. The 2nd column indicates the value of the field. How can I do it?

    • Just split the image containing data into two parts vertically.
      Run OCR on each of the two images and store the results in two different strings, say names and values.
      As every record will be separated by a newline character, i.e. '\n', you can split them using names.split("\n") and values.split("\n").
      This will give you two arrays of strings.

      Create a new string, say output = ""

      Then write some code to take each record simultaneously from both the arrays and append it to the output string as output += str(name)+","+str(value)+"\n".

      Create a file buffer. For ease, I recommend writing a CSV file, which you can then open in Excel and save as a new Excel file.

      f = open("file.csv", "w+")

      Write the output string to the file.

      f.write(output)

      Close the output stream.

      f.close()

      Then open this file in Excel and save it as a new Excel file.

      Hope it helps!

      Thanks,
      Anirudh
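The steps in the reply above can be sketched roughly as follows. This version uses Python's csv module instead of joining strings by hand, so a value that itself contains a comma is quoted correctly. The function names pair_fields and to_csv_text and the "field"/"value" header are made up for this example, and the two input strings stand in for whatever OCR returns for the left and right halves of the form.

```python
import csv
import io


def pair_fields(names_text, values_text):
    """Pair up the non-empty lines of two OCR results, line by line."""
    names = [line.strip() for line in names_text.splitlines() if line.strip()]
    values = [line.strip() for line in values_text.splitlines() if line.strip()]
    # zip stops at the shorter list, so unmatched trailing lines are dropped.
    return list(zip(names, values))


def to_csv_text(rows):
    """Render (name, value) pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["field", "value"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Stand-ins for the OCR output of the two image halves.
    names = "Name\nAge\n"
    values = "Alice\n\n30\n"
    csv_text = to_csv_text(pair_fields(names, values))
    # newline="" keeps the csv module's own line endings intact.
    with open("file.csv", "w", newline="", encoding="utf-8") as f:
        f.write(csv_text)
```

Pairing line by line only works if OCR returns the same number of non-empty lines for both halves, so it is worth comparing the two lengths before trusting the output.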
