In this article, we will install Tesseract OCR on our system, verify the installation, and try Tesseract on some sample images.
TL;DR
Time needed: 45 minutes.
In order to get Tesseract OCR up and running, you will need to perform the following steps.
- Install Tesseract OCR on your computer
macOS users, run brew install tesseract.
Linux users, run sudo apt-get install tesseract-ocr
Windows users, consult the Tesseract documentation to install the binary. For detailed steps, continue reading the blog.
- Verify the installation of Tesseract on your machine
Run tesseract -v to verify the installation. If the command prints the version properly, then we are good to go!
- Create a new file named ocr_main.py
Create a new file called ocr_main.py and copy the contents from the detailed blog.
- Run the Python script
Run the script using python ocr_main.py
Detailed Steps
Step One – Installing Tesseract OCR
For macOS users, we’ll be using Homebrew to install Tesseract:
brew install tesseract
If you’re using the Ubuntu operating system, simply use apt-get to install Tesseract OCR:
sudo apt-get install tesseract-ocr
For Windows, please consult the Tesseract documentation.
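The Python script later in this article also needs the pytesseract, OpenCV, and Pillow packages. Assuming you use pip, something along these lines should install them:
pip install pytesseract opencv-python Pillow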
Step Two – Verifying the Installation of Tesseract OCR
To validate that Tesseract has been successfully installed on your machine, execute the following command:
tesseract -v
You should see the Tesseract version printed on your screen, along with a list of image file format libraries Tesseract is compatible with. For example,
tesseract 3.05.01
 leptonica-1.74.1
  libgif 4.1.6(?) : libjpeg 8d (libjpeg-turbo 1.5.0) : libpng 1.6.20 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.3 : libopenjp2 2.1.0
If the Tesseract version is not displayed (for example, a blank window opens and closes automatically, or you get errors instead), re-install Tesseract, make sure your PATH variable is updated, and try opening the console or IDE you are using with administrative privileges.
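Once the Python packages from Step One are installed, you can also do a quick sanity check from Python. The snippet below is a small sketch using pytesseract's get_tesseract_version() helper; if it raises an error, pytesseract cannot find the Tesseract binary on your PATH:

import pytesseract

# Prints the version of the Tesseract binary that pytesseract found on the PATH
print(pytesseract.get_tesseract_version())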
Step Three – Testing out Tesseract OCR
In order to obtain reasonable results, you need to supply images that are cleanly pre-processed and crisp.
Recommendations:
- Use images with the highest resolution and DPI possible.
- Make sure that the text is clearly visible, with no pixelation or deformation.
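If your source image is small or has a low DPI, upscaling it before running OCR often helps. The snippet below is only a sketch; the 2x factor and the sample1.png / sample1_big.png file names are illustrative, not something the article prescribes:

import cv2

# Enlarge the image 2x with cubic interpolation before handing it to Tesseract
image = cv2.imread("sample1.png")
big = cv2.resize(image, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
cv2.imwrite("sample1_big.png", big)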
The GitHub repository for this tutorial will be available here.
Let’s start coding now:
Create a file named ocr_main.py
(I chose this name; you can call it whatever you want.)
1. Import necessary libraries
import cv2
import pytesseract
from PIL import Image
2. Get the path of the image file we are working on. I’m going to store the path to the file in a variable called path
# Get the image file path from the user
path = input("Enter the file path : ").strip()
3. Load the image data and store it in the variable image
# Load the image
image = cv2.imread(path)
4. Convert the image to grayscale for better recognition of text and store the data in gray
# Convert the image to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
5. Optionally pre-process the image: entering 1 applies Otsu thresholding, 2 applies a median blur, and 0 skips pre-processing.
temp = input("Do you want to pre-process the image ?\nThreshold : 1\nGrey : 2\nNone : 0\nEnter your choice : ").strip()

# If the user enters 1, apply Otsu thresholding; if 2, apply a median blur; otherwise do nothing
if temp == "1":
    gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
elif temp == "2":
    gray = cv2.medianBlur(gray, 3)
6. Save the pre-processed temporary file as temp.png
filename = "{}.png".format("temp")
cv2.imwrite(filename, gray)
7. Apply OCR and print the output string.
text = pytesseract.image_to_string(Image.open(filename))
print(text)
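As a side note, pytesseract can usually work on the OpenCV/NumPy array directly, so the temporary file from step 6 is not strictly required. A minimal sketch, assuming your pytesseract version accepts NumPy arrays and reusing the gray array from the steps above:

# Pass the pre-processed NumPy array straight to pytesseract instead of re-reading temp.png
text = pytesseract.image_to_string(gray)
print(text)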
And the final code will be :
import cv2
import pytesseract
from PIL import Image

def main():
    # Get the image file path from the user
    path = input("Enter the file path : ").strip()

    # Load the image
    image = cv2.imread(path)

    # Convert the image to grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    temp = input("Do you want to pre-process the image ?\nThreshold : 1\nGrey : 2\nNone : 0\nEnter your choice : ").strip()

    # If the user enters 1, apply Otsu thresholding; if 2, apply a median blur; otherwise do nothing
    if temp == "1":
        gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
    elif temp == "2":
        gray = cv2.medianBlur(gray, 3)

    # Store the grayscale image as a temporary file to apply OCR
    filename = "{}.png".format("temp")
    cv2.imwrite(filename, gray)

    # Load the temporary file as a PIL/Pillow image and apply OCR
    text = pytesseract.image_to_string(Image.open(filename))
    print(text)

try:
    main()
except Exception as e:
    print(e.args)
    print(e.__cause__)
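If the defaults do not work well for a particular image, Tesseract's page segmentation mode can be adjusted through the config argument of image_to_string. The line below, continuing from the script above, is only a sketch; --psm 6 ("assume a single uniform block of text") is one of several modes worth experimenting with, and on older 3.x builds like the one in the sample output the flag is -psm rather than --psm:

# Assume a single uniform block of text (PSM 6); use -psm on Tesseract 3.x
text = pytesseract.image_to_string(Image.open(filename), config="--psm 6")
print(text)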
Step Four – Putting our Code to the Test
Here are some of the sample pictures to test Tesseract.
Before testing out Tesseract, I recommend downloading the GitHub repository from here.
Text in bold represents output and the italic text indicates input.
Let’s try it on the first sample.
Sample 1
python ocr_main.py
Enter the file path : sample1.png
Do you want to pre-process the image ?
Threshold : 1
Grey : 2
None : 0
Enter your choice : 1
You are awesome.
It works well on Sample Image 1, let’s try it on Sample Image 2.
Sample 2
python ocr_main.py
Enter the file path : sample2.png
Do you want to pre-process the image ?
Threshold : 1
Grey : 2
None : 0
Enter your choice : 1
Some italic text.
And finally on the last sample.
Sample 3
python ocr_main.py
Enter the file path : sample3.png
Do you want to pre-process the image ?
Threshold : 1
Grey : 2
None : 0
Enter your choice : 1
Hawdwriting
Thanks for taking the time to read this article. A big thumbs up to you!
If you have any queries regarding this article, I would be glad to help you out. Please let me know in the comments section below 🙂
Comments
I would like to retrieve data from a structured form into an Excel sheet which has 2 columns. The 1st column indicates the name of the field, and the 2nd column indicates the value of the field. How can I do it?
Just split the image containing the data into two parts vertically.
Run OCR on each of the two images and store the results in two different lists, say names and values.
As every record will be separated by a newline character, i.e. '\n', you can split them using names.split("\n") and values.split("\n"). This will give you an array of strings.
Create a new string, say output = "".
Then write some code to take each record simultaneously from both arrays and append it to the output string as output += str(name)+","+str(value).
Create a file buffer. For ease, I recommend using CSV, which you can later open in Excel and save as a new Excel file.
f = open("file.csv", "w+")
Write the output string to the file.
f.write(output)
Close the output stream.
f.close()
Then open this file in Excel and save it as a new Excel file.
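Putting those steps together, a rough sketch of the whole thing could look like this (left.png and right.png are just placeholders for the two halves of your form):

import pytesseract
from PIL import Image

# OCR each half of the form (left half = field names, right half = values)
names = pytesseract.image_to_string(Image.open("left.png")).split("\n")
values = pytesseract.image_to_string(Image.open("right.png")).split("\n")

# Pair the records line by line and build a CSV string
output = ""
for name, value in zip(names, values):
    if name.strip() and value.strip():
        output += str(name) + "," + str(value) + "\n"

# Write the CSV, which can then be opened in Excel and saved as an Excel file
f = open("file.csv", "w+")
f.write(output)
f.close()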
Hope it helps!
Thanks,
Anirudh
In my image I have got a value like 60-70 mg, but OCR converts this as 607is70 mg. Is there a fix for this kind of issue?
The results depend on the quality of the image. Kindly use an image with better resolution and use the pre-processing methods to clean the clutter from the image. Hope this solves the issue.
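If the stray characters mostly corrupt numeric values, another thing worth experimenting with (it is not covered in the article, and support varies with the Tesseract version) is restricting the characters Tesseract may emit via the tessedit_char_whitelist option. A rough sketch, with dose.png as a placeholder file name:

import pytesseract
from PIL import Image

# Only allow digits, the dash, and the letters m/g in the recognised text
text = pytesseract.image_to_string(
    Image.open("dose.png"),
    config="-c tessedit_char_whitelist=0123456789-mg",
)
print(text)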
Nice information bro. i saw few posts…….keep rocking.
Thanks Abraham! This means a lot! ❤️
Just a quick question: how can I use the above model on mobile? Apart from using an API, is there a way to use it on iOS/Android devices?
You may consider using these repositories for more details:
Android: https://github.com/rmtheis/tess-two
iOS: https://github.com/gali8/Tesseract-OCR-iOS
Hope it helps!
What if I want to convert the image to some other colour space, or don't want to convert the image to gray at all? What should I do?
You can try removing this statement from the code:
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
It generates an error:
error : ("OpenCV(4.1.0) /io/opencv/modules/imgproc/src/thresh.cpp:1509: error: (-215:Assertion failed) src.type() == CV_8UC1 in function 'threshold'\n",)
None
Apart from this, I kept that statement as it is and tried changing the value of the threshold, but there is no change. The grayscale makes the image very dark: https://uploads.disquscdn.com/images/3a27939653465fd88149ca20b9bcb59a2e3c45376194637f29362b41a25cd236.png
Hello, the program starts smoothly, but after selecting the pre-process option the following error appears:
(“module ‘pytesseract’ has no attribute ‘image_to_string'”,)
None
can you help ?
getting error as (“module ‘cv2’ has no attribute ‘imread'”,)
cool stuff nice job