How to Convert JPG to Text Using Python?

By Faraz - August 10, 2024

Learn how to convert JPG to text in Python with easy-to-follow steps. This beginner-friendly guide uses Python libraries like Pillow and Pytesseract.



Converting an image like a JPG file to text might seem complicated, but with Python it's quite straightforward. The technique is called optical character recognition (OCR): Pillow opens the image, and pytesseract passes it to the Tesseract OCR engine, which returns the text it recognizes. In this blog, we will guide you through the steps to create a Python script that extracts text from a JPG image. Whether you're a beginner or have some coding experience, this tutorial will be easy to follow. Let's dive in and learn how to convert JPG to text using Python!

Step-by-Step Guide to Convert JPG to Text in Python

Step 1: Install Required Python Libraries

To begin, you need to install the necessary Python libraries. We will use Pillow to handle the image and pytesseract to extract the text.

Open your terminal or command prompt and run the following command:

pip install pillow pytesseract
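To confirm that both packages landed in the Python interpreter you plan to use, a quick check like the one below can help. It is a minimal sketch that uses only the standard library; the helper name missing_packages is made up for this illustration. Note that Pillow is imported under the name PIL, not pillow.

```python
import importlib.util


def missing_packages(import_names):
    """Return the names from import_names that cannot be imported here."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


# Pillow installs under the import name "PIL", not "pillow".
missing = missing_packages(["PIL", "pytesseract"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("Pillow and pytesseract are both importable.")
```

If a name is reported as not installed, check that pip and python point to the same interpreter or virtual environment.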

Step 2: Install Tesseract-OCR

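Tesseract-OCR is the engine that actually processes the image and extracts the text; pytesseract is only a Python wrapper that calls it. The pip command above does not install the engine, so you need to install it separately on your system.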

  • Windows: Download and run a Tesseract installer for Windows. The Tesseract project's documentation links to prebuilt installers.
  • Linux: Install Tesseract using the package manager. On Ubuntu, use the command:
    sudo apt-get install tesseract-ocr
  • macOS: Install Tesseract using Homebrew:
    brew install tesseract
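Once the install finishes, you can check from Python whether the tesseract command is reachable. The sketch below assumes the executable is named tesseract and is on your PATH; the helper name tesseract_version is only illustrative.

```python
import shutil
import subprocess


def tesseract_version(command="tesseract"):
    """Return the first line of `<command> --version`.

    Returns None if the command is not on PATH or prints nothing.
    """
    executable = shutil.which(command)
    if executable is None:
        return None
    result = subprocess.run([executable, "--version"],
                            capture_output=True, text=True, check=False)
    # Some builds print the version banner on stderr rather than stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


version = tesseract_version()
print(version if version else "Tesseract was not found on your PATH.")
```

On Windows, a "not found" message here does not always mean the install failed. It may only mean the install folder is not on your PATH, which is what Step 3 deals with.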

Step 3: Specify the Path to Tesseract Executable (For Windows Users)

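If you're using Windows, pytesseract may not be able to find Tesseract on its own, because the install folder is not necessarily on your PATH. In that case, you need to specify the path to the Tesseract executable in your Python script. On Linux and macOS, the package managers above normally make the tesseract command available system-wide, so you can usually skip this step.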

Find the location where Tesseract was installed on your system. It's usually something like:

C:\Program Files\Tesseract-OCR\tesseract.exe

In your Python script, add the following line to set the path:

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

Add this line after import pytesseract and before you call any pytesseract functions.
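If the same script has to run on more than one operating system, you may prefer to look the executable up instead of hard-coding it. Here is a minimal sketch, assuming the default Windows location shown above; find_tesseract and WINDOWS_DEFAULT are names invented for this example.

```python
import os
import shutil
import sys

# Default install location mentioned above; adjust it if yours differs.
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract():
    """Return a path to the Tesseract executable, or None if none is found."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    if sys.platform == "win32" and os.path.isfile(WINDOWS_DEFAULT):
        return WINDOWS_DEFAULT
    return None


path = find_tesseract()
if path is None:
    print("Tesseract not found; install it or set the path by hand.")
else:
    print("Using Tesseract at:", path)
    # In your OCR script, after `import pytesseract`, you would then write:
    # pytesseract.pytesseract.tesseract_cmd = path
```

The assignment is left as a comment so the check runs even before pytesseract is installed.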

Step 4: Write the Python Code

Now, let’s write the Python script to convert JPG to text. Create a new Python file and add the following code:

from PIL import Image
import pytesseract

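# Windows only: specify the path to the Tesseract executable.
# On Linux and macOS, delete or comment out the next line.
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

# Load the image (quote.jpg must be in the folder you run the script from)
image = Image.open('quote.jpg')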

# Extract text from image
text = pytesseract.image_to_string(image)

# Print the text
print(text)

Step 5: Test the Code

Save your script in the same folder as quote.jpg and run it from that folder with the command below. The text extracted from the image will be displayed in the console.

python your_script_name.py
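If the run ends in an error instead of text, the cause is usually one of three things: a package is missing, the image file is not where the script expects it, or pytesseract cannot find Tesseract. The sketch below wraps the Step 4 logic so each case prints a readable message. The function name extract_text is hypothetical; UnidentifiedImageError (Pillow) and TesseractNotFoundError (pytesseract) are the exceptions those libraries raise. On Windows you would still set tesseract_cmd as in Step 3.

```python
def extract_text(image_path):
    """Return the text found in image_path, or None if something went wrong."""
    try:
        # Imported here so a missing package gives a friendly message.
        from PIL import Image, UnidentifiedImageError
        import pytesseract
    except ImportError as exc:
        print(f"Missing package ({exc}). Run: pip install pillow pytesseract")
        return None

    try:
        with Image.open(image_path) as image:
            return pytesseract.image_to_string(image)
    except FileNotFoundError:
        print(f"Image not found: {image_path}")
    except UnidentifiedImageError:
        print(f"Not a readable image file: {image_path}")
    except pytesseract.TesseractNotFoundError:
        print("Tesseract is not installed or its path is not set (see Steps 2 and 3).")
    return None


text = extract_text("quote.jpg")
if text is not None:
    print(text)
```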

Step 6: Customize and Improve

You can further enhance your script by adding more functionalities. For example, you could:

  • Process multiple images in a loop.
  • Save the extracted text to a file.
  • Pre-process the image to improve OCR accuracy (e.g., convert it to grayscale, resize, or sharpen the image).
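The pre-processing idea from the last bullet could look something like this. It is a rough sketch, not a tuned recipe: the helper name preprocess_for_ocr and the scale factor of 2 are assumptions made for illustration, and the best settings depend on your images.

```python
from PIL import Image, ImageFilter, ImageOps


def preprocess_for_ocr(image, scale=2):
    """Return a grayscale, enlarged, sharpened copy of a Pillow image."""
    gray = ImageOps.grayscale(image)
    enlarged = gray.resize((gray.width * scale, gray.height * scale))
    return enlarged.filter(ImageFilter.SHARPEN)


# Stand-in image so the sketch runs on its own. In the real script, use
# Image.open('quote.jpg') and pass the result to pytesseract.image_to_string.
sample = Image.new("RGB", (100, 40), "white")
prepared = preprocess_for_ocr(sample)
print(prepared.mode, prepared.size)
```

Compare the OCR output with and without this step on a few of your own images before keeping it, since pre-processing can also make results worse.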

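Here’s a simple example of saving the text to a file. Add it to the end of the script from Step 4, since it reuses the text variable. Setting the encoding to UTF-8 avoids errors on systems whose default encoding cannot represent every recognized character:

with open('output.txt', 'w', encoding='utf-8') as file:
    file.write(text)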
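The first two ideas can be combined: loop over every JPG in a folder and save each result to its own text file. The sketch below is one way to do that under a few assumptions. The function names find_jpgs and convert_folder and the folder name images are hypothetical, and each .txt file is written next to its image.

```python
from pathlib import Path


def find_jpgs(folder):
    """Return the .jpg and .jpeg files in folder, sorted by name."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and p.suffix.lower() in {".jpg", ".jpeg"})


def convert_folder(folder):
    """OCR every JPG in folder and write the text next to it as a .txt file."""
    # Imported here so find_jpgs can be tried without the OCR packages.
    from PIL import Image
    import pytesseract

    for jpg in find_jpgs(folder):
        with Image.open(jpg) as image:
            text = pytesseract.image_to_string(image)
        output = jpg.with_suffix(".txt")
        output.write_text(text, encoding="utf-8")
        print(f"Wrote {output.name}")


# Example call (uncomment once you have a folder of images):
# convert_folder("images")
```

Windows users would still set pytesseract.pytesseract.tesseract_cmd as in Step 3 before calling convert_folder.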

Final Thoughts

In this blog, we walked through a simple way to convert JPG images to text using Python. With Pillow to load the image and pytesseract to run Tesseract on it, a few lines of code are enough to extract text from images. This process can be handy in various applications, from automating data entry tasks to processing scanned documents.

With the steps outlined in this guide, you should now be able to create your own Python script for converting images to text. Try customizing the script further to suit your specific needs, and don't forget to explore more Python libraries that can enhance your projects.

If you found this guide helpful, be sure to share it and stay tuned for more Python tutorials!

