In this article we are going to take a look at gImageReader, a front end for the Tesseract OCR engine. For those who do not know it, Tesseract is an optical character recognition (OCR) engine that uses artificial intelligence to find and recognize printed text in images. It is an open source library and one of the most popular OCR engines available. gImageReader simplifies the entire process of extracting printed text from images, allowing users to work with files, scanned images, PDFs, pasted clipboard items, etc.
Today any user, whether at the office or at home, can end up needing to extract text from an image. It could be a scanned document in image format, a piece of paper, or an old research paper. Many users would simply retype all the text in an editor, but this can be time consuming. To avoid this work, we can instead use an OCR tool to extract the text automatically.
gImageReader offers many functions and tools. It is a good tool for importing a PDF or scanned document and processing it further.
gImageReader General Features
- We can import PDF documents and images from disk, scanning devices, the clipboard and screenshots. gImageReader supports many file types. We simply import our files into the tool and extract the text with one click.
- We can generate PDF documents from hOCR documents. gImageReader supports three output formats for the extracted text: plain text, PDF and hOCR.
- The tool lets us define the recognition area manually or automatically to select the text to extract.
- The recognized text is displayed directly next to the image, as can be seen in the screenshot above.
- After extracting to plain text, gImageReader performs post-processing actions such as spell checking. Depending on the language we choose (the default is English), it will underline misspelled words. In addition, gImageReader lets us select the page segmentation mode to use when extracting the text.
- Unlike other OCR tools, which handle one file at a time, gImageReader supports importing multiple files and batch processing.
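The language, page segmentation mode and batch processing mentioned in the list above can also be tried directly with the tesseract command-line tool that is installed alongside gImageReader. The following is only a rough sketch of that idea, not a description of how gImageReader works internally. The function name ocr_batch, the DRY_RUN variable and the assumption that the images are PNG files in a single folder are all hypothetical and chosen for illustration.

```shell
#!/bin/sh
# Sketch: run Tesseract over every PNG file in a folder.
# ocr_batch and DRY_RUN are hypothetical names, not part of gImageReader
# or Tesseract.
ocr_batch() {
    dir="$1"            # folder that contains the images
    lang="${2:-eng}"    # Tesseract language code (eng = English)
    psm="${3:-3}"       # page segmentation mode (3 = fully automatic)
    for img in "$dir"/*.png; do
        [ -e "$img" ] || continue     # nothing matched the pattern
        out="${img%.png}"             # tesseract adds .txt to this base name
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Only show the command that would be executed
            echo "tesseract $img $out -l $lang --psm $psm"
        else
            tesseract "$img" "$out" -l "$lang" --psm "$psm"
        fi
    done
}
```

For example, with a hypothetical folder ~/scans, running ocr_batch ~/scans eng 6 would treat each image as a single uniform block of text and write one .txt file next to each image. Setting DRY_RUN=1 first only prints the commands, which is a convenient way to check them before processing a large batch.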
More information about this program, as well as any new updates, can be found on the project's GitHub page.
Installation on Ubuntu
This is a cross-platform application that works on both GNU/Linux and Windows. In the following lines we will see how to install gImageReader on Ubuntu 18.04, as indicated on the project's GitHub page.
Add the PPA
To get this software we need to add the PPA repository to our system. We do this by opening a terminal (Ctrl + Alt + T) and typing the following command:
sudo add-apt-repository ppa:sandromani/gimagereader
Install gImageReader
Once the list of available software has been updated, we can install the application by typing in the same terminal:
sudo apt-get install gimagereader tesseract-ocr tesseract-ocr-eng
With all of the above, gImageReader should now be installed on our Ubuntu system, and we should be able to start the program.
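To confirm that the installation above went through, we can run a quick check from the same terminal. This is a minimal sketch: check_tool is a hypothetical helper name, and the check assumes the package was installed under the name gimagereader, as in the command above.

```shell
#!/bin/sh
# Sketch: report whether a command is available on the PATH.
# check_tool is a hypothetical helper name.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
        return 1
    fi
}

# If Tesseract is present, list the language data it can use
# (tesseract-ocr-eng should provide "eng").
if check_tool tesseract; then
    tesseract --list-langs
fi

# Ask dpkg whether the gimagereader package is installed.
if dpkg -s gimagereader >/dev/null 2>&1; then
    echo "gimagereader package: installed"
else
    echo "gimagereader package: not installed"
fi
```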
Uninstall
If we want to uninstall gImageReader, in a terminal (Ctrl + Alt + T) we only have to use the following command:
sudo apt-get remove gimagereader -y
To also remove the dependencies that are no longer needed, we can run:
sudo apt-get autoremove
The PPA that we used for the installation can be removed from our system by typing in the same terminal:
sudo add-apt-repository -r ppa:sandromani/gimagereader
gImageReader is a simple Gtk/Qt front end for tesseract-ocr that simplifies the entire process of extracting printed text from images. It allows us to work with files, scanned images, PDFs, pasted clipboard items, etc. This makes it a good option for getting the text out of our images easily and quickly.