Become a PDF professional from the Ubuntu terminal

Text documents are very common among GNU/Linux users and in the computing world in general, along with all the commands and programs for working with them. Nowadays, however, PDF files are gaining ground over plain text documents and have become the favorite of many users, developers and projects.

If we use a graphical environment, opening and managing a PDF file is easy. But what if we use the terminal? Below we explain how to work with PDF files from the terminal: searching for words, counting how many times they appear in the text of a PDF, and so on.

For this we are going to use pdfgrep, a command modeled on grep. pdfgrep lets us search for a word or pattern inside one or more PDF documents and prints the lines that match.

pdfgrep is available in the official repositories of almost all distributions, so to install it we only need the distribution's software manager. If our distribution does not include it (which would be strange if we use Ubuntu), we can go to the developer's official website and get the deb or rpm package to install.
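
As a quick check before going further, the following sketch reports whether pdfgrep is already on the system. The function name check_pdfgrep is only an illustration, and the install hint assumes an Ubuntu system where the package is called pdfgrep.

```shell
# Sketch only: check_pdfgrep is a hypothetical helper name.
check_pdfgrep() {
    if command -v pdfgrep >/dev/null 2>&1; then
        # Already installed: show which version we have.
        pdfgrep --version
    else
        # Assumption: Ubuntu/Debian, where the package is named pdfgrep.
        echo "pdfgrep not found; on Ubuntu try: sudo apt install pdfgrep"
    fi
}

check_pdfgrep
```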

Once it is installed, it is used as follows:

pdfgrep [-v] pattern [file.pdf]

Here pdfgrep is the command itself, pattern is the word or regular expression to search for, and [-v] stands for the optional flags that select the operation, such as ignoring case or counting matches. [file.pdf] must be replaced with the name of the file we want to search. If the file is in the current folder, the name alone is enough; if it is elsewhere on the computer, we must give its path, otherwise pdfgrep will report an error.
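
A minimal sketch of that usage follows. The helper search_pdf and the file names are hypothetical, chosen only for illustration; the flags -i (ignore case) and -n (show the page number of each match) are regular pdfgrep options. The helper checks that the file exists first, since a wrong path is the usual cause of the error mentioned above.

```shell
# Sketch only: search_pdf is a hypothetical helper, not part of pdfgrep.
search_pdf() {
    pattern=$1
    pdf=$2
    # Stop early if pdfgrep is missing or the path is wrong.
    if ! command -v pdfgrep >/dev/null 2>&1; then
        echo "pdfgrep is not installed" >&2
        return 127
    fi
    if [ ! -f "$pdf" ]; then
        echo "no such file: $pdf" >&2
        return 2
    fi
    # -i ignores case, -n prefixes each match with its page number.
    pdfgrep -i -n "$pattern" "$pdf"
}

# Example calls (file names are placeholders):
#   search_pdf ubuntu manual.pdf
#   search_pdf ubuntu "$HOME/Documents/manual.pdf"
```

Swapping -n for -c in the last line of the function would print only the number of matches instead of the matching lines.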

If you use grep regularly in the terminal, you will love pdfgrep: a tool that lets us pull the information we need out of PDF files so we can send it to a friend, a technician, or anyone else who needs it.

Giovanni Gapp said
6 years ago

They continue to help me with the BIOS error that Ubuntu caused. Canonical abandoned us and pretends to forget us; they damaged my new computer.

1. Don Quixote said
  6 years ago
  
  and that perhaps you are stupid, you piece of troll that you do not understand that this blog does not belong to canonical damn subnormal, every time I see the blog you are commenting bullshit go shit somewhere else
  
Jimmy Olano said
6 years ago

I just installed the following version on my Ubuntu 16.04:

"This is pdfgrep version 1.4.1.

Using poppler version 0.41.0
Using libpcre version 8.41 2017-07-05"

I got that with the -V (or --version) parameter, BUT WITH THE -v PARAMETER IT TELLS ME THAT IT DOES NOT RECOGNIZE IT.

Apart from that, I find the -i (or --ignore-case) option more useful: it matches the keyword we pass it whether it appears in uppercase or lowercase.

HOWEVER, IT HAS A SERIOUS PROBLEM SEARCHING FOR ACCENTED WORDS AND OUR DEAR LETTER EÑE. If we want to search for «producción» or «protección» we must look for:

pdfgrep -i producc file_name.pdf
pdfgrep -i protecc file_name.pdf

(I already tried enclosing the word in quotes, single and double, the C language escape character "\" and wildcard characters, and nothing worked.) To search for the keyword "año", the truth is that I can't think of any alternative; whoever knows something, please post here and answer me.

THE MOST POWERFUL OPTION IS -r (or --recursive): it looks for the word in ALL the PDF documents in the directory we are working in and its subdirectories.

In summary, it is a good tool, and since it is free software we can modify it so that it supports the Spanish language. Thanks for the article!

Jimmy Olano said
6 years ago

READING THIS DOCUMENT:

https://pdfgrep.org/doc.html

I found out, and am letting you know, that the «--unac» parameter is proposed for handling accented characters. HOWEVER, the version I downloaded did not have unac support because it simply was not compiled with that utility, which they call experimental by the way.
The funny thing is that the grep command does not have that limitation: using grep with the -i parameter, one can search for "ú" and it will also return "Ú".

In any case, I am already reviewing the pdfgrep repository to see what else I can learn about it. I won't bother you any more (for today).
