Pdftotext: convert a PDF to text from the terminal

In this article we are going to look at pdftotext. It is an open source command-line utility that lets us convert PDF files into plain text files. Basically, it extracts the text data contained in a PDF. The software is free and ships by default with many Gnu/Linux distributions.

In the following lines we will use the tool from the terminal, but if you only want to extract the text from PDF files you can also use a graphical tool such as Calibre. Keep in mind that neither the graphical tool nor the terminal one can extract text when the PDF is made up of images (photos, scanned book pages, etc.).
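Once pdftotext is installed, a quick way to tell whether a PDF contains extractable text is to check if the conversion produces anything other than whitespace. The following is only a minimal sketch: the function name pdf_has_text is made up for this example, and it relies on pdftotext accepting "-" as the output name to write to standard output.

```shell
# Hypothetical helper (not part of pdftotext): succeed when the PDF
# yields at least one non-whitespace character of text.
pdf_has_text() {
    # "-" as the text file sends the extracted text to standard output.
    text=$(pdftotext "$1" - 2>/dev/null | tr -d '[:space:]')
    [ -n "$text" ]
}

# Example use:
# pdf_has_text pdf-entrada.pdf || echo "No text found, the PDF is probably made of images"
```

An empty result does not prove that the file is scanned, but it is a reasonable hint that OCR will be needed.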

In most Gnu/Linux distributions, pdftotext is included as part of the poppler-utils package. It is a command-line tool that converts PDF files to plain text. It offers many options, including specifying the range of pages to convert, preserving the original physical layout of the text as closely as possible, setting the line endings, and working with password-protected PDF files.

Install pdftotext on Ubuntu

To install this tool on an Ubuntu system where it is not yet available, open a terminal (Ctrl + Alt + T) and type the following command to install poppler-utils:

sudo apt install poppler-utils
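Before installing, it can be worth checking whether the command is already there. The snippet below is a small sketch that uses the shell's own command -v lookup; the helper name have_cmd is an assumption made for this example.

```shell
# Hypothetical helper: report whether a command can be found on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd pdftotext; then
    echo "pdftotext is already installed"
else
    echo "pdftotext not found, install poppler-utils first"
fi
```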

How to use pdftotext

Convert a PDF file to text

Once the package is installed on our system, we can convert a PDF file to plain text. You can try to preserve the original layout by adding the -layout option to the command, although it also works without it. In the terminal (Ctrl + Alt + T) the command to use is the following:

pdftotext -layout pdf-entrada.pdf pdf-salida.txt

In the previous command we need to replace pdf-entrada.pdf with the name of the PDF file we want to convert, and pdf-salida.txt with the name of the TXT file where we want to save its text. If we do not specify a text file, pdftotext names the output automatically, using the same name as the original PDF but with the .txt extension. It is also worth putting the path in front of the file names when necessary (~/Documents/pdf-entrada.pdf).
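The automatic naming rule described above can be sketched with plain shell parameter expansion. The function default_txt_name is hypothetical and only mimics the rule (same path and base name, .pdf replaced by .txt); it does not call pdftotext.

```shell
# Hypothetical helper: print the output name expected when no text file
# is given, assuming the .txt file is created next to the input PDF.
default_txt_name() {
    printf '%s\n' "${1%.pdf}.txt"
}

default_txt_name "$HOME/Documents/pdf-entrada.pdf"
```

This makes it easy to know in advance which file to open or delete after a conversion.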

Convert only a range of PDF pages to text

If we are not interested in converting the whole PDF file and want to limit the range of pages converted to text, we need to use the -f option (first page to convert) and the -l option (last page to convert), each followed by a page number. The command would look something like this:

pdftotext -layout -f P -l U pdf-entrada.pdf

In the previous command we need to replace the letters P and U with the numbers of the first and last pages to extract. The name pdf-entrada.pdf also has to be changed to the name of the PDF file we want to work with.
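To avoid typing mistakes with the page numbers, the same command can be wrapped in a small function that checks them first. This is just a sketch: pdf_pages_to_text is a made-up name, and the checks only cover the obvious cases (non-numeric values, or a first page greater than the last one).

```shell
# Hypothetical wrapper around "pdftotext -f FIRST -l LAST".
pdf_pages_to_text() {
    first=$1
    last=$2
    pdf=$3
    # Reject empty or non-numeric page numbers.
    for n in "$first" "$last"; do
        case $n in
            ''|*[!0-9]*)
                echo "page numbers must be positive integers" >&2
                return 1
                ;;
        esac
    done
    # The range must start at page 1 or later and must not be reversed.
    if [ "$first" -lt 1 ] || [ "$first" -gt "$last" ]; then
        echo "need 1 <= first page <= last page" >&2
        return 1
    fi
    pdftotext -layout -f "$first" -l "$last" "$pdf"
}

# Example use: convert pages 2 to 5 of pdf-entrada.pdf
# pdf_pages_to_text 2 5 pdf-entrada.pdf
```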

Set the line ending

We can specify this with -eol followed by mac, dos or unix. The following command produces a text file with unix line endings:

pdftotext -layout -eol unix pdf-entrada.pdf
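To confirm which line endings ended up in the text file, one option is to count the carriage return and line feed bytes. The function below is a rough sketch with a made-up name, eol_style, and it does not try to handle files that mix several conventions.

```shell
# Hypothetical helper: rough guess of the line-ending style of a text file.
eol_style() {
    # Count CR and LF bytes; tr -d ' ' strips padding some wc versions add.
    cr=$(tr -cd '\r' < "$1" | wc -c | tr -d ' ')
    lf=$(tr -cd '\n' < "$1" | wc -c | tr -d ' ')
    if [ "$cr" -eq 0 ]; then
        echo unix    # no CR at all: LF-only (or no line breaks)
    elif [ "$lf" -eq 0 ]; then
        echo mac     # CR without LF: classic Mac style
    else
        echo dos     # both present: assumed to be CR+LF pairs
    fi
}

# Example use:
# eol_style pdf-entrada.txt
```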

Help

To see the available options, use the man page:

man pdftotext

You can also get help with the command:

pdftotext --help

Convert the PDF files in a folder using a Bash FOR loop

pdftotext does not support batch conversion on its own, since each call converts a single PDF. If we want to convert all the PDF files in a folder to text files, we can do it with a Bash FOR loop in the terminal (Ctrl + Alt + T):

for file in *.pdf; do pdftotext -layout "$file"; done
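A slightly more defensive variant of the loop above is sketched below. It skips the case where the folder has no PDF files and reports the files that fail to convert. The function name convert_all_pdfs and its optional directory argument are assumptions made for this example.

```shell
# Hypothetical helper: convert every *.pdf in a directory
# (the current one when no argument is given).
convert_all_pdfs() {
    dir=${1:-.}
    failed=0
    for file in "$dir"/*.pdf; do
        # With no matches the pattern is left as-is, so skip missing names.
        [ -e "$file" ] || continue
        if ! pdftotext -layout "$file"; then
            echo "failed: $file" >&2
            failed=$((failed + 1))
        fi
    done
    # Succeed only when every conversion worked.
    [ "$failed" -eq 0 ]
}

# Example use:
# convert_all_pdfs ~/Documents
```

Quoting "$file" keeps names with spaces intact, as in the original one-liner.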

For more information about pdftotext, you can consult the project website. If you prefer not to type commands in the terminal, you can use an online service to get the same results.


Comments

  1.   Moypher Nigthkrelin said

    Yes, it works, but sometimes I have to run OCR or use LibreOffice.

    Besides, there are many PDF editors, and obviously this does not work on images of text, so I don't find it that useful.

    And that office suite is interesting, and it works.