In this article we will take a look at pdftotext. This is an open source command-line utility that lets us convert PDF files to plain text files. Essentially, what it does is extract the text data from PDF files. The software is free and comes installed by default on many Gnu/Linux distributions.
In the following lines we will look at a tool for the terminal, but to extract text from PDF files you can also use a graphical tool such as Calibre. Note that neither the graphical tools nor the terminal one can extract the text if the PDF is made up of images (photographs, scanned images, and so on).
On most Gnu/Linux distributions, pdftotext is part of the poppler-utils package. It is a command-line utility that converts PDF files to plain text. It offers a number of options, including the ability to specify the range of pages to convert, to preserve the original physical layout of the text as closely as possible, to set the end-of-line characters, and even to work with password-protected PDF files.
Install pdftotext on Ubuntu
To install this tool on Ubuntu, if you do not already have it, just open a terminal (Ctrl + Alt + T) and type the following command to install the package:
sudo apt install poppler-utils
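After installing, it can be handy to confirm that the command is actually available. The following is a minimal sketch, not part of the original article: check_pdftotext is a hypothetical helper name, and it relies only on the -v option, which prints version information.

```shell
# Hypothetical helper: report whether pdftotext is on the PATH.
check_pdftotext() {
    if command -v pdftotext >/dev/null 2>&1; then
        # -v prints copyright and version information; keep only the first line.
        pdftotext -v 2>&1 | head -n 1
        return 0
    fi
    echo "pdftotext not found; install the poppler-utils package" >&2
    return 1
}
```

Calling check_pdftotext in the terminal prints the version line when the tool is installed and a short hint otherwise.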
How to use pdftotext
Convert a PDF file to text
Once the package is installed on our system, we can convert a PDF file to plain text. We can try to preserve the original layout by adding the -layout option to the command, but we can also try without it. In a terminal (Ctrl + Alt + T), the command to use is the following:
pdftotext -layout pdf-entrada.pdf pdf-salida.txt
In the command above we have to replace pdf-entrada.pdf with the name of the PDF file we want to convert, and pdf-salida.txt with the name of the text file in which we want to save the text of the input PDF. If we do not specify an output file, pdftotext automatically names it after the original PDF file, but with the .txt extension. It can also be useful to put paths in front of the file names where necessary (for example ~/Documents/pdf-entrada.pdf).
Convert only a range of PDF pages to text
If we are not interested in converting the whole PDF file and want to limit the range of pages converted to text, we use the -f option (first page to convert) and the -l option (last page to convert), each followed by a page number. The command would look something like this:
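To make the automatic naming concrete, here is a small sketch that predicts the name of the output file when none is given. default_txt_name is a hypothetical helper, and the handling of names that do not end in .pdf is an assumption about pdftotext's behaviour rather than something this article states.

```shell
# Hypothetical helper: predict the output name used when no text file is given.
default_txt_name() {
    case "$1" in
        # A trailing .pdf or .PDF is replaced by .txt.
        *.pdf|*.PDF) printf '%s.txt\n' "${1%.*}" ;;
        # Assumption: any other name simply gets .txt appended.
        *)           printf '%s.txt\n' "$1" ;;
    esac
}

# Example calls (paths are only illustrative):
#   default_txt_name pdf-entrada.pdf
#   default_txt_name ~/Documents/pdf-entrada.pdf
```

Because the directory part of the name is kept, the text file ends up next to the PDF unless an explicit output path is given as the second argument.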
pdftotext -layout -f P -l U pdf-entrada.pdf
In the command above, replace the letters P and U with the numbers of the first and last pages to extract. The name pdf-entrada.pdf also has to be changed to the name of the PDF file we want to work with.
Set the end-of-line characters
We can specify this using -eol followed by mac, dos or unix. The following command uses unix line endings:
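As an illustration of the page-range command, the sketch below wraps it in a function that rejects obviously wrong ranges before calling pdftotext. pdf_pages_to_text is a hypothetical name and the validation rules are my own assumption; only the -layout, -f and -l options come from the article.

```shell
# Hypothetical wrapper: convert pages FIRST..LAST of a PDF, refusing bad ranges.
pdf_pages_to_text() {
    first=$1
    last=$2
    pdf=$3
    # Both page numbers must be non-empty and consist of digits only.
    for n in "$first" "$last"; do
        case "$n" in
            ''|*[!0-9]*)
                echo "page numbers must be positive integers" >&2
                return 2
                ;;
        esac
    done
    if [ "$first" -lt 1 ] || [ "$first" -gt "$last" ]; then
        echo "first page must be between 1 and the last page" >&2
        return 2
    fi
    pdftotext -layout -f "$first" -l "$last" "$pdf"
}

# Example: extract pages 3 to 7 into pdf-entrada.txt
#   pdf_pages_to_text 3 7 pdf-entrada.pdf
```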
pdftotext -layout -eol unix pdf-entrada.pdf
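To check which convention a converted file ended up with, one rough approach is to count carriage returns and line feeds. This is only a sketch under simple assumptions: eol_style is a hypothetical helper, it treats any file without carriage returns as unix (including an empty one), and it does not try to detect mixed endings.

```shell
# Hypothetical helper: guess the end-of-line convention of a text file.
eol_style() {
    cr=$(( $(tr -cd '\r' < "$1" | wc -c) ))   # number of carriage returns
    lf=$(( $(tr -cd '\n' < "$1" | wc -c) ))   # number of line feeds
    if [ "$cr" -eq 0 ]; then
        echo unix        # LF only (or no line endings at all)
    elif [ "$lf" -eq 0 ]; then
        echo mac         # CR only
    else
        echo dos         # CR and LF both present
    fi
}

# Example: eol_style pdf-entrada.txt
```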
Help
To see the available options, open the man page:
man pdftotext
You can also view the built-in help with the command:
pdftotext --help
Convert the PDF files in a folder using a Bash FOR loop
pdftotext does not support batch conversion from PDF to text, so if we want to convert all the PDF files in a folder to text files, we can do it with a Bash FOR loop in a terminal (Ctrl + Alt + T):
for file in *.pdf; do pdftotext -layout "$file"; done
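The one-line loop above can be extended a little, for example to write the text files into a separate folder and to cope with a folder that contains no PDFs. The following is a sketch only: convert_all_pdfs and its two arguments are hypothetical, and it uses nothing beyond the pdftotext invocation already shown.

```shell
# Hypothetical helper: convert every PDF in SRC into a .txt file under DEST.
convert_all_pdfs() {
    src=${1:-.}
    dest=${2:-.}
    mkdir -p "$dest" || return 1
    failed=0
    for file in "$src"/*.pdf; do
        # With no matches the pattern stays literal, so skip names that do not exist.
        [ -e "$file" ] || continue
        name=$(basename "$file" .pdf)
        if ! pdftotext -layout "$file" "$dest/$name.txt"; then
            echo "failed: $file" >&2
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every conversion succeeded.
    [ "$failed" -eq 0 ]
}

# Example: convert the PDFs in the current folder into ./text
#   convert_all_pdfs . text
```

Quoting "$file" keeps names with spaces intact, and the final status makes it possible to tell from a script whether any conversion failed.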
For more information about pdftotext, you can consult the project website. If you prefer not to type commands in a terminal, you can use an online service to get the same result.
Yes, it works well, but sometimes I have to run OCR or use LibreOffice Draw.
There are also plenty of PDF editors, and this apparently does not work for text in scanned images, so I don't find it useful.
And LibreOffice Draw is intuitive and useful.