Pdftotext, convert PDF to text from the terminal

In this article we are going to take a look at pdftotext. It is an open source command-line utility that converts PDF files to plain text files. Basically, what it does is extract the text data from PDF files. The software is free and comes installed by default on many Gnu/Linux distributions.

The following lines cover a tool for the terminal, but to extract text from PDF files you can also use a graphical tool such as Calibre. Note that neither the graphical tools nor the one used in the terminal can extract the text if the PDF is made up of images (photographs, scanned book pages, etc.).

On most Gnu/Linux distributions, pdftotext is part of the poppler-utils package. It is a command-line utility that converts PDF files to plain text. It offers several options, including the ability to specify the range of pages to convert, to preserve the original physical layout of the text as closely as possible, to set the line endings, and even to work with password-protected PDF files.
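As a rough sketch of the password-protected case: pdftotext accepts a user password through its -upw option (and an owner password through -opw). The helper name convert_protected and the PDF_PASSWORD environment variable below are assumptions made for this example; they are not part of pdftotext.

```shell
# Hypothetical helper: convert a password-protected PDF to text.
# PDF_PASSWORD is an assumed environment variable holding the user password.
convert_protected() {
    if [ -z "${PDF_PASSWORD:-}" ]; then
        echo "set PDF_PASSWORD before calling convert_protected" >&2
        return 2
    fi
    # -upw passes the user password; -layout keeps the physical layout.
    pdftotext -layout -upw "$PDF_PASSWORD" "$1"
}

# Example call (the file name and password are placeholders):
# PDF_PASSWORD='secret'
# convert_protected protected.pdf
```

Keep in mind that a password passed as a command-line argument may be visible to other users of the machine through the process list.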

Install pdftotext on Ubuntu

To install this tool on Ubuntu, if you do not have it already, just open a terminal (Ctrl + Alt + T) and type the following command to install the poppler-utils package:

sudo apt install poppler-utils
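To confirm that the installation worked, a small check like the one below can help. It is only a sketch; the function name check_pdftotext is made up for this example.

```shell
# Hypothetical helper: report whether pdftotext is available on the PATH.
check_pdftotext() {
    if command -v pdftotext >/dev/null 2>&1; then
        echo "pdftotext found at: $(command -v pdftotext)"
    else
        echo "pdftotext not found; install the poppler-utils package"
        return 1
    fi
}

# Run the check without aborting the shell if the tool is missing.
check_pdftotext || true
```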

How to use pdftotext

Convert a PDF file to text

Once the package is installed on our system, we can convert a PDF file to plain text. We can try to preserve the original layout by adding the -layout option to the command, although we can also try without it. In a terminal (Ctrl + Alt + T) the command to use is the following:

pdftotext -layout pdf-entrada.pdf pdf-salida.txt

In the previous command, replace pdf-entrada.pdf with the name of the PDF file you want to convert, and pdf-salida.txt with the name of the text file in which the extracted text should be saved. If no output file is specified, pdftotext names it automatically, using the same name as the original PDF file but with a .txt extension. If necessary, you can also put paths in front of the file names (for example ~/Documents/pdf-entrada.pdf).
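The automatic naming described above can be illustrated with shell parameter expansion. This is a simplified sketch: the function default_txt_name is invented for the example, it only handles a lowercase .pdf extension, and it does not call pdftotext at all.

```shell
# Hypothetical helper: print the output name expected when no output
# file is given, i.e. the input name with .pdf replaced by .txt.
default_txt_name() {
    # ${1%.pdf} strips a trailing ".pdf" from the first argument.
    printf '%s\n' "${1%.pdf}.txt"
}

# Placeholder path, used only to show the result of the expansion.
default_txt_name "Documents/pdf-entrada.pdf"
```

The same expansion is handy in scripts that need to know where the text ended up after running pdftotext without an explicit output name.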

Convert only a range of PDF pages to text

If we are not interested in converting the whole PDF file and want to limit the range of pages converted to text, we use the -f option (first page to convert) and the -l option (last page to convert), each followed by a page number. The command to use looks like this:

pdftotext -layout -f P -l U pdf-entrada.pdf

In the previous command, replace the letters P and U with the numbers of the first and last pages to extract. The name pdf-entrada.pdf must also be changed to the name of the PDF file we want to work with.
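A small wrapper can catch a mistyped page range before pdftotext runs. The sketch below is only an illustration; the names is_number and extract_pages are assumptions, not pdftotext features.

```shell
# Hypothetical helper: succeed only if the argument is a non-empty
# string made of digits.
is_number() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
}

# Hypothetical wrapper: extract_pages FIRST LAST INPUT.pdf
extract_pages() {
    first=$1
    last=$2
    pdf=$3
    if ! is_number "$first" || ! is_number "$last"; then
        echo "page numbers must be whole numbers" >&2
        return 2
    fi
    if [ "$first" -lt 1 ] || [ "$first" -gt "$last" ]; then
        echo "first page must be at least 1 and not after the last page" >&2
        return 2
    fi
    # -f is the first page to convert, -l the last one.
    pdftotext -layout -f "$first" -l "$last" "$pdf"
}

# Example call (placeholder file name): pages 2 to 5 only.
# extract_pages 2 5 pdf-entrada.pdf
```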

Set the end-of-line characters

We can specify these with the -eol option followed by mac, dos or unix. The following command uses unix line endings:

pdftotext -layout -eol unix pdf-entrada.pdf
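To check which line endings a text file actually contains, one option is to look for carriage returns at the end of lines, which is what the dos convention (CR followed by LF) produces. The function has_dos_eol below is a hypothetical helper written for this example.

```shell
# Hypothetical helper: succeed if the file has at least one line
# ending in a carriage return (dos-style CRLF line endings).
has_dos_eol() {
    cr=$(printf '\r')
    grep -q "${cr}\$" "$1"
}

# Example use after a conversion (placeholder file names):
# pdftotext -layout -eol dos pdf-entrada.pdf pdf-salida.txt
# has_dos_eol pdf-salida.txt && echo "dos line endings"
```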

Help

To see the available options, consult the man page:

man pdftotext

You can also consult the built-in help with the command:

pdftotext --help

Convert the PDF files in a folder using a Bash FOR loop

pdftotext itself does not support batch conversion, so it cannot convert all the PDF files in a folder to text files in a single call. We can do this with a Bash for loop in a terminal (Ctrl + Alt + T):

for file in *.pdf; do pdftotext -layout "$file"; done
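The one-line loop can be extended slightly so that it copes with folders that contain no PDF files and reports conversions that fail. This is a sketch under those assumptions; the function name convert_all is made up for the example.

```shell
# Hypothetical helper: convert every .pdf in a folder (default: the
# current one) and print how many conversions succeeded.
convert_all() {
    dir=${1:-.}
    count=0
    for file in "$dir"/*.pdf; do
        # If the glob matched nothing, the literal pattern is left
        # in $file; skip it.
        [ -e "$file" ] || continue
        if pdftotext -layout "$file"; then
            count=$((count + 1))
        else
            echo "failed: $file" >&2
        fi
    done
    echo "converted $count file(s)"
}

# Example call (placeholder folder):
# convert_all ~/Documents
```

Each text file is written next to its PDF, because no output name is passed to pdftotext.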

For more information about pdftotext, you can consult the project website. If you prefer not to type commands in the terminal, you can use an online service to get the same result.


Comments

  1.   Moypher Nightkrelin said

    Yes, it works well, but sometimes I have to run OCR or use LibreOffice Draw.

    Besides, there are many PDF editors, and it seems this does not work for text inside images, so I do not find it useful.

    And LibreOffice Draw is intuitive and useful.