Apache Spark, the Big Data analytics framework, is updated to version 3.0

Apache Spark is an open-source cluster computing framework that provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. The Spark codebase was donated to the Apache Software Foundation, which is responsible for its maintenance.
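The implicit data parallelism and fault tolerance in that definition rest on lineage. A dataset is split into partitions that are processed independently, and Spark records how each one was derived, so a lost partition can be recomputed instead of having to be replicated. The plain-Python sketch below models only that idea. `PartitionedDataset` and its methods are hypothetical names, not Spark's RDD API.

```python
from typing import Callable, Dict, List, Optional


class PartitionedDataset:
    """Hypothetical, simplified model of a dataset split into partitions with recorded lineage."""

    def __init__(self, source: List[List[int]],
                 lineage: Optional[List[Callable[[int], int]]] = None) -> None:
        self.source = source                    # input partitions, assumed re-readable (e.g. files)
        self.lineage = lineage or []            # ordered element-wise transformations
        self.cache: Dict[int, List[int]] = {}   # computed partitions held "in memory"

    def map(self, fn: Callable[[int], int]) -> "PartitionedDataset":
        # Transformations are lazy: they only extend the lineage.
        return PartitionedDataset(self.source, self.lineage + [fn])

    def compute(self, index: int) -> List[int]:
        # Each partition is computed independently, so partitions could run in parallel.
        if index not in self.cache:
            data = self.source[index]
            for fn in self.lineage:
                data = [fn(x) for x in data]
            self.cache[index] = data
        return self.cache[index]

    def collect(self) -> List[int]:
        return [x for i in range(len(self.source)) for x in self.compute(i)]

    def lose_partition(self, index: int) -> None:
        # Simulate a failed worker: the computed partition disappears.
        self.cache.pop(index, None)


ds = PartitionedDataset([[1, 2], [3, 4]]).map(lambda x: x * 10).map(lambda x: x + 1)
before = ds.collect()
ds.lose_partition(1)
after = ds.collect()  # partition 1 is rebuilt from its source plus the lineage
print(before, after)  # [11, 21, 31, 41] [11, 21, 31, 41]
```

Nothing is copied to protect the result. Recovery works because the recipe, not the data, is kept.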

Apache Spark can be considered a general-purpose cluster computing system designed for speed.

It provides APIs in Java, Scala, Python and R, as well as an optimized engine that supports general execution graphs.

It also supports a broad and rich set of high-level tools, including Spark SQL (for SQL-based structured data processing), MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

Spark SQL is the Apache Spark module for working with structured data, and it is very popular in Spark applications. According to Databricks, the company founded by the creators of Apache Spark, even Python and Scala developers do much of their work with the Spark SQL engine.

Today Spark is the de facto framework for big data processing, data science, machine learning and data analytics.

About Apache Spark 3.0

The framework is now at version 3.0. Among its most important new features, it is worth noting that Spark 3.0 is roughly twice as fast as the previous version on the TPC-DS benchmark, among others.

This performance gain was achieved through improvements such as adaptive query execution, dynamic partition pruning and other optimizations. Compliance with the ANSI SQL standard has also been improved.
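Dynamic partition pruning helps when a large table partitioned by a key is joined with a smaller table that has a filter on it. The keys that survive the filter are used at runtime to skip partitions of the large table that cannot match. The pure-Python sketch below illustrates only that idea. `scan_with_pruning` and the sales and stores data are hypothetical, and Spark applies this optimization inside its query planner, not as a user function.

```python
from typing import Callable, Dict, List, Tuple


def scan_with_pruning(
    fact_partitions: Dict[int, List[int]],
    dim_rows: List[Tuple[int, str]],
    dim_filter: Callable[[str], bool],
) -> Tuple[List[Tuple[int, int, str]], List[int]]:
    """Join a partitioned fact table with a filtered dimension table, reading only matching partitions."""
    # Step 1: filter the small dimension table and keep the surviving join keys.
    kept = {key: attr for key, attr in dim_rows if dim_filter(attr)}
    # Step 2: prune. Only fact partitions whose key survived the filter are scanned.
    scanned = [key for key in fact_partitions if key in kept]
    # Step 3: join the rows of the scanned partitions with their dimension attribute.
    joined = [(key, value, kept[key]) for key in scanned for value in fact_partitions[key]]
    return joined, scanned


# Hypothetical data: sales amounts partitioned by store_id, and a stores table with a region.
sales = {1: [100, 150], 2: [200], 3: [300, 350]}
stores = [(1, "EU"), (2, "US"), (3, "EU")]

rows, scanned = scan_with_pruning(sales, stores, lambda region: region == "EU")
print(scanned)  # [1, 3]: partition 2 is never read
print(rows)
```

The saving comes from never reading partition 2. The filter was on the `stores` table, yet it reduces I/O on `sales`.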

Spark 3.0 is a major release with more than 3,400 resolved tickets. Within such a large set of changes, we will limit ourselves to the main new features for SQL and Python, among others.

Apache Spark 3.0 strengthens this position by greatly improving support for SQL and Python, the two languages most widely used with Spark today, and by providing optimizations at every level.

PySpark, the Spark API for Python, is downloaded more than 5 million times a month from PyPI, the Python Package Index. Many Python developers use the pandas API for data analysis, although it is limited to single-node processing.

Python is therefore a key focus of Spark 3.0 development. API development in Apache Spark has been accelerated to make data scientists more productive when working with big data in distributed environments.

Koalas, an implementation of the pandas API on top of Apache Spark, eliminates the need to build many functions (e.g. plotting support) in PySpark, for more efficient performance on a cluster.

Until now, Spark's role has often been limited to ETL (Extract, Transform, Load).

This has led to significant API improvements, including Python type hints and additional types of pandas UDFs (user-defined functions).
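In PySpark 3.0 a pandas UDF can be declared by decorating a type-hinted function, for example `@pandas_udf("long")` above `def plus_one(s: pd.Series) -> pd.Series`. Spark then infers the kind of UDF from the hints instead of requiring a separate `PandasUDFType` argument. The sketch below imitates only that inference step, in plain Python, so it runs without Spark or pandas. `Series` is a hypothetical stand-in for `pandas.Series`, and `infer_udf_kind` is an illustrative helper, not a PySpark function.

```python
import collections.abc
import typing
from typing import Iterator


class Series:
    """Hypothetical stand-in for pandas.Series so the sketch has no dependencies."""


def infer_udf_kind(func) -> str:
    """Classify a function by its type hints, loosely following the Spark 3.0 pandas UDF variants."""
    hints = typing.get_type_hints(func)
    ret = hints.pop("return", None)
    params = list(hints.values())
    if not params or ret is None:
        raise TypeError("parameter and return type hints are required")

    def is_iterator_of_series(tp) -> bool:
        # Iterator[Series] has origin collections.abc.Iterator and the single argument Series.
        return typing.get_origin(tp) is collections.abc.Iterator and typing.get_args(tp) == (Series,)

    if all(p is Series for p in params):
        # Series in: a Series out means element-wise; any other return type is treated as a scalar.
        return "series_to_series" if ret is Series else "series_to_scalar"
    if len(params) == 1 and is_iterator_of_series(params[0]) and is_iterator_of_series(ret):
        return "iterator_of_series_to_iterator_of_series"
    raise TypeError("unsupported combination of type hints")


def plus_one(s: Series) -> Series: ...
def mean(s: Series) -> float: ...
def plus_one_batches(batches: Iterator[Series]) -> Iterator[Series]: ...


print(infer_udf_kind(plus_one))          # series_to_series
print(infer_udf_kind(mean))              # series_to_scalar
print(infer_udf_kind(plus_one_batches))  # iterator_of_series_to_iterator_of_series
```

The design benefit is that the function signature documents itself: the same hints a reader sees are the ones the framework uses to pick an execution strategy.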

Spark 3.0 also offers better Python error handling, and calls to user-defined R functions are up to 40 times faster.

It should also be noted that in Spark 3.0, 46% of all fixes concerned SQL functionality, improving both performance and ANSI compatibility.

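That said, one of the most important additions to the Spark SQL engine is adaptive query execution, built around three main features: dynamically coalescing shuffle partitions, dynamically switching join strategies and dynamically optimizing skew joins.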

Query optimization methods have generally focused on static optimization, where the query plan is fixed before execution begins.

Because storage and processing are separated in Spark, the arrival of data can be unpredictable. For these reasons, adaptive query execution, which re-optimizes the plan at runtime using statistics from completed stages, matters even more for Spark than for traditional systems.
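One of adaptive query execution's runtime optimizations, coalescing shuffle partitions, can be pictured as a greedy pass over the partition sizes measured after a shuffle. Adjacent small partitions are merged until a group would exceed a target size, so the next stage runs fewer, better-sized tasks. The function below is a simplified illustration in plain Python, not Spark's implementation, and the sizes are made up. In Spark 3.0 the real behaviour is switched on with `spark.sql.adaptive.enabled` (off by default in that release), and its target is guided by `spark.sql.adaptive.advisoryPartitionSizeInBytes`.

```python
from typing import List


def coalesce_partitions(sizes_mb: List[int], target_mb: int) -> List[List[int]]:
    """Greedily merge adjacent shuffle partitions into groups of at most target_mb.

    A single partition larger than target_mb is kept in a group of its own.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for index, size in enumerate(sizes_mb):
        # Close the current group if adding this partition would push it past the target.
        if current and current_size + size > target_mb:
            groups.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Hypothetical sizes (in MB) of six shuffle partitions observed at runtime.
sizes = [10, 20, 5, 70, 15, 15]
print(coalesce_partitions(sizes, target_mb=64))  # [[0, 1, 2], [3], [4, 5]]
```

Six tasks become three, and the 70 MB partition, already above the target, stays alone. The decision can only be made after the shuffle has run, which is exactly the information a static optimizer does not have.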

There are many other new features that you can check in the release notes, covering data sources, ecosystems, monitoring, debugging and more.

You can check the release notes at the following link.

Source: https://spark.apache.org/
