Apache Spark, the big data analytics framework, is renewed in version 3.0

Apache Spark is an open-source cluster computing framework that provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. The Spark codebase was donated to the Apache Software Foundation, which is responsible for maintaining it.

Apache Spark can be regarded as a fast, general-purpose cluster computing system.

It provides APIs in Java, Scala, Python and R, along with an optimized engine that supports general execution graphs.

It also supports a broad and rich set of higher-level tools, including Spark SQL.

Spark SQL is the Apache Spark module for working with structured data, and it is very popular in Spark applications. According to Databricks, the company founded by the creators of Apache Spark, even Python and Scala developers do much of their work with the Spark SQL engine.

Spark is today the de facto framework for big data processing, data science, machine learning, and data analytics.

About Apache Spark 3.0

The framework is now at version 3.0. Among its most important changes, Spark 3.0 is about twice as fast as the previous version on the TPC-DS benchmark, among others.

This performance gain was achieved through improvements such as adaptive query execution, dynamic partition pruning, and other optimizations. Compliance with the ANSI SQL standard has also been improved.
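As a rough illustration of how these optimizations are switched on, the sketch below collects the relevant settings in a plain dictionary and applies them to a session builder. The configuration keys are Spark SQL properties; the helper name `build_session` and the application name are hypothetical, and the default values of these keys differ between Spark releases, so treat the values shown as assumptions to check against your version.

```python
# Settings for the Spark 3.0 features named above.
SPARK3_SQL_SETTINGS = {
    # Adaptive query execution: re-optimize the plan using runtime statistics.
    "spark.sql.adaptive.enabled": "true",
    # Dynamic partition pruning: skip partitions that a join filter rules out.
    "spark.sql.optimizer.dynamicPartitionPruning.enabled": "true",
    # Stricter ANSI SQL behavior (e.g. errors instead of silent overflow).
    "spark.sql.ansi.enabled": "true",
}


def build_session(app_name="spark3-demo", settings=SPARK3_SQL_SETTINGS):
    """Hypothetical helper: create a SparkSession with the settings applied."""
    # Imported lazily so the settings can be inspected without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in settings.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

On a machine with PySpark installed, calling `build_session()` should return a session with the three settings applied, and `spark.conf.get("spark.sql.adaptive.enabled")` can be used to confirm the value in effect.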

Spark 3.0 is a major release with more than 3,400 resolved tickets. Of its major changes, this article covers only the main new features for SQL and Python, among others.

Apache Spark 3.0 strengthens this position by greatly improving support for SQL and Python, the two languages most widely used with Spark today, and by providing many optimizations at every level.

PySpark, the Spark API for Python, has more than 5 million monthly downloads on PyPI, the Python Package Index. Many Python developers use the pandas API for data analysis, even though it is limited to single-node processing.

Python was therefore a key area of development for Spark 3.0. API development in Apache Spark was accelerated to make data scientists more productive when working with big data in distributed environments.

Koalas, which implements the pandas API on top of Apache Spark, removes the need to build many functions (e.g. plotting support) in PySpark in order to get efficient performance across a cluster.

Until now, Spark's role has often been limited to ETL (Extract, Transform, Load).

This has led to significant API improvements, including Python type hints and additional pandas UDFs (user-defined functions).
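To make the type-hint style concrete, here is a minimal sketch of a pandas UDF written the way Spark 3.0 allows. The function name, the temperature conversion, and the `as_spark_udf` wrapper are hypothetical illustrations and not part of the release; only `pandas_udf` is the actual PySpark function. The sketch assumes pandas is installed, and using it inside Spark also requires PySpark and PyArrow.

```python
import pandas as pd


def fahrenheit_to_celsius(temps: pd.Series) -> pd.Series:
    """Vectorized conversion: operates on a whole pandas Series at once."""
    return (temps - 32.0) * 5.0 / 9.0


def as_spark_udf():
    """Hypothetical wrapper: turn the function above into a pandas UDF.

    The Series -> Series type hints tell Spark 3.0 which kind of pandas UDF
    this is, so no explicit function-type argument is passed.
    """
    # Imported lazily so the plain function can be tried without PySpark.
    from pyspark.sql.functions import pandas_udf

    return pandas_udf(fahrenheit_to_celsius, returnType="double")
```

Given a DataFrame `df` with a numeric column `temp_f` (an assumed name), `df.select(as_spark_udf()("temp_f"))` would apply the conversion batch by batch across the cluster instead of row by row.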

Spark 3.0 offers better Python error handling, and calls to user-defined R functions are up to 40 times faster.

It should also be noted that in Spark 3.0, 46% of all fixes were for SQL functionality, which improved both performance and ANSI compatibility.

That said, among the most important new features in the Spark SQL engine, adaptive query execution stands out.

Query optimization methods usually focus on static query optimization.

Because storage and processing are separated in Spark, the arrival of data can be unpredictable. For these reasons, adaptive query execution matters more for Spark than for traditional systems.

There are many other features you can review in the release notes, covering data sources, the ecosystem, monitoring, debugging, and more.

You can read the release notes by going to the following link.

Source: https://spark.apache.org/

