Apache Spark, the big data analytics framework, has been updated to version 3.0

Apache Spark is an open-source cluster computing framework that provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. The Spark codebase was donated to the Apache Software Foundation, which is responsible for maintaining it.

Apache Spark can be considered a fast, general-purpose cluster computing system.

It provides high-level APIs in Java, Scala, Python and R, as well as an optimized engine that supports general execution graphs.

It also supports a rich set of higher-level tools, including Spark SQL (for SQL-based structured data processing), MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

Spark SQL is the Apache Spark module for working with structured data, and it is widely used in Spark applications. According to Databricks, the company founded by the creators of Apache Spark, even Python and Scala developers do much of their work through the Spark SQL engine.

Today, Spark is the de facto framework for big data processing, data science, machine learning and data analytics.

About Apache Spark 3.0

The framework is now at version 3.0, and among its most important new features, it is worth noting that Spark 3.0 is roughly twice as fast as the previous version on the TPC-DS benchmark.

This performance gain comes from optimizations such as adaptive query execution, dynamic partition pruning and others. Compliance with the ANSI SQL standard has also been improved.
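Dynamic partition pruning targets star-schema queries in which a large, partitioned fact table is joined with a small, filtered dimension table: the result of the dimension-side filter is used at run time to skip fact-table partitions that cannot match the join. The following is a minimal sketch in plain Python, assuming a fact table stored as one partition per `date_id`; it is a toy model, not Spark's implementation, and the tables and the `holiday_sales` function are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical fact table, one partition per date_id: date_id -> sale amounts.
FactTable = Dict[str, List[int]]
# Hypothetical dimension table rows: (date_id, is_holiday).
DimTable = List[Tuple[str, bool]]


def holiday_sales(fact: FactTable, dim: DimTable,
                  keep: Callable[[bool], bool], prune: bool) -> Tuple[int, int]:
    """Join fact and dim on date_id, keep rows whose dim attribute passes `keep`,
    and return (total amount, number of fact partitions scanned)."""
    # Evaluate the filter on the small dimension table first.
    wanted = {date_id for date_id, is_holiday in dim if keep(is_holiday)}
    total, scanned = 0, 0
    for date_id, amounts in fact.items():
        if prune and date_id not in wanted:
            continue  # pruned: this partition is never read
        scanned += 1
        if date_id in wanted:  # the join condition itself
            total += sum(amounts)
    return total, scanned


sales: FactTable = {
    "2020-06-01": [10, 5],
    "2020-06-02": [7],
    "2020-12-25": [20, 1],
    "2021-01-01": [3],
}
dates: DimTable = [
    ("2020-06-01", False),
    ("2020-06-02", False),
    ("2020-12-25", True),
    ("2021-01-01", True),
]

print(holiday_sales(sales, dates, lambda h: h, prune=False))  # (24, 4)
print(holiday_sales(sales, dates, lambda h: h, prune=True))   # (24, 2)
```

Both runs return the same total; pruning only changes how many partitions are read, which is where the speed-up comes from.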

Spark 3.0 is a major release, with more than 3,400 tickets resolved; among all these changes, however, this article sticks to the main new SQL and Python features, along with a few others.

Apache Spark 3.0 reinforces this position by significantly improving support for SQL and Python, the two languages most widely used with Spark today, and by delivering many optimizations at every level.

PySpark, the Python API for Spark, has more than 5 million monthly downloads on PyPI, the Python Package Index. Many Python developers use the pandas API for data analysis, although it is limited to single-node processing.

Python was therefore a major area of development for Spark 3.0. API development on Apache Spark was accelerated to make data scientists more productive when working with big data in distributed environments.

Koalas, an implementation of the pandas API on top of Spark, removes the need to build many functions (for example, plotting support) in PySpark in order to get good performance on a cluster.

Until now, Spark's role could be said to have been mostly limited to ETL (Extract, Transform, Load).

This has led to significant improvements, especially in the APIs, including Python type hints and additional pandas UDFs (user-defined functions).
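In Spark 3.0, a pandas UDF can be declared with Python type hints (for example `pd.Series -> pd.Series`), and Spark uses those hints to work out how to call the function; the function then receives whole batches of rows, transferred through Apache Arrow, rather than one value at a time. The toy model below runs without Spark or pandas: plain lists stand in for `pd.Series`, and the `is_batch_udf` and `apply_udf` helpers and the batch size are hypothetical. It only illustrates the two ideas, dispatch by type hint and fewer Python calls per batch.

```python
import inspect
import typing
from typing import Callable, List, Tuple


def is_batch_udf(fn: Callable) -> bool:
    """Treat fn as a batch (vectorized) UDF if its first parameter is hinted as a list."""
    first_param = next(iter(inspect.signature(fn).parameters))
    hint = typing.get_type_hints(fn).get(first_param)
    return typing.get_origin(hint) is list


def apply_udf(fn: Callable, column: List[int], batch_size: int = 3) -> Tuple[List[int], int]:
    """Apply fn to a column and return (results, number of Python calls made)."""
    results: List[int] = []
    calls = 0
    if is_batch_udf(fn):
        # Vectorized path: one call per batch of rows.
        for start in range(0, len(column), batch_size):
            results.extend(fn(column[start:start + batch_size]))
            calls += 1
    else:
        # Row-at-a-time path: one call per value.
        for value in column:
            results.append(fn(value))
            calls += 1
    return results, calls


def plus_one(x: int) -> int:
    return x + 1


def plus_one_batch(xs: List[int]) -> List[int]:
    return [x + 1 for x in xs]


column = [1, 2, 3, 4, 5, 6, 7]
print(apply_udf(plus_one, column))        # ([2, 3, 4, 5, 6, 7, 8], 7)
print(apply_udf(plus_one_batch, column))  # ([2, 3, 4, 5, 6, 7, 8], 3)
```

The results are identical; the batch version makes three calls instead of seven, which is the kind of per-row overhead that vectorized UDFs avoid.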

Spark 3.0 also brings better Python error handling, and calls to R user-defined functions are up to 40 times faster.

It is also worth noting that 46% of all patches in Spark 3.0 were for SQL functionality, improving both performance and ANSI compatibility.

That said, among the three major new features in the Spark SQL engine is adaptive query execution.

Query optimization approaches have generally focused on static query planning, that is, fixing the plan before the query runs.

Because storage and compute are separated in Spark, the arrival of data can be unpredictable. For these reasons, adaptive query execution matters more for Spark than for traditional systems.
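Adaptive query execution re-optimizes the plan between stages, using statistics collected once each shuffle has finished; in Spark 3.0 it is off by default and is enabled with `spark.sql.adaptive.enabled`. The sketch below models two of its run-time decisions in plain Python: merging small adjacent shuffle partitions toward a target size (Spark's setting is `spark.sql.adaptive.advisoryPartitionSizeInBytes`, 64 MB by default) and switching to a broadcast hash join when one side turns out to be at or below `spark.sql.autoBroadcastJoinThreshold` (10 MB by default). It is a simplified greedy model, not Spark's code, and the function names are hypothetical.

```python
from typing import List

MB = 1024 * 1024


def coalesce_partitions(sizes: List[int], target: int) -> List[List[int]]:
    """Greedily merge adjacent shuffle partitions (by index) so that each group
    stays at or below `target` bytes; an oversized partition stays on its own."""
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for index, size in enumerate(sizes):
        # Start a new group if adding this partition would overshoot the target.
        if current and current_size + size > target:
            groups.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        groups.append(current)
    return groups


def choose_join(left_bytes: int, right_bytes: int,
                broadcast_threshold: int = 10 * MB) -> str:
    """Pick a join strategy from run-time sizes instead of pre-execution estimates."""
    if min(left_bytes, right_bytes) <= broadcast_threshold:
        return "broadcast hash join"
    return "sort-merge join"


# Shuffle output sizes observed at run time (in MB), coalesced toward 64 MB.
observed = [s * MB for s in [10, 20, 5, 70, 30, 30, 1]]
print(coalesce_partitions(observed, 64 * MB))  # [[0, 1, 2], [3], [4, 5, 6]]
print(choose_join(500 * MB, 8 * MB))           # broadcast hash join
```

Seven shuffle partitions become three tasks, and a join that a static plan might have run as a sort-merge join is switched to a broadcast join once the real size of the smaller side is known.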

There are many other features you can explore in the release notes, covering data sources, the ecosystem, monitoring, debugging and more.

You can read the release notes at the following link.

Source: https://spark.apache.org/

