Nidhi Gupta
3 min read · Jun 16, 2023


Azure Databricks

Databricks is a company founded by the creators of Apache Spark. It uses Apache Spark to provide a unified analytics platform, and Azure Databricks is that platform offered as a service on Microsoft Azure.

Why do we need Azure Databricks?

  1. To use Apache Spark on our own, we need to provision the machines, install Spark and the necessary libraries, and maintain the scaling and availability of those machines.
  2. With Databricks, the entire environment can be provisioned with just a few clicks.
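To give a sense of what those few clicks decide, here is a rough sketch of a cluster definition written as a Python dictionary. The field names follow the ones used by the Databricks Clusters API, but every value (the cluster name, runtime version, VM size, and worker counts) is an illustrative assumption and not a recommendation.

```python
import json

# Illustrative values only; the valid options depend on your workspace.
cluster_spec = {
    "cluster_name": "demo-cluster",        # hypothetical name
    "spark_version": "13.3.x-scala2.12",   # Databricks runtime (bundles Spark)
    "node_type_id": "Standard_DS3_v2",     # Azure VM size used for the nodes
    "autoscale": {"min_workers": 2, "max_workers": 8},  # scaling is managed for us
    "autotermination_minutes": 30,         # shut down when idle
}

# Print the definition as JSON, the form in which such a spec is usually sent.
print(json.dumps(cluster_spec, indent=2))
```

The point is that installation, scaling, and availability become a handful of settings instead of manual work on each machine.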

Three main components

  1. Databricks tools, services, and optimizations
  2. Distributed computation
  3. DBFS (Databricks File System) for files

Databricks Infrastructure

In an Azure Databricks workspace, computation runs on a cluster made up of multiple nodes. The cluster has the Spark engine and other components installed.

The cluster contains two types of nodes:

  1. Worker / Slave Nodes

(i) These nodes are responsible for actually performing the underlying tasks.

  2. Driver / Master Node

(i) The entry point to the cluster and to the PySpark application.
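The split between the driver and the workers can be pictured with a plain-Python analogy. This is not Spark code: it is a minimal sketch that uses a thread pool to stand in for worker nodes, and the function names (`split_into_partitions`, `worker_task`, `run_job`) are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Driver-side step: cut the data into roughly equal chunks."""
    size = max(1, -(-len(data) // num_partitions))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def worker_task(partition):
    """Worker-side step: do the actual computation on one partition."""
    return sum(x * x for x in partition)


def run_job(data, num_workers=3):
    """Driver: plan the work, hand tasks to the workers, combine the results."""
    partitions = split_into_partitions(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as workers:
        partial_results = list(workers.map(worker_task, partitions))
    return sum(partial_results)


if __name__ == "__main__":
    # Sum of squares of 1..10, computed partition by partition.
    print(run_job(list(range(1, 11))))
```

Here `run_job` plays the driver's role: it is where the program starts, it decides how the data is divided, and it gathers the partial results. `worker_task` plays the worker's role and performs the computation itself.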
