Jun 16, 2023
Azure Databricks
Databricks is a company founded by the creators of Apache Spark. It builds on Apache Spark to provide a unified analytics platform, and Azure Databricks is that platform offered on Microsoft Azure.
Why do we need Azure Databricks?
- To use Apache Spark on our own, we need to provision the machines, install Spark and the necessary libraries, and manage the scaling and availability of those machines ourselves.
- With Databricks, the entire environment can be provisioned with just a few clicks.
Three main components
- Databricks tools, services, and optimization.
- Distributed computation
- DBFS (Databricks File System) for files
Databricks Infrastructure
An Azure Databricks workspace runs its workloads on a cluster made up of multiple nodes. The cluster has the Spark engine and other components installed.
The cluster contains two types of nodes:
1. Worker / Slave Nodes
(i) Responsible for actually performing the underlying tasks.
2. Driver / Master Node
(i) Entry point to the cluster and to the PySpark application.
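To make the split between the two node types concrete, here is a minimal sketch in plain Python. It is only an analogy and does not use Spark or any Databricks API: a thread pool stands in for the worker nodes, and the function names (`driver`, `worker_task`, `split_into_partitions`) are hypothetical names chosen for illustration. The driver splits the data into partitions and combines the results, while the workers perform the actual computation.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Driver-side step: divide the input into roughly equal chunks."""
    # Ceiling division; max(1, ...) keeps the step valid for empty input.
    size = max(1, -(-len(data) // num_partitions))
    return [data[i:i + size] for i in range(0, len(data), size)]


def worker_task(partition):
    """Worker-side step: perform the actual computation on one partition."""
    return sum(x * x for x in partition)


def driver(data, num_workers=3):
    """Driver: plan the work, hand tasks to workers, combine partial results."""
    partitions = split_into_partitions(data, num_workers)
    # The thread pool is a stand-in for the worker nodes of a cluster.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(pool.map(worker_task, partitions))
    return sum(partial_results)


total = driver(list(range(1, 11)))
print(total)  # sum of squares of 1..10
```

In a real cluster the workers are separate machines and Spark handles the partitioning and scheduling, but the division of responsibility is the same idea: the driver coordinates, the workers compute.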