Nidhi Gupta
2 min read · Oct 2, 2023


POWER OF APACHE SPARK API

Spark is a distributed data processing engine that supports several languages, such as Java, Python, Scala, and SQL. Programmers can write their code in whichever supported language they prefer.

SPARK API

This article discusses the APIs that Apache Spark provides, which make working with data easy and flexible.

Spark RDD (Resilient Distributed Dataset):- This API provides the following support

(i) An RDD is a distributed collection of records and Spark's fundamental data structure.

(ii) No row, column, or schema enforcement: an RDD can hold arbitrary objects.

(iii) "Resilient" means fault tolerant: an RDD records the lineage of transformations that produced it.

(iv) A lost RDD partition can be recreated and reprocessed anywhere in the cluster.
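
Points (ii) to (iv) can be pictured with a toy model in plain Python. This is only a sketch: `ToyRDD` and its methods are hypothetical names invented for illustration, not Spark's API or implementation, and the always-on `cache` is a simplification standing in for a partition held in memory. It shows that records need no schema and that a lost partition can be rebuilt from the recorded lineage.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical; not Spark's implementation)."""

    def __init__(self, source_partitions=None, parent=None, fn=None):
        self.source_partitions = source_partitions  # raw data, root dataset only
        self.parent = parent  # lineage: the dataset this one was derived from
        self.fn = fn          # transformation applied to each parent partition
        self.cache = {}       # partition index -> rows currently "in memory"

    def map(self, f):
        return ToyRDD(parent=self, fn=lambda rows: [f(r) for r in rows])

    def filter(self, keep):
        return ToyRDD(parent=self, fn=lambda rows: [r for r in rows if keep(r)])

    def num_partitions(self):
        if self.parent is None:
            return len(self.source_partitions)
        return self.parent.num_partitions()

    def compute(self, index):
        """Return one partition, rebuilding it from lineage when it is missing."""
        if index not in self.cache:
            if self.parent is None:
                rows = list(self.source_partitions[index])
            else:
                rows = self.fn(self.parent.compute(index))
            self.cache[index] = rows
        return self.cache[index]

    def collect(self):
        return [r for i in range(self.num_partitions()) for r in self.compute(i)]


# No schema: one dataset mixes ints, strings, tuples, and floats.
records = ToyRDD(source_partitions=[[1, "two", (3, "x")], [4.5, 10]])
numbers = records.filter(lambda r: isinstance(r, (int, float))).map(lambda r: r * 2)

first = numbers.collect()

# Simulate losing partition 1, then ask for the data again.
del numbers.cache[1]
recovered = numbers.collect()  # partition 1 is recomputed from its parent

print(first, recovered)
```

In real PySpark code the corresponding operations are `sc.parallelize`, `rdd.filter`, `rdd.map`, and `rdd.collect`. Spark tracks the lineage and recomputes lost partitions itself.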

Catalyst Optimizer (Spark SQL engine):- The optimizer processes a query in the following four phases

(i) Analysis

(ii) Logical Optimization

(iii) Physical Planning

(iv) Code Generation
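
The phases above can be sketched on a tiny expression tree in plain Python. Everything here is an assumption made for illustration: `Lit`, `Col`, `Add`, and the functions are hypothetical and are not Catalyst classes. Physical planning is left out because the toy has only one way to evaluate an expression, so there is no execution strategy to choose.

```python
from dataclasses import dataclass


# Hypothetical expression nodes for a toy engine; not Catalyst's classes.
@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Add:
    left: object
    right: object


def analyze(expr, schema):
    """Analysis: check that every column reference exists in the schema."""
    if isinstance(expr, Col) and expr.name not in schema:
        raise ValueError(f"unresolved column: {expr.name}")
    if isinstance(expr, Add):
        analyze(expr.left, schema)
        analyze(expr.right, schema)
    return expr


def optimize(expr):
    """Logical optimization: fold the sum of two literals into one literal."""
    if isinstance(expr, Add):
        left, right = optimize(expr.left), optimize(expr.right)
        if isinstance(left, Lit) and isinstance(right, Lit):
            return Lit(left.value + right.value)
        return Add(left, right)
    return expr


def to_source(expr):
    """Code generation: render the expression as Python source text."""
    if isinstance(expr, Lit):
        return repr(expr.value)
    if isinstance(expr, Col):
        return f"row[{expr.name!r}]"
    return f"({to_source(expr.left)} + {to_source(expr.right)})"


# price + (2 + 3), checked against a schema with a single column
expr = analyze(Add(Col("price"), Add(Lit(2), Lit(3))), schema={"price"})
optimized = optimize(expr)                  # price + 5
source = to_source(optimized)               # "(row['price'] + 5)"
evaluate = eval(f"lambda row: {source}")    # compile the generated source

print(optimized, source, evaluate({"price": 10}))
```

In PySpark, calling `explain(True)` on a DataFrame prints the parsed, analyzed, and optimized logical plans along with the physical plan, which is a quick way to see the real phases at work.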

Note: Please refer to the article below for more details on the Catalyst Optimizer.

