Nidhi Gupta
3 min read · Oct 2, 2024


Explode vs Explode_outer in Databricks

Working with nested JSON data, where fields often hold arrays or maps, is a recurring challenge for data engineers. I recently explored explode and explode_outer while addressing a new requirement. In this article, we look at how the two functions differ and when each one is the right choice.

Photo by Ferenc Almasi on Unsplash

Explode:

  • It creates one output row per element of an array, or one row per key-value pair of a map (returned as `key` and `value` columns).
  • If an array or map is null or empty, the corresponding row is dropped from the output.
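To make the row rule concrete without starting a Spark session, here is a minimal plain-Python model of `explode`. It is not Spark code: `explode_rows` is a hypothetical helper, and the tuples stand in for DataFrame rows, assuming each input row is an `(id, collection)` pair.

```python
from typing import Any, Dict, Iterable, List, Tuple, Union

# An input row is (id, collection); the collection may be a list, a dict, or None
Collection = Union[List[Any], Dict[Any, Any], None]


def explode_rows(rows: Iterable[Tuple[Any, Collection]]) -> List[Tuple[Any, ...]]:
    """Model of explode: one output row per array element or map entry."""
    out: List[Tuple[Any, ...]] = []
    for row_id, coll in rows:
        if not coll:
            # None, [] and {} produce zero rows, so the input row disappears
            continue
        if isinstance(coll, dict):
            # Maps explode into (key, value) pairs, like Spark's key/value columns
            out.extend((row_id, k, v) for k, v in coll.items())
        else:
            # Arrays explode into one row per element (null elements included)
            out.extend((row_id, elem) for elem in coll)
    return out


arrays = [(1, [1, 2, 3]), (2, None), (3, [])]
maps = [(1, {"a": 10, "b": 20}), (2, None), (3, {})]
print(explode_rows(arrays))  # [(1, 1), (1, 2), (1, 3)]
print(explode_rows(maps))    # [(1, 'a', 10), (1, 'b', 20)]
```

Note that a `None` inside a non-empty list is still emitted; Spark's `explode` behaves the same way for null elements of a non-null array. Only a null or empty collection removes the row.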

Example1:

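```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode

spark = SparkSession.builder.getOrCreate()

# id 1 holds a populated array, id 2 a null array, id 3 an empty array
data = [(1, [1, 2, 3]), (2, None), (3, [])]
df = spark.createDataFrame(data, ["id", "values"])

# explode emits one row per array element; ids 2 and 3 produce no rows
df.select("id", explode("values")).show()
```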

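Result1: only id 1 survives, producing three rows, (1, 1), (1, 2) and (1, 3), with the exploded column named `col` by default. Ids 2 (null array) and 3 (empty array) are dropped.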
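For contrast, `explode_outer` (also in `pyspark.sql.functions`) applied to the same Example1 data keeps ids 2 and 3, each as a single row with null in the exploded column. The sketch below models that rule in plain Python; `explode_outer_rows` is a hypothetical helper, not a Spark API, and only arrays are modeled.

```python
from typing import Any, Iterable, List, Tuple


def explode_outer_rows(rows: Iterable[Tuple[Any, Any]]) -> List[Tuple[Any, Any]]:
    """Model of explode_outer for arrays: like explode, but keeps every input row."""
    out: List[Tuple[Any, Any]] = []
    for row_id, values in rows:
        if not values:
            # None or [] still yields exactly one row, with null in the exploded column
            out.append((row_id, None))
        else:
            out.extend((row_id, v) for v in values)
    return out


data = [(1, [1, 2, 3]), (2, None), (3, [])]
print(explode_outer_rows(data))
# [(1, 1), (1, 2), (1, 3), (2, None), (3, None)]
```

The choice between the two comes down to whether a missing or empty collection should erase the parent row (`explode`) or survive as a null (`explode_outer`).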

Example2:

```python
from pyspark.sql.types import StructType, StructField, ArrayType, StringType, IntegerType, MapType
from pyspark.sql import SparkSession

data = [
    (1, [{"name": "Alice", "items": ["book", "pen"]}, {"name": "Bob", "items": []}]),
    (2, None),
    (3, [{"name": "Charlie", "items": None}]),
    (4, [{"name": "David", "items": ["pencil"]}, {"name": "Eve", "items": None}])
]
schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("info"…
```
