Nidhi Gupta
2 min read · Jul 26, 2023

Working with Dates and Time in PySpark

While working on my current project, I recently ran into a scenario where we needed to convert a string column holding date values into a date or datetime (timestamp) column with the same values.

In this article, we will learn how to convert date/datetime strings in different formats using PySpark.

Functions to be imported

from pyspark.sql.functions import to_timestamp

Case 1: 2019-12-25 13:30:00

df = spark.createDataFrame([('2019-12-25 13:30:00',)], ['date'])

df.show()

+-------------------+
|               date|
+-------------------+
|2019-12-25 13:30:00|
+-------------------+

df.dtypes

[('date', 'string')]

d1 = df.withColumn("date", to_timestamp("date", "yyyy-MM-dd HH:mm:ss"))

d1.dtypes

[('date', 'timestamp')]

d1.show()

+-------------------+
|               date|
+-------------------+
|2019-12-25 13:30:00|
+-------------------+
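
Before running a conversion like this on a large DataFrame, it can help to check a few sample strings against the intended layout locally. The sketch below is not PySpark: it swaps in Python's standard datetime.strptime, whose directives (%Y-%m-%d %H:%M:%S) are written differently from the Spark pattern yyyy-MM-dd HH:mm:ss. The helper name matches_layout and the sample values are made up for illustration.

```python
from datetime import datetime

# strptime counterpart of the Spark pattern 'yyyy-MM-dd HH:mm:ss'.
# This mapping is an assumption for the sketch; Spark and strptime
# use different pattern letters.
CASE1_LAYOUT = "%Y-%m-%d %H:%M:%S"


def matches_layout(value, layout=CASE1_LAYOUT):
    """Return True if `value` parses with `layout`, else False (hypothetical helper)."""
    try:
        datetime.strptime(value, layout)
        return True
    except ValueError:
        # strptime raises ValueError when the string does not fit the layout
        return False


samples = ["2019-12-25 13:30:00", "25/Dec/2019 13:30:00"]
results = {s: matches_layout(s) for s in samples}
print(results)
```

A string that does not fit the layout, like the second sample, needs its own pattern, which is what the next case deals with.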

Case 2: 25/Dec/2019 13:30:00

df = spark.createDataFrame([('25/Dec/2019 13:30:00',)], ['date'])

df.show()

+--------------------+
|…

Nidhi Gupta

Azure Data Engineer 👨‍💻. Heading towards cloud technologies expertise ✌️.