Working with Dates and Time in PySpark
While working on a recent project, I faced a scenario where we needed to convert a string column holding a date value into a date or timestamp column with the same value.
In this article, we will learn how to convert different formats of date/datetime values using PySpark.
Functions to be imported
from pyspark.sql.functions import to_timestamp
Case 1: 2019-12-25 13:30:00
df = spark.createDataFrame([('2019-12-25 13:30:00',)], ['date'])
df.show()
+-------------------+
| date|
+-------------------+
|2019-12-25 13:30:00|
+-------------------+
df.dtypes
[('date', 'string')]
d1 = df.withColumn("date", to_timestamp("date", 'yyyy-MM-dd HH:mm:ss'))
d1.dtypes
[('date', 'timestamp')]
d1.show()
+-------------------+
| date|
+-------------------+
|2019-12-25 13:30:00|
+-------------------+
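Since the Case 1 value is already in Spark's default timestamp layout, an explicit cast works too, and to_date is the equivalent when only the calendar date matters. A minimal sketch against the same DataFrame:
from pyspark.sql.functions import to_date
df.withColumn("date", df["date"].cast("timestamp")).dtypes
[('date', 'timestamp')]
df.withColumn("date", to_date("date", 'yyyy-MM-dd HH:mm:ss')).dtypes
[('date', 'date')]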
Case 2: 25/Dec/2019 13:30:00
df = spark.createDataFrame([('25/Dec/2019 13:30:00',)], ['date'])
df.show()
+--------------------+
| date|
+--------------------+
|25/Dec/2019 13:30:00|
+--------------------+
d2 = df.withColumn("date", to_timestamp("date", 'dd/MMM/yyyy HH:mm:ss'))
d2.dtypes
[('date', 'timestamp')]
d2.show()
+-------------------+
| date|
+-------------------+
|2019-12-25 13:30:00|
+-------------------+
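The conversion also runs in reverse: once the column is a timestamp, date_format renders it back into any string layout. A minimal sketch using the d2 DataFrame from above:
from pyspark.sql.functions import date_format
d2.withColumn("date", date_format("date", 'dd/MMM/yyyy HH:mm:ss')).show()
This prints the original 25/Dec/2019 13:30:00 value, with the column type back to string.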
Case 3: 12/25/2019 01:30:00PM
df = spark.createDataFrame([('12/25/2019 01:30:00PM',)], ['date'])
df.show(truncate=False)
+---------------------+
|date |
+---------------------+
|12/25/2019 01:30:00PM|
+---------------------+
df.dtypes
[('date', 'string')]
d3 = df.withColumn("date", to_timestamp("date", 'MM/dd/yyyy hh:mm:ssaa'))
d3.dtypes
# Note: on Spark >= 3.0 this pattern raises an error; fall back to the legacy parser
spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
d3 = df.withColumn("date", to_timestamp("date", 'MM/dd/yyyy hh:mm:ssaa'))
d3.dtypes
[('date', 'timestamp')]
d3.show()
+-------------------+
| date|
+-------------------+
|2019-12-25 13:30:00|
+-------------------+
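As an alternative to the legacy config, Spark >= 3.0's new parser uses a single 'a' for the AM/PM marker, so the pattern below should parse the same value natively. This is a sketch, assuming Spark >= 3.0 and an English locale:
d3 = df.withColumn("date", to_timestamp("date", 'MM/dd/yyyy hh:mm:ssa'))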
Thanks for the read 🙂. Do clap 👏👏 if you find it useful.
“Keep learning and keep sharing knowledge”