Working with Dates and Time in PySpark
While recently working on my current project, I faced a scenario where we needed to convert a string column holding a date value into a date or datetime (timestamp) column with the same value.
In this article, we will learn how to convert different formats of date/datetime values using PySpark.
Functions to be imported
from pyspark.sql.functions import to_timestamp
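The snippets below assume an active SparkSession named spark, as a PySpark shell or notebook already provides. If you are running them as a standalone script, a minimal sketch for creating one looks like this (the app name here is just a placeholder):

from pyspark.sql import SparkSession

# getOrCreate() reuses an existing session if one is already running
spark = SparkSession.builder.appName("date-conversion-demo").getOrCreate()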
Case 1: 2019-12-25 13:30:00
df = spark.createDataFrame([('2019-12-25 13:30:00',)], ['date'])
df.show()
+-------------------+
| date|
+-------------------+
|2019-12-25 13:30:00|
+-------------------+
df.dtypes
[('date', 'string')]
d1 = df.withColumn("date", to_timestamp("date", 'yyyy-MM-dd HH:mm:ss'))
d1.dtypes
[('date', 'timestamp')]
d1.show()
+-------------------+
| date|
+-------------------+
|2019-12-25 13:30:00|
+-------------------+
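If you want a DateType column rather than a timestamp (the scenario above mentions both), to_date accepts the same pattern syntax. A minimal sketch, assuming the same df as in Case 1 and a hypothetical output column name date_only:

from pyspark.sql.functions import to_date

# Parse the string with the full pattern; the time-of-day portion is dropped
d2 = df.withColumn("date_only", to_date("date", "yyyy-MM-dd HH:mm:ss"))
d2.dtypes
# expected: [('date', 'string'), ('date_only', 'date')]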
Case 2: 25/Dec/2019 13:30:00
df = spark.createDataFrame([('25/Dec/2019 13:30:00',)], ['date'])
df.show()
+--------------------+
|…