Removal of duplicate records in ADF (Azure Data Factory)
Removing duplicate records is one of the most frequently asked questions in data engineering interviews. In this article, let’s see how we can remove duplicate records.
Consider an employee table that contains duplicate records for an employee.
Select a Data flow activity to remove duplicate records.
Select a dataset from the Azure storage account where the data is placed.
Preview all the records from the dataset.
Select the specific records and check the dataset columns.
Check the selected specific records.
Select the Window transformation to segregate the duplicate records by assigning a row number (rownum) within each group of duplicates.
Preview the mapped data from the dataset.
Now filter the records where rownum == 1. Because the duplicate records are sorted with the latest entry first, this keeps only the most recent record from each group of duplicates.
Preview the records where rownum == 1.
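For readers who prefer to see the logic as code, the Window and Filter steps above can be sketched in plain Python. This is only an illustration of the row-number technique, not what ADF runs internally. The sample rows and the column names emp_id, name and updated_at are hypothetical, since the article does not list the table's columns.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows; the column names are assumptions for illustration.
employees = [
    {"emp_id": 1, "name": "Asha", "updated_at": "2023-01-05"},
    {"emp_id": 1, "name": "Asha", "updated_at": "2023-03-10"},
    {"emp_id": 2, "name": "Ravi", "updated_at": "2023-02-01"},
    {"emp_id": 2, "name": "Ravi", "updated_at": "2023-02-01"},
    {"emp_id": 3, "name": "Meera", "updated_at": "2023-01-20"},
]


def add_row_numbers(rows, key, order_by):
    """Number the rows inside each key group, latest order_by value first."""
    # Sort newest first, then stable-sort by key so each group stays newest first.
    ordered = sorted(rows, key=itemgetter(order_by), reverse=True)
    ordered.sort(key=itemgetter(key))
    numbered = []
    for _, group in groupby(ordered, key=itemgetter(key)):
        for rownum, row in enumerate(group, start=1):
            numbered.append({**row, "rownum": rownum})
    return numbered


def keep_latest(rows, key, order_by):
    """Keep only the row with rownum == 1 in each group (the filter step)."""
    return [r for r in add_row_numbers(rows, key, order_by) if r["rownum"] == 1]


unique_rows = keep_latest(employees, "emp_id", "updated_at")
for row in unique_rows:
    print(row)
```

The two-pass sort relies on Python's sort being stable: after sorting by date in descending order, sorting by the key keeps the newest row at the top of each group, so rownum 1 is always the latest entry.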
Now add the non-duplicate records to the sink.
Preview the non-duplicate/unique records.
Now check the destination folder in the storage account.
Thanks for reading 🙂. Do clap 👏 if you find it useful.
“Keep learning and keep sharing knowledge”