In this article, we will demonstrate how to import tables from a CSV file, remove duplicate records using transformations, and migrate the cleaned data into the destination database using Bold Data Hub. Follow the step-by-step process below.
Sample Data Source:
Learn about [Pipeline Creation](https://help.boldbi.com/working-with-data-sources/working-with-bold-data-hub/working-with-pipelines/)
Go to the Transform tab and click Add Table.
Enter the table name to create a transform table for customer satisfaction summary.
Note: The data will first be loaded into the DuckDB database under the designated {pipeline_name} schema before being transformed and moved into the target databases. For example, for a pipeline named "customer_service_data", the data will be placed in the customer_service_data schema.
Learn more about transformation here
Duplicate records in a dataset can lead to inconsistencies and inaccurate analysis. To ensure data integrity, we can remove duplicates based on unique identifiers such as `Customer_ID` and `Ticket_ID` while retaining the most recent entry.

We use the `DISTINCT ON` clause to retain only one record per `Customer_ID` and `Ticket_ID`, prioritizing the latest entry based on `Ticket_Creation_Date DESC`.
```sql
WITH Unique_Tickets AS (
    SELECT DISTINCT ON (Customer_ID, Ticket_ID) *
    FROM {pipeline_name}.sample_csc_data
    ORDER BY Customer_ID, Ticket_ID, Ticket_Creation_Date DESC
)
SELECT * FROM Unique_Tickets;
```
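To make the `DISTINCT ON` behavior concrete, here is a minimal Python sketch of the same deduplication rule: keep one row per (Customer_ID, Ticket_ID), preferring the latest Ticket_Creation_Date. The sample rows are hypothetical and stand in for the `sample_csc_data` table; this is an illustration of the logic, not part of Bold Data Hub.

```python
from datetime import date

# Hypothetical rows standing in for {pipeline_name}.sample_csc_data.
rows = [
    {"Customer_ID": 1, "Ticket_ID": "T1", "Ticket_Creation_Date": date(2024, 1, 5), "Status": "Open"},
    {"Customer_ID": 1, "Ticket_ID": "T1", "Ticket_Creation_Date": date(2024, 2, 1), "Status": "Closed"},
    {"Customer_ID": 2, "Ticket_ID": "T2", "Ticket_Creation_Date": date(2024, 1, 9), "Status": "Open"},
]

def distinct_on_latest(rows):
    """Keep one row per (Customer_ID, Ticket_ID), preferring the latest
    Ticket_Creation_Date -- mirroring DISTINCT ON ... ORDER BY ... DESC."""
    best = {}
    for row in rows:
        key = (row["Customer_ID"], row["Ticket_ID"])
        if key not in best or row["Ticket_Creation_Date"] > best[key]["Ticket_Creation_Date"]:
            best[key] = row
    return list(best.values())

unique_tickets = distinct_on_latest(rows)
```

Here the two rows sharing (1, "T1") collapse to the one dated 2024-02-01, just as the SQL keeps the first row of each group after sorting by `Ticket_Creation_Date DESC`.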