In this article, we will demonstrate how to import tables from a CSV file, remove duplicate records using transformations, and migrate the cleaned data into the destination database using Bold Data Hub. Follow the step-by-step process below.
Sample Data Source:
Sample CSC Data
Note: On-Demand Refresh will be triggered when the pipeline is saved. If needed, the pipeline can be scheduled in the Schedules tab.
Go to the Transform tab and click Add Table.
Enter the table name to create a transform table for customer satisfaction summary.
Note: The data will first be loaded into the DuckDB database under the designated {pipeline_name} schema before being transformed and moved into the target databases. For example, for a pipeline named “customer_service_data”, the data will be staged under the customer_service_data schema.
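As an illustrative sketch of what this staging looks like, the staged rows can be previewed in the Transform query editor with a schema-qualified query (the table name sample_csc_data is assumed here from the sample data source; your actual table name may differ):

```sql
-- Preview the staged rows in the DuckDB schema named after the pipeline.
-- "customer_service_data" is the example pipeline name from the note above.
SELECT *
FROM customer_service_data.sample_csc_data
LIMIT 10;
```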
Learn more about transformation here
Duplicate records in a dataset can lead to inconsistencies and inaccurate analysis. To ensure data integrity, we can remove duplicates based on unique identifiers such as Customer_ID and Ticket_ID while retaining the most recent entry.
We use the DISTINCT ON clause to retain only one record per Customer_ID and Ticket_ID pair, prioritizing the latest entry based on Ticket_Creation_Date DESC.
WITH Unique_Tickets AS (
    -- Keep one row per (Customer_ID, Ticket_ID); the ORDER BY places the
    -- most recent ticket first, so DISTINCT ON retains that row.
    SELECT DISTINCT ON (Customer_ID, Ticket_ID) *
    FROM {pipeline_name}.sample_csc_data
    ORDER BY Customer_ID, Ticket_ID, Ticket_Creation_Date DESC
)
SELECT * FROM Unique_Tickets;
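DISTINCT ON is a DuckDB (and PostgreSQL) extension rather than standard SQL. As a hedged alternative sketch, the same deduplication can be written with the standard ROW_NUMBER() window function, which numbers the rows in each (Customer_ID, Ticket_ID) group newest-first and keeps only the first; the EXCLUDE clause used to drop the helper column is DuckDB-specific:

```sql
WITH Ranked_Tickets AS (
    SELECT *,
           -- Number rows within each customer/ticket group, newest first.
           ROW_NUMBER() OVER (
               PARTITION BY Customer_ID, Ticket_ID
               ORDER BY Ticket_Creation_Date DESC
           ) AS rn
    FROM {pipeline_name}.sample_csc_data
)
-- Keep only the most recent row per group and drop the helper column.
SELECT * EXCLUDE (rn) FROM Ranked_Tickets WHERE rn = 1;
```

This form is useful if the cleaned data must later be reproduced in a destination database that supports window functions but not DISTINCT ON.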