In this article, we will demonstrate how to import tables from a CSV file, remove invalid data using transformations, and migrate the cleaned data into the destination database using Bold Data Hub. Follow the step-by-step process below.
Sample Data Source:
Sample CSC Data
Note: On-Demand Refresh will be triggered when the pipeline is saved. If needed, the pipeline can be scheduled in the Schedules tab.
Go to the Transform tab and click Add Table.
Enter the table name to create a transform table for customer satisfaction summary.
Note: The data will initially be transferred to the DuckDB database within the designated {pipeline_name} schema before undergoing transformation for integration into the target databases. As an illustration, in the case of a pipeline named “customer_service_data”, the data will be relocated to the customer_service_data table schema.
Learn more about transformation here
Data quality issues such as invalid phone numbers or negative ticket costs can impact analysis and decision-making. This guide explains how to filter out inconsistent records using DuckDB queries.
We use regular expressions (regexp_matches
) to validate phone numbers and apply conditional filtering (WHERE ticket_cost >= 0
) to ensure only valid data is retained.
Ensures phone numbers contain exactly 10 digits (numeric only).
SELECT
Ticket_ID,
Customer_ID,
Customer_Name,
Ticket_Category,
Issue_Description,
Ticket_Status,
Priority,
Agent_ID,
Agent_Name,
Customer_Satisfaction,
Ticket_Creation_Date,
Ticket_Resolution_Date,
Ticket_Comments,
Region,
City,
Country
FROM {pipeline_name}.sample_csc_data
WHERE regexp_matches(CAST(Phone AS VARCHAR), '^[0-9]{10}$');
Removes records where ticket_cost is negative, ensuring only valid financial data is retained.
SELECT
Ticket_ID,
Customer_ID,
Customer_Name,
Ticket_Category,
Issue_Description,
Ticket_Status,
Priority,
Agent_ID,
Agent_Name,
Customer_Satisfaction,
Ticket_Creation_Date,
Ticket_Resolution_Date,
Ticket_Comments,
Region,
City,
Country
FROM {pipeline_name}.sample_csc_data
WHERE ticket_cost >= 0;