In this article, we will demonstrate how to import tables from a CSV file, remove invalid data using transformations, and migrate the cleaned data into the destination database using Bold Data Hub. Follow the step-by-step process below.
Sample Data Source:
Learn about Pipeline Creation
Learn more about transformation here
Data quality issues such as invalid phone numbers or negative ticket costs can impact analysis and decision-making. This guide explains how to filter out inconsistent records using DuckDB queries.
We use regular expressions (regexp_matches) to validate phone numbers and apply conditional filtering (WHERE ticket_cost >= 0) to ensure only valid data is retained.
The following query keeps only records whose phone numbers contain exactly 10 digits (numeric only).
SELECT
    Ticket_ID,
    Customer_ID,
    Customer_Name,
    Ticket_Category,
    Issue_Description,
    Ticket_Status,
    Priority,
    Agent_ID,
    Agent_Name,
    Customer_Satisfaction,
    Ticket_Creation_Date,
    Ticket_Resolution_Date,
    Ticket_Comments,
    Region,
    City,
    Country
FROM {pipeline_name}.sample_csc_data
WHERE regexp_matches(CAST(Phone AS VARCHAR), '^[0-9]{10}$');
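The phone-number check can also be sketched outside DuckDB. The snippet below is a minimal Python illustration of the same '^[0-9]{10}$' pattern; the record values are invented sample data, and in Bold Data Hub the actual filtering runs as the DuckDB query above.

```python
import re

# Same pattern as the DuckDB filter '^[0-9]{10}$':
# exactly ten digits from start to end, nothing else.
PHONE_PATTERN = re.compile(r"^[0-9]{10}$")

def is_valid_phone(phone) -> bool:
    """Return True when the value is exactly 10 numeric digits.

    str() mirrors the CAST(Phone AS VARCHAR) in the query, so numeric
    columns are validated the same way as text columns.
    """
    return PHONE_PATTERN.match(str(phone)) is not None

# Hypothetical records for illustration only.
records = [
    {"Ticket_ID": 1, "Phone": "9876543210"},  # valid: 10 digits
    {"Ticket_ID": 2, "Phone": "12345"},       # too short -> dropped
    {"Ticket_ID": 3, "Phone": "98765abc10"},  # non-numeric -> dropped
]

valid = [r for r in records if is_valid_phone(r["Phone"])]
```

Only the first record survives the filter, matching what the WHERE clause would keep.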
The following query removes records where ticket_cost is negative, ensuring only valid financial data is retained.
SELECT
    Ticket_ID,
    Customer_ID,
    Customer_Name,
    Ticket_Category,
    Issue_Description,
    Ticket_Status,
    Priority,
    Agent_ID,
    Agent_Name,
    Customer_Satisfaction,
    Ticket_Creation_Date,
    Ticket_Resolution_Date,
    Ticket_Comments,
    Region,
    City,
    Country
FROM {pipeline_name}.sample_csc_data
WHERE ticket_cost >= 0;
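The cost filter follows the same shape. Below is a minimal Python sketch of the ticket_cost >= 0 condition applied to rows represented as dictionaries; the sample rows and field values are assumptions for illustration, not data from the sample source.

```python
def filter_valid_costs(rows):
    """Keep rows whose ticket_cost is non-negative.

    Mirrors the DuckDB condition WHERE ticket_cost >= 0: zero is a
    legitimate cost and is kept; only negative values are removed.
    """
    return [row for row in rows if row["ticket_cost"] >= 0]

# Hypothetical rows for illustration only.
sample = [
    {"Ticket_ID": 1, "ticket_cost": 25.0},   # kept
    {"Ticket_ID": 2, "ticket_cost": -5.0},   # negative -> removed
    {"Ticket_ID": 3, "ticket_cost": 0.0},    # zero kept (>= 0)
]

cleaned = filter_valid_costs(sample)
```

Note that the comparison uses >= rather than >, so free (zero-cost) tickets are treated as valid, exactly as in the query above.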