Flagging Suspicious Data and Transforming Data Using Bold Data Hub

This article demonstrates how to import tables from a CSV file, flag suspicious data with a transformation, and move the resulting data into the destination database using Bold Data Hub. Follow the step-by-step process below.

Sample Data Source:

Sample CSC Data


Creating Pipeline

Learn about Pipeline Creation

Applying Transformation

Learn more about transformation here

Flagging Suspicious Data

Overview

To maintain data accuracy, records with conflicting information should be flagged. For example, an “Open” ticket should not have a resolution time, and a “Resolved” ticket should have a valid (present and positive) resolution time.

Approach

We use a CASE statement to identify and flag suspicious records:

  • “Conflict” → Open tickets with a resolution time
  • “Invalid Resolution Time” → Resolved tickets with missing or non-positive resolution time
  • “Valid” → All other cases

SQL Query for Flagging Suspicious Data

SELECT 
    Ticket_ID, 
    Ticket_Status, 
    Resolution_Time, 
    CASE 
        WHEN Ticket_Status = 'Open' AND Resolution_Time IS NOT NULL THEN 'Conflict' 
        WHEN Ticket_Status = 'Resolved' AND (Resolution_Time IS NULL OR Resolution_Time <= 0) THEN 'Invalid Resolution Time' 
        ELSE 'Valid' 
    END AS Suspicious_Flag 
FROM {pipeline_name}.sample_csc_data;
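
Before running the transformation in a pipeline, you can sanity-check the flagging rules locally. The following is a minimal sketch, not part of Bold Data Hub: it assumes an in-memory SQLite database as a stand-in for the destination, drops the {pipeline_name} schema prefix, and uses a handful of hypothetical rows (the values in SAMPLE_ROWS are invented for illustration). The function name flag_sample_rows is likewise an assumption.

```python
import sqlite3

# Hypothetical rows (Ticket_ID, Ticket_Status, Resolution_Time).
# These values are invented for illustration, not taken from the sample data.
SAMPLE_ROWS = [
    (1, "Open", None),      # open, no resolution time
    (2, "Open", 12.5),      # open, but a resolution time is present
    (3, "Resolved", 8.0),   # resolved with a positive resolution time
    (4, "Resolved", None),  # resolved, resolution time missing
    (5, "Resolved", 0),     # resolved, non-positive resolution time
]

# Same CASE expression as the article's query; only the table reference
# differs (no {pipeline_name} prefix) and an ORDER BY is added.
FLAG_QUERY = """
SELECT
    Ticket_ID,
    Ticket_Status,
    Resolution_Time,
    CASE
        WHEN Ticket_Status = 'Open' AND Resolution_Time IS NOT NULL THEN 'Conflict'
        WHEN Ticket_Status = 'Resolved' AND (Resolution_Time IS NULL OR Resolution_Time <= 0) THEN 'Invalid Resolution Time'
        ELSE 'Valid'
    END AS Suspicious_Flag
FROM sample_csc_data
ORDER BY Ticket_ID;
"""


def flag_sample_rows(rows=SAMPLE_ROWS):
    """Load rows into an in-memory table and return {Ticket_ID: Suspicious_Flag}."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sample_csc_data ("
            "Ticket_ID INTEGER, Ticket_Status TEXT, Resolution_Time REAL)"
        )
        conn.executemany("INSERT INTO sample_csc_data VALUES (?, ?, ?)", rows)
        return {row[0]: row[3] for row in conn.execute(FLAG_QUERY)}
    finally:
        conn.close()


if __name__ == "__main__":
    for ticket_id, flag in flag_sample_rows().items():
        print(ticket_id, flag)
```

Note that the query assigns 'Valid' to every status other than 'Open' and 'Resolved', and that the status comparison is an exact string match, so differently cased or padded values such as 'open' would also fall through to 'Valid' in this sketch.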

Transformation Use Case