
Extract, Transform, Load

Data is the lifeblood of modern civilisation.

Extract, Transform, Load (ETL) is the process of migrating data from one data source to another.

Data Pipelines

Process

Treat the data source as immutable: never change data as it was provided by the raw source.

Use scripts to clean the data, creating a central repository for any system to consume.

Use additional scripts to transform the data into the shape each target system consumes.
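As a rough sketch of that three-stage flow, assuming the stages are small Python functions and the raw extract is a list of records (all names and values here are illustrative, not part of any prescribed tooling):

```python
# Sketch of the three-stage flow: raw source data is read once and never
# modified; cleaning and target transformation each produce a new copy.
import copy

def extract_from_source():
    # Placeholder for pulling rows from the raw source (file, API, database).
    return [{"COUNTRY": "NZ", "AMOUNT": "120.50"}]

def clean_for_repository(raw_rows):
    # Build the central, cleaned copy; the raw rows are left untouched.
    cleaned = copy.deepcopy(raw_rows)
    for row in cleaned:
        row["AMOUNT"] = float(row["AMOUNT"])  # enforce local data-integrity rules
    return cleaned

def transform_for_target(cleaned_rows):
    # Shape the cleaned data for one specific target system.
    return [{"amount": row["AMOUNT"]} for row in cleaned_rows]

if __name__ == "__main__":
    raw = extract_from_source()
    repository = clean_for_repository(raw)
    target_feed = transform_for_target(repository)
    print(target_feed)
```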

Collection:

  1. Extract
  2. Transform

Manipulation:

  1. Target Transformation
  2. Load

Data Collection

Scripts and processes extract data from its source.

Store the data locally exactly as it was provided by the source. Do not alter it.
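One way to keep the raw extract unaltered is to land it in a dated file and treat that file as read-only. A minimal sketch, assuming CSV extracts and an illustrative raw/ directory:

```python
# Land each extract in a dated file under raw/ and never overwrite it;
# downstream scripts read from these files but never write to them.
import shutil
from datetime import date
from pathlib import Path

def land_raw_extract(source_file: str, raw_dir: str = "raw") -> Path:
    destination = Path(raw_dir) / f"{date.today().isoformat()}_{Path(source_file).name}"
    destination.parent.mkdir(parents=True, exist_ok=True)
    if destination.exists():
        raise FileExistsError(f"Raw extract already landed: {destination}")
    shutil.copy2(source_file, destination)  # byte-for-byte copy of the source file
    return destination

# Example usage: land_raw_extract("countries.csv")
```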

A script then creates a new instance of the data, cleaned up to match the local system's data-integrity rules.

Suffix local column names with _local to avoid name collisions.

For example, for data with country names in a COUNTRY column, create a COUNTRY_LOCAL column.

Use logic to transform data as per local requirements.

For example, if the data was provided as NZ but the local system requires the full country name New Zealand, run a script to populate COUNTRY_LOCAL accordingly.
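A minimal sketch of that cleanup step using pandas; the column names and the NZ mapping follow the example above, while the lookup table itself is illustrative:

```python
# Create a cleaned copy of the raw extract, adding *_LOCAL columns rather
# than overwriting the values as they were provided by the source.
import pandas as pd

# Illustrative lookup; extend with whatever codes the source actually sends.
COUNTRY_NAMES = {"NZ": "New Zealand", "AU": "Australia"}

def add_local_columns(raw: pd.DataFrame) -> pd.DataFrame:
    cleaned = raw.copy()  # leave the raw frame untouched
    # COUNTRY stays as provided; COUNTRY_LOCAL holds the full country name.
    cleaned["COUNTRY_LOCAL"] = cleaned["COUNTRY"].map(COUNTRY_NAMES).fillna(cleaned["COUNTRY"])
    return cleaned

raw = pd.DataFrame({"COUNTRY": ["NZ", "AU"], "AMOUNT": [120.5, 98.0]})
print(add_local_columns(raw))
```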

Manipulation

Use scripts as required to split the cleaned source data into data pools for target systems to load.

Apply system-specific business rules, then aggregate the data if required to match the purpose of the system (see the sketch after the list below).

Load data into external systems for:

  • Flow Control Dashboards
  • Strategic Analysis
  • Algorithmic Decision Making
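A sketch of that split-and-load step, again in pandas; the pool name, business rule, and load target are placeholders for whatever each downstream system actually needs:

```python
# Split the cleaned repository into per-system data pools, apply each
# system's business rules, aggregate where needed, then load the result.
import pandas as pd

cleaned = pd.DataFrame({
    "COUNTRY_LOCAL": ["New Zealand", "New Zealand", "Australia"],
    "AMOUNT": [120.5, 80.0, 98.0],
})

def build_dashboard_pool(df: pd.DataFrame) -> pd.DataFrame:
    # Example business rule: the dashboard only shows rows above a threshold,
    # aggregated per country (threshold and grouping are illustrative).
    eligible = df[df["AMOUNT"] >= 100]
    return eligible.groupby("COUNTRY_LOCAL", as_index=False)["AMOUNT"].sum()

def load(pool: pd.DataFrame, target: str) -> None:
    # Placeholder load step; in practice this would write to the target
    # system (database table, API, file drop).
    pool.to_csv(f"{target}.csv", index=False)

load(build_dashboard_pool(cleaned), "flow_control_dashboard")
```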

Products