The ETL process is often used in data warehousing.
• Data extraction involves extracting data from homogeneous or heterogeneous sources.
• Data transformation cleans the data and converts it into a storage format/structure suitable for querying and analysis.
• Data loading describes the insertion of data into the target data store, such as a data mart, data lake, or data warehouse.
A properly designed ETL system extracts data from the source systems, enforces data quality and consistency standards, conforms data so that separate sources can be used together, and finally delivers data in a presentation-ready format.
Why perform an ETL?
To load a data warehouse or data mart regularly (daily or weekly) so that it can serve its purpose of facilitating business analysis, or to move data from files, XML, or other sources into a big data lake, data warehouse, or data mart.
How does the ETL process work?
Extract – ETL developers extract data from one or more systems and/or files and copy it into the data warehouse.
Transform – They remove inconsistencies, assemble the data into a common format, add missing fields, summarize detailed data, and derive new fields to store calculated data. Most data in an ETL process is not transformed (about 80%, based on a poll of data architects).
Load – They map the data to the proper tables and columns, then transform and/or load it into the data warehouse.
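The three steps above can be sketched in Python. This is a minimal illustration, not a production pipeline: the CSV source, the `sales` table, and its columns are all hypothetical, and SQLite stands in for the target warehouse.

```python
import csv
import sqlite3

def extract(csv_path):
    """Extract: read raw rows from a source file."""
    with open(csv_path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Transform: remove inconsistencies, fill missing fields, derive new ones."""
    cleaned = []
    for row in rows:
        qty = int(row["qty"] or 0)            # fill a missing field with a default
        price = float(row["price"])
        cleaned.append({
            "name": row["name"].strip().title(),  # assemble into a common format
            "qty": qty,
            "price": price,
            "total": qty * price,                 # derived field storing calculated data
        })
    return cleaned

def load(rows, conn):
    """Load: map the data to the proper table and columns, then insert."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (name TEXT, qty INTEGER, price REAL, total REAL)"
    )
    conn.executemany(
        "INSERT INTO sales (name, qty, price, total) VALUES (:name, :qty, :price, :total)",
        rows,
    )
    conn.commit()
```

In a real pipeline each step would typically be scheduled (daily or weekly, as noted above) and would log row counts so failures are visible.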
When do we need ETL Testing?
ETL testing can be helpful:
• When setting up a data warehouse for the first time, after data is loaded
• After adding a new data source to your existing data warehouse
• After a data integration project
• After a data migration project
• When moving data for any reason
• If there are suspected issues with data quality in any of the source systems or the target
system
• If there are suspected issues with the performance of ETL processes
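A common first ETL test in any of these situations is source-to-target reconciliation: comparing row counts and a simple column checksum between the two systems. The sketch below assumes both sides are reachable as SQLite connections and that `table` and `key` name a real table and numeric column; a mismatch in either check points to dropped rows or altered values.

```python
import sqlite3

def reconcile(source_conn, target_conn, table, key):
    """Compare row counts and a numeric-column checksum between source and target.

    Counts catch dropped or duplicated rows; the SUM checksum catches
    values that changed during transformation or loading.
    """
    count_sql = f"SELECT COUNT(*) FROM {table}"
    sum_sql = f"SELECT COALESCE(SUM({key}), 0) FROM {table}"
    src_count = source_conn.execute(count_sql).fetchone()[0]
    tgt_count = target_conn.execute(count_sql).fetchone()[0]
    src_sum = source_conn.execute(sum_sql).fetchone()[0]
    tgt_sum = target_conn.execute(sum_sql).fetchone()[0]
    return {
        "counts_match": src_count == tgt_count,
        "sums_match": src_sum == tgt_sum,
    }
```

Checksums on one column are only a smoke test; a thorough ETL test would also validate individual transformed values against the mapping document, as described below.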
Who is involved in the ETL process?
There are at least four roles involved:
Data Analyst: Creates data requirements (source-to-target map or mapping doc)
Data Architect: Models and builds data store (Big Data lake, Data Warehouse, Data Mart, etc.)
ETL Developer: Transforms and loads data from sources to target data stores
ETL Tester: Validates the data, based on mappings, as it moves and transforms from sources to
targets
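The source-to-target map ties these roles together: the developer implements it and the tester validates against it. The sketch below shows one way that can work in Python; the field names and rules are invented for illustration, and real mapping documents are usually spreadsheets with far richer rules.

```python
# Hypothetical source-to-target map, as a data analyst might document it.
MAPPING = [
    {"source": "cust_nm", "target": "customer_name", "rule": "strip"},
    {"source": "ord_amt", "target": "order_amount", "rule": "float"},
]

# Transformation rules referenced by name in the mapping doc.
RULES = {
    "strip": lambda v: v.strip(),
    "float": lambda v: float(v),
}

def apply_mapping(source_row):
    """ETL developer side: build a target row by applying the mapping."""
    return {m["target"]: RULES[m["rule"]](source_row[m["source"]]) for m in MAPPING}

def validate(source_row, target_row):
    """ETL tester side: re-derive each target value from the mapping
    and return the names of any target fields that do not match."""
    failures = []
    for m in MAPPING:
        expected = RULES[m["rule"]](source_row[m["source"]])
        if target_row.get(m["target"]) != expected:
            failures.append(m["target"])
    return failures
```

Because both sides read the same mapping, a disagreement between developer output and tester expectation points directly at the field and rule in question.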
The image on the right shows the intertwined roles, tasks and timelines for performing ETL
Testing with the sampling method.