Data integration defined
Data integration is a common industry term referring to the requirement to combine data from multiple separate business systems into a single unified view, often called a single view of the truth. This unified view is typically stored in a central data repository known as a data warehouse.
For example, customer data integration involves the extraction of information about each individual customer from disparate business systems such as sales, accounts, and marketing, which is then combined into a single view of the customer to be used for customer service, reporting and analysis.
ETL and ELT
There are a number of different data integration approaches which can be used to populate a data warehouse, two of the most common are:
- ETL – Extract, Transform and Load. Data is extracted from multiple source systems, transformed into a usable format, and then loaded into a data warehouse where users can query and report on the unified view of the data. Learn more about ETL ›
- ELT – Extract, Load and Transform. Data is extracted from source systems, loaded into a database staging area, and then transformed to a usable format within the database. This approach may be necessary to meet check-pointing, auditing, and security requirements.
The trade-off with ELT is that storage space and CPU usage within the database can be relatively expensive. Using ETL to perform the transformation step outside of the database reduces the storage and CPU load on the data warehouse, making ETL a more cost-effective approach for many applications.
Data Integration Criteria
- Freshness – Since data integration processes are executed periodically, data freshness refers to the delay between when a change occurs on a source system to when the change appears in the data warehouse.
- Granularity – Data from source systems is often summarised or aggregated during the data integration process. Data granularity refers to the level of detail that will be stored in the data warehouse.
- Matching – The business rules that define how data from different systems should be matched can be quite complex. For a simple example – is “Bill Smith” the same individual as “William Smith”?
These criteria will affect the performance and storage requirements for the data warehouse, and hence the cost of the hardware and software required for the project.
Originally written by Keylink Technologies, Australia