Data transformation is a relatively mundane yet fundamental data management capability – particularly when dealing with similar data from multiple sources.
Three simple examples:
- System A represents Male and Female and 0 and 1, while System B represents Male and Female as M and F
- System A stores dates in the US format (MM/DD/YYYY) whilst System B uses the South African format (DD/MM/YYYY)
- System A uses a comma as a decimal point ( 123,33 ) while System B uses a point (123.33)
In each case, we need to transform the data into a common format for aggregating, matching and reporting.
In a recent blog post, our partner Syncsort outlines the four steps inherent in any data transformation process. Syncsort’s ETL (Extract, Transform and Load) and CDC (Change Data Capture) capabilities are particularly highly rated by analysts based on their ability to transform data from diverse formats e.g. when moving data from the mainframe to relational databases and Hadoop. Adding Syncsort trillium Software to the mix ensures award winning data profiling, cleanisng and matching across these platforms
The four steps to data transformation
Step one: Data interpretation
Before we can begin any transformation process we need to understand what our data looks like. This is most efficiently done using data profiling tools
Step two: Pre-translation quality check
At this step we validate data against known requirements and standards to identify any errors or issues that we may not expect and must address
Step three: Data translation
This is the physical process of changing data into the required format.
An interesting observation by Sycnsort is that we may not only change the content of a data set, but also the format e.g. converting a CSV file to an XML format
Step four: Post translation quality check
We repeat step two at after running our translation to measure the improvement and test the outcome.
By planning for and following these four steps your should quickly begin to improve the usefulness of your aggregated data