Historically, data quality has been measured in terms of dimensions including:
- Accuracy – the degree to which data reflects the real world
- Completeness – the degree to which data is adequately populated
- Timeliness – data is available when expected and needed
- Consistency – data across all systems reflects the same reality
- Conformity – data adheres to consistent standards
- Integrity – relationships between data are valid and accurate
These metrics are important because they form the basis of data quality KPIs and of accountability for data improvements.
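Several of the classic dimensions reduce to simple ratios over a data set. As a minimal sketch (the records, field names, and the ISO country-code standard here are illustrative assumptions, not from the webinar), completeness and conformity could be scored like this:

```python
import re

# Hypothetical customer records for illustration
records = [
    {"email": "a@example.com", "country": "US"},
    {"email": None,            "country": "USA"},
    {"email": "c@example.com", "country": "GB"},
]

def completeness(records, field):
    """Fraction of records where the field is populated at all."""
    return sum(1 for r in records if r.get(field)) / len(records)

# Assumed agreed standard: ISO 3166-1 alpha-2 country codes
ISO_COUNTRY = re.compile(r"^[A-Z]{2}$")

def conformity(records, field, pattern):
    """Fraction of populated values that match the agreed standard."""
    values = [r[field] for r in records if r.get(field)]
    return sum(1 for v in values if pattern.match(v)) / len(values)

print(completeness(records, "email"))               # 2 of 3 emails populated
print(conformity(records, "country", ISO_COUNTRY))  # "USA" fails alpha-2
```

Ratios like these can be trended over time, which is what turns a dimension into a KPI.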
What new data quality dimensions does big data introduce?
In their recent webinar, Applying Data Quality Best Practices at Big Data Scale, Syncsort Trillium Software suggest a number of new dimensions that emerge at big data scale.
- Coverage – How well does the data source meet (or fail to meet) the business need?
- Continuity – How well does the data set cover all expected or needed intervals?
- Triangulation – How consistent is data when measured from related points of reference?
- Provenance – Can we validate where the data came from, who gathered it, and what criteria were used to create it?
- Transformation from origin – How has the data changed from its point of origin, and how does this affect its accuracy?
- Repetition – Is data from multiple sources identical, indicating potential tampering?
These six new dimensions help to ensure that your analytics results can be trusted, and they are well worth considering for your advanced analytics use cases. They answer questions about where data comes from, how it was made, and who made it – all critical for establishing trust.
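Some of these dimensions translate directly into automated checks. Below is a minimal sketch of the repetition and continuity checks (the feed names, payloads, and daily-interval assumption are hypothetical, for illustration only):

```python
import hashlib
from datetime import date, timedelta

def repeated_payloads(payloads_by_source):
    """Repetition check: flag byte-identical payloads arriving from
    different sources, which may indicate copying or tampering rather
    than independent measurement."""
    seen = {}
    repeats = []
    for source, payload in payloads_by_source:
        digest = hashlib.sha256(payload.encode()).hexdigest()
        if digest in seen and seen[digest] != source:
            repeats.append((seen[digest], source))
        seen.setdefault(digest, source)
    return repeats

def missing_days(observed, start, end):
    """Continuity check: which expected daily intervals are absent
    from the data set?"""
    expected = {start + timedelta(days=i) for i in range((end - start).days + 1)}
    return sorted(expected - set(observed))

feeds = [("sensor_a", '{"t": 21.5}'), ("sensor_b", '{"t": 21.5}')]
print(repeated_payloads(feeds))  # sensor_b exactly repeats sensor_a's payload

days = [date(2024, 1, 1), date(2024, 1, 3)]
print(missing_days(days, date(2024, 1, 1), date(2024, 1, 3)))  # Jan 2 is missing
```

Checks like these can run as part of data ingestion, so gaps and suspicious duplicates are caught before they reach the analytics layer.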
Watch the webinar or contact us for more information on how to apply these dimensions in your environment.