The state of data quality in 2020

If we believe the press, artificial intelligence and machine learning are where business is focusing its data management efforts.

Yet, in practice, the interest in AI and ML is highlighting the need for more mundane disciplines, such as data quality, to ensure trust in the machine learning models.

Interest in data quality is driven from the top

A recent O’Reilly survey on trends in AI and ML shows that this renewed interest in data quality and data governance is a global phenomenon and is driven from the top, although analysts at the coalface are also deeply impacted.

This survey reinforces the findings of Syncsort’s 2019 data quality survey, which can be found here. A key difference is the increased level of interest at the C-level compared to last year.

In spite of this, the survey shows that most companies are struggling to solve the problem. Two basic challenges are highlighted:

  • The basic data management foundations – such as metadata and lineage – are not in place. Without this provenance, decision makers and data analysts struggle to find the trusted data sets that they need to build machine learning models.
  • Data quality issues are overwhelming. Most respondents indicate that they have too many data sources and too much inconsistent data. Tactical projects do not have the resources to tackle data quality problems.

Limited strategic options

Very few organisations have created dedicated data quality teams. Similarly, only 20% of companies surveyed publish data lineage and provenance. This lack of investment can be traced back to the lack of basic data management foundations.

The survey drew the conclusion that things will get worse before they get better.

Data quality solutions are typically impacted by politics and cost. Some group(s) will have to change the way they do things, whilst the money to pay for data quality will often come out of another group’s budget.

The increased C-level interest is good, but to change the culture and drive data quality at an enterprise level we also need to begin putting the data governance foundations in place.

This is because data quality is more a people-and-process-laden problem than a technological one. It isn’t just that different groups have differing standards, expectations, or priorities when it comes to data quality; it’s that different groups will go to war over these standards, expectations, and priorities.

Data governance structures allow various stakeholders to collaborate and find common ground for data quality. However, the survey recommends that data governance efforts focus on the basics to deliver trust.

We have the track record and toolkits to deliver data governance basics, metadata management and data lineage, and data quality across both structured and unstructured data sets.

Set up a meeting on +27114854856 to understand how we can help you get the basics right.
