Ease of implementation vs “ease of use” for data quality

Build vs Buy

Navigate the complex landscape of data quality management with insights on what to look for in a data quality management tool, ensuring optimal performance and efficiency.

In today’s business landscape, most experienced IT and business managers agree that it is more cost-effective and less risky to purchase an existing, functional solution than to build one from scratch.

This is evidenced by the significant investments made in ERP applications, CRM systems, and other off-the-shelf solutions. By opting for an existing data quality tool, you can leverage established best practices and streamline your implementation process, ultimately enhancing your business operations.

Yet not all data quality platforms are created equal. When selecting a tool, you need to consider whether you are buying a toolkit, which will require you to build your data quality processes from scratch, or an application that gives you a head start and accelerates your time to value.


The Value of Buying an Existing Application

When you choose to buy an existing application, you unlock numerous advantages that contribute to the ease of implementation and overall success of your data quality initiatives:

1. Leveraging Best Practice Business Processes

One of the key benefits of purchasing an existing application is the ability to leverage pre-defined and proven business processes embedded within the solution. Our best practices have been refined over time and can significantly enhance your business operations. By adopting these established processes, you can avoid reinventing the wheel and capitalize on industry expertise.

2. Reduced Risk of Project Failure

Implementing a data quality solution from scratch entails inherent risks.

However, by selecting a well-designed existing application, you can start your project with a significant head start. With a solution that already aligns with 60% to 80% of your requirements, you minimize the chances of project failure. Instead of starting from square one, you can focus your efforts on customizing and fine-tuning the solution to meet your specific needs.

3. Time and Resource Savings

By choosing a ready-to-use application, you save valuable time and resources that would have otherwise been spent on building a solution from the ground up. With reduced development efforts, you can allocate your team’s energy towards your core business activities, allowing you to drive growth and innovation.

The Role of Ease of Implementation in Data Quality

In the realm of data quality applications, it’s common to find similar features and claims of user-friendliness, integration capabilities, and scalability.

As a result, the primary differentiator between these solutions lies in their ease of implementation. The term “ease of implementation” refers to the head start and advantage you gain by selecting a solution that is specifically designed to address the complexities of your data landscape.

Addressing Complexities: A South African Example

Let’s consider the intricate challenges of customer data in South Africa.

The country has diverse address formats, including variations in language (English and Afrikaans), numerous suburbs and towns (approximately 18,000), and multiple valid postal codes for each location, depending on whether mail is delivered to a street address or a post office box. Frequent changes in place names complicate the data further, as records may contain different names depending on their age. For example, older records may still refer to Potgietersrus rather than Mokopane.
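Renamed places are typically resolved with an alias table that maps historical names to current official names. The sketch below is a minimal illustration, assuming a hypothetical `PLACE_ALIASES` dictionary and `current_place_name` helper; a real gazetteer would hold thousands of entries.

```python
# Hypothetical alias table mapping historical names to current official names.
# A production gazetteer would hold thousands of entries.
PLACE_ALIASES = {
    "potgietersrus": "Mokopane",
    "pietersburg": "Polokwane",
}


def current_place_name(raw: str) -> str:
    """Return the current official name for a place, or the cleaned input if unknown."""
    cleaned = " ".join(raw.split())  # collapse stray whitespace
    return PLACE_ALIASES.get(cleaned.lower(), cleaned)


print(current_place_name("  POTGIETERSRUS "))
```

Keying the lookup on a lower-cased, whitespace-normalised form means old records in any capitalisation resolve to the same current name.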

Furthermore, there is a wide range of address types, from structured urban addresses, e.g. “PO Box 12, Centurion, 0082”, to unstructured rural addresses such as “The white house next to the cafe, Mzo Village”.

Rural villages and informal settlements are often undocumented and not recognised as formal locations, e.g. “Mandelaville Squatter Camp, Joe Slovo T/ship”.
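Before any standardisation rules can be applied, each address line has to be routed to the right set of rules by type. The sketch below is a deliberately crude illustration using regular expressions; the `classify_address` function, its category names and the keyword list are hypothetical assumptions, not the rules of any particular product. “Posbus” is included as the Afrikaans form of “PO Box”.

```python
import re

# Illustrative patterns only; real address parsers use far richer rules.
PO_BOX = re.compile(r"^\s*(?:P\.?\s*O\.?\s*Box|Posbus|Private\s+Bag)\s+\d+", re.IGNORECASE)
STREET_NUMBER_FIRST = re.compile(r"^\s*\d+[A-Za-z]?\s+\w+")
INFORMAL_HINTS = ("squatter camp", "informal settlement", "next to", "behind", "opposite")


def classify_address(line: str) -> str:
    """Assign a coarse address type so each type can be routed to its own rules."""
    lowered = line.lower()
    if PO_BOX.match(line):
        return "po_box"
    if any(hint in lowered for hint in INFORMAL_HINTS):
        return "informal_or_descriptive"
    if STREET_NUMBER_FIRST.match(line):
        return "street"
    return "unclassified"


for example in (
    "PO Box 12, Centurion, 0082",
    "The white house next to the cafe, Mzo Village",
    "Mandelaville Squatter Camp, Joe Slovo T/ship",
):
    print(classify_address(example))
```

Anything that falls through to `"unclassified"` would need further rules or manual review, which is exactly where a pre-built rule base saves effort.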

Accompanying these complexities are countless spelling variations for even valid place names, e.g. “East London” and “Aest London”.
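One common way to correct a misspelling such as “Aest London” is to compare it against a reference list of valid place names and accept the closest candidate above a similarity threshold. The sketch below uses Python’s standard `difflib.get_close_matches`; the `VALID_PLACES` list, the `correct_place` helper and the 0.8 cutoff are illustrative assumptions.

```python
import difflib

# Hypothetical reference list; a real one would cover every valid place name.
VALID_PLACES = ["East London", "Cape Town", "Centurion", "Mokopane", "Magalies View"]


def correct_place(raw: str, valid=VALID_PLACES, cutoff: float = 0.8):
    """Return the closest valid place name, or None if nothing is similar enough."""
    by_lower = {place.lower(): place for place in valid}
    matches = difflib.get_close_matches(
        raw.strip().lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[matches[0]] if matches else None


print(correct_place("Aest London"))
```

Returning `None` rather than a forced guess matters: an ambiguous or unknown name should be flagged, not silently “corrected” to the wrong place.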

To successfully tackle these challenges, it is imperative to choose a data quality solution that offers a substantial head start in addressing these complexities. A solution equipped with pre-built rules and capabilities tailored to the South African context can save significant time and effort. Trying to build these rules from scratch would add years to your project timeline and inflate costs.

A simple example:

Imagine you wish to identify duplicate client records in a dataset containing the following:

“ABC Apteek (EDMS) BPK” at “ATKV Gebou 15delaan 24 Magaliesig”
“ABC Pharmacy (Pty) ltd” at “42 Fifteenth Ave, Magaliesview”

Our solution will give you the following head start:

1.) For the name

We recognise, from our comprehensive list of South African business types, that each record is a business
We recognise that “(EDMS) BPK” is the Afrikaans equivalent of “(Pty) Ltd”, and we handle a large array of common spelling and typing errors, e.g. PtyLtd, (Prop) Ltd, (PTY)Ltd, etc.
We recognise that “Apteek” is an Afrikaans business term equivalent to “Pharmacy”
We build a standardised English business name for each record, “ABC Pharmacy”, off the shelf and without changing the original data
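To give a feel for what even a tiny slice of these name rules involves, here is a minimal sketch. It assumes a hypothetical `standardise_business_name` function, a single regular expression for legal-form suffixes, and a one-entry Afrikaans term table; a real rule base covers far more variants.

```python
import re

# Hypothetical, heavily abbreviated reference data.
# Matches legal-form suffixes such as "(EDMS) BPK", "(Pty) Ltd", "PtyLtd", "(Prop) Ltd".
LEGAL_SUFFIX = re.compile(
    r"\s*\(?\s*\b(?:pty|prop|edms)\s*\)?\s*\.?\s*(?:ltd|bpk)\.?\s*$", re.IGNORECASE
)
AFRIKAANS_TERMS = {"apteek": "Pharmacy"}


def standardise_business_name(raw: str):
    """Return (standardised_name, legal_form) as new values; the input is not modified."""
    core = raw.strip()
    legal_form = None
    if LEGAL_SUFFIX.search(core):
        legal_form = "(Pty) Ltd"  # canonical English legal form
        core = LEGAL_SUFFIX.sub("", core)
    # Translate known Afrikaans business terms word by word; keep everything else.
    words = [AFRIKAANS_TERMS.get(word.lower(), word) for word in core.split()]
    return " ".join(words), legal_form


print(standardise_business_name("ABC Apteek (EDMS) BPK"))
print(standardise_business_name("ABC Pharmacy (Pty) ltd"))
```

The standardised name is returned alongside the original rather than written over it, in line with the point above about not changing the original data.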

To build this from scratch requires a solid understanding of English and Afrikaans business terms, building the translations, resolving spelling errors, etc. Imagine doing this for 100 000 records, or a million! Imagine doing this for individuals – is Bongani a given name, or a surname? Or is it a place name?

2.) For the address

We recognise that “Magaliesig” is a valid South African postal suburb and we standardise to the correct English variation – “Magalies View”
We recognise that “Magaliesview” is a common spelling error and standardise to the correct “Magalies View”
We recognise that “15delaan” is an Afrikaans street name and type and standardise to “15th Avenue”
We recognise that “Fifteenth Ave” is an English street name with an abbreviated street type and standardise to “15th Avenue”
We recognise, based on the address format, that the Afrikaans address has a house number of 24, while the English address has a house number of 42
We recognise that the first address has additional information – a building called the “ATKV Building”
We can identify and append a valid postcode
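A minimal sketch of the first six address steps for just these two records might look like the code below. The two regular expressions encode one assumed Afrikaans layout and one assumed English layout; `standardise_address`, the lookup tables and their contents are hypothetical, and postcode lookup is left out because it needs a real postal reference file.

```python
import re

# Hypothetical, tiny reference tables standing in for a full gazetteer.
SUBURBS = {
    "magaliesig": "Magalies View",
    "magaliesview": "Magalies View",
    "magalies view": "Magalies View",
}
STREET_TYPES = {"laan": "Avenue", "ave": "Avenue", "avenue": "Avenue",
                "straat": "Street", "st": "Street", "street": "Street"}
ORDINAL_WORDS = {"first": 1, "second": 2, "third": 3, "tenth": 10, "fifteenth": 15}

# Assumed Afrikaans layout: [<building> Gebou] <n>de|ste<type> <house no> <suburb>
AFRIKAANS = re.compile(
    r"^(?:(?P<building>.+?)\s+gebou\s+)?(?P<num>\d+)(?:de|ste)(?P<type>laan|straat)"
    r"\s+(?P<house>\d+)\s+(?P<suburb>.+)$",
    re.IGNORECASE,
)
# Assumed English layout: <house no> <street name> <type>[.][,] <suburb>
ENGLISH = re.compile(
    r"^(?P<house>\d+)\s+(?P<street>\w+)\s+(?P<type>ave|avenue|st|street)\.?,?\s+(?P<suburb>.+)$",
    re.IGNORECASE,
)


def ordinal(n: int) -> str:
    """Format 15 as '15th', 1 as '1st', 12 as '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def english_street_name(word: str) -> str:
    """Normalise 'Fifteenth' or '15th' to '15th'; title-case any other name."""
    lowered = word.lower()
    numeric = re.fullmatch(r"(\d+)(?:st|nd|rd|th)?", lowered)
    if numeric:
        return ordinal(int(numeric.group(1)))
    if lowered in ORDINAL_WORDS:
        return ordinal(ORDINAL_WORDS[lowered])
    return word.title()


def standardise_address(raw: str):
    """Return standardised components as a new dict; the raw string is not modified."""
    text = raw.strip()
    m = AFRIKAANS.match(text)
    if m:
        street = f"{ordinal(int(m['num']))} {STREET_TYPES[m['type'].lower()]}"
        building = f"{m['building']} Building" if m["building"] else None
    else:
        m = ENGLISH.match(text)
        if m is None:
            return None  # unrecognised layout: route to manual review
        street = f"{english_street_name(m['street'])} {STREET_TYPES[m['type'].lower()]}"
        building = None
    suburb = m["suburb"].strip()
    return {
        "building": building,
        "house": m["house"],
        "street": street,
        "suburb": SUBURBS.get(suburb.lower(), suburb),
    }


print(standardise_address("ATKV Gebou 15delaan 24 Magaliesig"))
print(standardise_address("42 Fifteenth Ave, Magaliesview"))
```

Even this toy version needs two layouts, three lookup tables and ordinal handling to cope with two addresses, which illustrates why these rules take so long to build from scratch.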

Once again, building these rules from scratch requires a solid understanding of South African geographies, the ability to translate between English and Afrikaans variations, the ability to recognise a valid location or an unambiguous close variation and correct it, etc.

We have found and built into our standard rules hundreds of spelling variations/errors for even simple place names. Imagine having to go through this process for a million address lines!

What if your data set includes records from other countries – in Africa, or the UK, or Russia? How will you handle these?

Matching and Standardization

Matching and standardizing data elements, such as business names and addresses, play a crucial role in data quality initiatives. With an off-the-shelf solution, you benefit from pre-defined matching rules that can identify duplicate records effectively. These rules take into account various factors, including standardized business terms, language variations, and address formats.

By relying on the expertise embedded in a data quality application, you eliminate the need to develop matching rules from scratch. The solution can intelligently handle common errors, spelling variations, and transpositions, providing accurate and reliable results. This saves your team from having to navigate the intricacies of data matching on their own.

Extending our example:

We know that we have the same business name (standardised), and we know that we have the same street name and suburb name (standardised). We can ignore the fact that we have different building names (one is not populated) and that we do not have a postal code – this is a delivery address. The house numbers are a transposition – a common typing error – and we have a best practice mechanism for dealing with this.
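The reasoning above can be expressed as a small rule set over the standardised records. The sketch below is one illustrative way to do it, not our actual matching rules: `StandardisedRecord`, `match_decision` and the three outcome labels are hypothetical, the building name is ignored, a missing house number downgrades the result, and an adjacent-digit swap such as 24 versus 42 is treated as a probable typing error.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StandardisedRecord:
    """Hypothetical output of the name and address standardisation steps."""
    name: str
    street: str
    suburb: str
    house: Optional[str] = None
    building: Optional[str] = None


def is_adjacent_transposition(a: str, b: str) -> bool:
    """True if b equals a with exactly one pair of neighbouring characters swapped."""
    if len(a) != len(b) or a == b:
        return False
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and a[diffs[0]] == b[diffs[1]]
        and a[diffs[1]] == b[diffs[0]]
    )


def match_decision(r1: StandardisedRecord, r2: StandardisedRecord) -> str:
    """Illustrative rules; the building name is deliberately ignored."""
    if (r1.name, r1.street, r1.suburb) != (r2.name, r2.street, r2.suburb):
        return "no match"
    if r1.house is None or r2.house is None:
        return "possible match"  # missing house number: send for review
    if r1.house == r2.house:
        return "match"
    if is_adjacent_transposition(r1.house, r2.house):
        return "probable match"  # likely typing error, e.g. 24 vs 42
    return "no match"


a = StandardisedRecord("ABC Pharmacy", "15th Avenue", "Magalies View", house="24",
                       building="ATKV Building")
b = StandardisedRecord("ABC Pharmacy", "15th Avenue", "Magalies View", house="42")
print(match_decision(a, b))
```

Each question that follows (missing house number, differing postcodes, and so on) becomes another branch in a function like this, and the number of branches grows quickly.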

How would you define your matching rules from scratch?

What if the house number is missing?

What if the data sets had different postcodes?

There are literally hundreds of combinations and conditions that need to be taken into account if these are not provided off the shelf!

Does your team have the experience to handle these correctly without a head start?

The Benefits of a Pre-Built Solution

Choosing a data quality solution that provides a comprehensive head start tailored to your specific data challenges offers several significant advantages:

1. Significant Time and Cost Savings

By utilizing a pre-built solution that addresses the complexities of your data landscape, you can shorten your project timeline and reduce costs. The ready-to-use functionality and pre-defined rules enable faster implementation, so you start seeing a return on your investment much sooner.

I am aware of one organisation that has had a team of 8 consultants building rules like this for over a year using a tool that did not provide this kind of head start. By comparison, we have successfully delivered a number of projects, for different clients, within three months – using small teams of one or two consultants. The cost savings are obvious!
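Taking the figures in this anecdote at face value (8 consultants for at least 12 months, versus at most 2 consultants for 3 months), the difference in effort, measured in consultant-months, is at least:

```latex
\frac{\text{effort without a head start}}{\text{effort with a head start}}
  \ge \frac{8 \times 12}{2 \times 3} = \frac{96}{6} = 16
```

This is a lower bound on effort only; it says nothing about differences in daily rates or licence costs.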

2. Focus on Business Objectives

More importantly, there must have been a business reason you wanted to clean up your data!

Maybe you want to launch a new product in a specific market segment, reduce client churn, cut postal delivery costs, or improve planning through better analytics!

Maybe you are migrating to a new application and need to create an accurate, consolidated view of each unique client, supplier, product or employee before you can go live!

Maybe you need to comply with regulations and are at risk of penalties or fines in the event of non-compliance!

With an off-the-shelf solution handling the data quality aspects, you can shift your focus and allocate resources to your core business objectives. Instead of getting caught up in lengthy development cycles, you can concentrate on launching new products, reducing client churn, optimizing delivery processes, or improving planning through advanced analytics.

3. Compliance and Risk Mitigation

Many industries require compliance with strict regulations, and non-compliance can result in penalties and fines. By leveraging a pre-built solution, you improve data accuracy and integrity, mitigating the risks associated with non-compliance. A reliable data quality foundation helps you meet regulatory requirements efficiently and avoid costly consequences.

Conclusion

When it comes to data quality, the choice between building a solution from scratch or purchasing an existing application is clear.

The ease of implementation provided by a well-designed, pre-built solution delivers significant advantages, including leveraging best practices, reducing project risks, and saving time and resources.

By selecting a solution that addresses the complexities of your data landscape, you gain a substantial head start and accelerate your path to success.

So, why embark on the arduous journey of building from scratch when solutions exist off the shelf?

