Improve Address Data Accuracy with Geocoding!

Discover how this proof of concept successfully matched South African addresses to latitude and longitude points. Learn about the challenges of data quality and the importance of context and testing in achieving accurate geocoding results.



Unlock the power of accurate customer data with our Name and Address Matching and Enrichment services. Explore how our solutions can efficiently identify and enhance customer information, ensuring better targeting, personalized communication, and improved decision-making. Discover more about our Name and Address Matching and Enrichment capabilities to elevate your data quality standards and drive business growth.

Introduction

Geocoding is the process of adding latitude and longitude points to addresses, thereby enhancing address data accuracy. In a proof of concept, we successfully matched South African addresses to precise geolocation points. However, achieving accurate geocoding results comes with challenges, particularly related to data quality, context, and testing.

The Geocoding Process

Previous attempts at geocoding had been extremely difficult. The process had delivered limited results and had required months of effort and substantial manual intervention.

In short, it had been so complex and so unrewarding that the process had not been attempted in several years.

We were asked to put our money where our mouth is. We claim to have an off-the-shelf set of South African name and address rules that allow us to quickly and effectively understand and geocode free-format South African addresses.

The client provided us with a set of some 500,000 addresses and asked us to deliver results within a few working days. They wanted to measure the accuracy of the results, the time to value, and the performance of the application.

Quick and accurate geocoding

Within a day we had the following results, based on our standard off-the-shelf rules with some minor, data-set-specific configuration.

Elapsed Time: 00:08:42

Record Input

  Count     Statistic
  513288    Records processed

Record Matches

  Count     Description
  442611    Records coded with Latitude and Longitude
  70677     Records not coded with Latitude and Longitude

Accuracy Match Levels

  Count     Description
  167048    Records Matched At Level 5 – Interpolated Rooftop
  40956     Records Matched At Level 4 – Street Level
  210837    Records Matched At Level 3 – Postal Code Centroid
  23619     Records Matched At Level 2 – City Centroid
  151       Records Matched At Level 1 – Region Centroid
  0         Records Matched At Level 0 – Country Centroid

Address Accuracy Match: 86.2%
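The headline figure can be checked from the counts in the report. The short Python sketch below recomputes it; the variable names are our own, and the "street level or better" share is a derived figure that does not appear in the original run summary.

```python
# Counts copied from the run summary above.
records_processed = 513288
level_counts = {
    5: 167048,  # Interpolated Rooftop
    4: 40956,   # Street Level
    3: 210837,  # Postal Code Centroid
    2: 23619,   # City Centroid
    1: 151,     # Region Centroid
    0: 0,       # Country Centroid
}

# Every coded record falls into exactly one match level.
records_coded = sum(level_counts.values())
records_not_coded = records_processed - records_coded

# Share of all input records that received a latitude and longitude.
match_rate = round(100 * records_coded / records_processed, 1)

# Derived figure (not in the report): share matched at Level 4 or Level 5.
street_or_better = level_counts[5] + level_counts[4]
street_or_better_rate = round(100 * street_or_better / records_processed, 1)

print(f"Coded: {records_coded}, not coded: {records_not_coded}")
print(f"Address accuracy match: {match_rate}%")
print(f"Street level or better: {street_or_better_rate}%")
```

The level counts sum to the 442611 coded records, which confirms that the 86.2% figure is the share of input records coded at any level, not only at rooftop or street level.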


Geocoding is, in essence, a simple process. All that is needed is to match the address captured for your customer, or supplier, to a reference data set that contains a point.
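As a rough illustration of that matching step, the sketch below looks an address up in a reference data set and falls back to a coarser point when no precise match exists. It is not our production logic: the lookup tables, the `geocode` function, and every key and coordinate in it are hypothetical placeholders. Only the level numbers follow the match levels listed in the results above.

```python
from typing import NamedTuple, Optional


class Match(NamedTuple):
    level: int        # match level, as in the results table
    latitude: float
    longitude: float


# Hypothetical reference data. Keys and coordinates are placeholders,
# not real locations.
STREET_POINTS = {("EXAMPLE ROAD", "0001"): (-26.10, 28.05)}
POSTCODE_CENTROIDS = {"0001": (-26.12, 28.06)}
CITY_CENTROIDS = {"EXAMPLE CITY": (-26.20, 28.04)}


def geocode(street: str, postcode: str, city: str) -> Optional[Match]:
    """Return the most precise reference point available, or None."""
    street_key = (street.strip().upper(), postcode.strip())
    if street_key in STREET_POINTS:
        return Match(4, *STREET_POINTS[street_key])      # Street Level
    if postcode.strip() in POSTCODE_CENTROIDS:
        return Match(3, *POSTCODE_CENTROIDS[postcode.strip()])  # Postal Code Centroid
    if city.strip().upper() in CITY_CENTROIDS:
        return Match(2, *CITY_CENTROIDS[city.strip().upper()])  # City Centroid
    return None  # not coded


print(geocode("Example Road", "0001", "Example City"))
print(geocode("Unknown Road", "0001", "Example City"))
```

The exact-key lookup is the simple part. It only works once the captured address has been parsed and standardised well enough to produce those keys.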

The Challenge of Data Quality

Geocoding seems like a straightforward process: matching addresses to reference data sets containing geographic points. However, poor data quality can make this task extremely difficult. Common address quality issues include:

  • Missing or inaccurate information – e.g. no postcode, incorrect postcode
  • Misfielded information – e.g. street name in the suburb field
  • Misspellings and typing mistakes – e.g. “raod” instead of “road”
  • Out-of-date data – e.g. still using an old / replaced place name
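Some of these issues can be flagged with simple profiling checks before any matching is attempted. The sketch below is a minimal illustration rather than our rule set. The `quality_flags` function, the short `STREET_TYPES` list, and the flag names are assumptions made for the example. It assumes a four-digit postcode and uses Python's standard `difflib` module for fuzzy comparison. Out-of-date place names would need a lookup table of renamed places and are not covered here.

```python
import difflib
import re

# Illustrative list only; a real rule set would be far larger.
STREET_TYPES = ["ROAD", "STREET", "AVENUE", "DRIVE", "LANE"]


def quality_flags(street: str, suburb: str, postcode: str) -> list:
    """Return a list of suspected quality issues for one address record."""
    flags = []

    # Missing or inaccurate information: assume a four-digit postcode.
    if not re.fullmatch(r"\d{4}", postcode.strip()):
        flags.append("missing_or_invalid_postcode")

    # Misfielded information: a street type word in the suburb field.
    if any(word in STREET_TYPES for word in suburb.upper().split()):
        flags.append("possible_misfielded_street")

    # Misspellings: the last word of the street line is close to,
    # but not exactly, a known street type.
    words = street.upper().split()
    if words and words[-1] not in STREET_TYPES:
        close = difflib.get_close_matches(words[-1], STREET_TYPES, n=1, cutoff=0.7)
        if close:
            flags.append(f"possible_misspelling:{words[-1]}->{close[0]}")

    return flags


print(quality_flags("12 Main Raod", "Sandton", ""))
print(quality_flags("12 Main Road", "Church Street", "2196"))
```

Checks like these only raise candidates for review. As the rest of this post shows, applying an automatic correction without understanding the context can make the data worse.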

Additionally, the accuracy and completeness of the reference data set play a crucial role in geocoding success, especially in areas with informal or rural settlements.

Curious Exceptions and Previous Cleansing Attempts

“When the saints go marching in”

Louis Armstrong

For geocoding, one is also dependent on the accuracy and completeness of one’s reference data set. It is easy to find a spatial location for an urban address in a major city. But many South Africans live in informal or rural settlements that are not well mapped or understood.

In these cases, we may only be able to provide a regional centroid, rather than an exact location.

Yet, in this particular case, I found a number of curious exceptions that I did not at first understand.

I had addresses such as:

  • 15 Govan Mbeki Saint
  • 1127 Phalaborwa Saint
  • and many more similar examples.

It was only when I was presenting the results to the client that I figured this out.

At some point in the past, a data “cleansing” project had been run in which someone had (probably) decided that every “ST” should be replaced by “SAINT”.

As a result, thousands of street addresses had been degraded.

I frequently see the results of previous cleansing attempts in production data that we are asked to resolve. Often, the data cleansing attempt has made things worse.

During our analysis, we encountered some unusual results, such as addresses with the term “Saint” in place of “St.” After further investigation, we discovered that a previous data cleansing project had degraded thousands of street addresses. This highlights the importance of working with experts who understand the context and complexities of data, as well as thoroughly testing changes before deployment.

Three Points for Achieving Quality Data

To ensure high-quality geocoding and data accuracy, consider the following principles:

  1. Collaborate with experts: Our team would never have made such a rudimentary mistake, as we follow the two additional principles below.
  2. Context Matters: You cannot make blanket changes to data. You need to understand the context and only make changes where they are relevant. Take 12 ST WINIFREDS ST – the first ST is SAINT, and the second is STREET. We understand and manage this kind of complexity.
  3. Test Before Deployment: Even if an error of this nature had been proposed as a cleansing rule, it should have been tested, and the issue picked up, before the rule was applied to the live data. I have been under pressure, particularly from IT, to deploy “cleansed” data into production without testing and business validation of the proposed changes. This is always a mistake!
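The second and third principles can be sketched in a few lines of Python. The blanket rule below reproduces the kind of error we found in the client data. The context-aware rule is a deliberately simplified, hypothetical alternative, not our actual rule set: it assumes the input is the street line only, so that a trailing ST is a street type and any earlier ST is part of a name. Both function names are our own.

```python
import re


def expand_st_blanket(address: str) -> str:
    """The flawed rule: every standalone ST becomes SAINT."""
    return re.sub(r"\bST\b", "SAINT", address.upper())


def expand_st_in_context(address: str) -> str:
    """Expand ST by position within the street line.

    Simplifying assumption: a final ST is the street type (STREET),
    and an ST followed by further words is part of a name (SAINT).
    """
    tokens = address.upper().replace(".", "").split()
    result = []
    for index, token in enumerate(tokens):
        if token != "ST":
            result.append(token)
        elif index == len(tokens) - 1:
            result.append("STREET")
        else:
            result.append("SAINT")
    return " ".join(result)


# A handful of known cases, checked before any rule touches live data.
test_cases = {
    "12 ST WINIFREDS ST": "12 SAINT WINIFREDS STREET",
    "15 Govan Mbeki St": "15 GOVAN MBEKI STREET",
}
for raw, expected in test_cases.items():
    print(raw, "| blanket:", expand_st_blanket(raw), "| context:", expand_st_in_context(raw))
    assert expand_st_in_context(raw) == expected
```

Running the blanket rule against the same test cases produces “15 GOVAN MBEKI SAINT”, exactly the degraded form found in the data. Even a small test set like this would have caught the problem before deployment.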

Gain insights into optimizing address data quality specific to South Africa. Our South Africa Address Data Quality Guide offers practical strategies and tips tailored to the unique challenges and requirements of South African addresses, ensuring accuracy and reliability in your data.

Gain a competitive edge in your industry with geocoding technology. Explore the ways geocoding provides a competitive edge by unlocking location-based insights, optimizing operations, and delivering personalized experiences to your customers. Discover how geocoding can propel your business forward.

Conclusion

Geocoding offers valuable benefits for businesses, enabling enhanced location-based insights and decision-making. By prioritizing data quality, understanding context, and testing rigorously, organizations can harness the power of geocoding to unlock the full potential of their address data.

What strange results have you seen in your data?
