Where two streets have one name

Discover the complexities of automated matching in South Africa, where two streets can share one name and one street can have two, creating data quality challenges. Read about English and Afrikaans street names, such as Church Street vs. Kerkstraat, and about inconsistent abbreviations. Learn how to avoid false positive matches and find the right automated match solution for your data management needs.


Introduction

In South Africa, the issue of one street having two names presents a complex challenge for automated matching systems. It arises with English and Afrikaans street names, such as Church Street vs. Kerkstraat, and is made worse by inconsistent abbreviations and number formats, such as 4th Avenue vs. Fourth Avenue.

Finding the right automated match solution becomes crucial for effective data management and avoiding false positive matches.
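A common first step is to normalize street names to one canonical form before comparing them, so that Kerkstraat and Church Street, or Fourth Avenue and 4th Avenue, compare as equal. The sketch below is a minimal illustration; the lookup tables and the `normalize_street` function are hypothetical, and a real rule set would need far more entries than shown here.

```python
import re

# Hypothetical lookup tables: a real deployment would need far larger,
# locally curated lists of English/Afrikaans equivalents and ordinals.
AFRIKAANS_SUFFIXES = {"straat": "street", "weg": "road", "laan": "avenue"}
AFRIKAANS_WORDS = {"kerk": "church"}
ORDINAL_WORDS = {
    "first": "1st", "second": "2nd", "third": "3rd", "fourth": "4th",
    "fifth": "5th", "sixth": "6th", "seventh": "7th", "eighth": "8th",
    "ninth": "9th", "tenth": "10th",
}
TYPE_ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue", "av": "avenue"}


def normalize_street(name: str) -> str:
    """Map a street name to a canonical English form for comparison."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    out = []
    for token in tokens:
        # Split compound Afrikaans names, e.g. "kerkstraat" -> "kerk" + "straat".
        for suffix, english_type in AFRIKAANS_SUFFIXES.items():
            if token.endswith(suffix) and len(token) > len(suffix):
                stem = token[: -len(suffix)]
                out.append(AFRIKAANS_WORDS.get(stem, stem))
                out.append(english_type)
                break
        else:
            # No Afrikaans suffix: normalize ordinals, abbreviations and words.
            token = ORDINAL_WORDS.get(token, token)
            token = TYPE_ABBREVIATIONS.get(token, token)
            out.append(AFRIKAANS_WORDS.get(token, token))
    return " ".join(out)


print(normalize_street("Kerkstraat"))     # church street
print(normalize_street("Church Street"))  # church street
print(normalize_street("Fourth Ave"))     # 4th avenue
```

Even this tiny example shows why tuning matters: a blanket rule such as "St" to "street" would mangle a name like "St Georges Street", which is exactly the kind of local nuance a generic tool tends to miss.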


The Complexity of Identical Street Names

It is not uncommon for two streets to share the same name: Nelson Mandela Drive, for example, is a frequently used name for major roads across the country. Typically, other address elements, such as suburb and town, are enough to tell these streets apart. However, once human error comes into play, address data becomes much more intricate, and simplistic matching strategies prove inadequate.

Human Error and Its Implications

One common issue is that people might remember the street name but not the specific street type. For instance, someone might mention “Osbourne Road” instead of “Osbourne Street” when providing an unfamiliar address. When such errors find their way into corporate data, they can lead to significant challenges.

Another challenging scenario arises when an address is given as the "corner of" two streets. For example, someone might say, "Our offices are on the corner of 1st and Main Street." In areas where several streets are called "1st" (1st Street, 1st Avenue, and so on), the missing street type makes the address ambiguous: the same captured name could refer to more than one street.
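A matcher cannot know which street was meant in these cases, but it can detect them instead of silently guessing. The sketch below, assuming a hypothetical list of street types and a simple pattern for "corner of X and Y", splits each street into a name and a type. "Osbourne Road" versus "Osbourne Street" is then reported as a type conflict, and the "1st" in a corner address is reported as having no type, so both can be routed for review rather than auto-matched. All function names here are illustrative.

```python
import re
from typing import Optional, Tuple

# Hypothetical set of street types; extend for local conventions.
STREET_TYPES = {"street", "road", "avenue", "drive", "crescent", "lane", "way"}

# Non-greedy first group so "corner of 1st and Main Street" splits at " and ".
CORNER_PATTERN = re.compile(r"corner of (.+?) and (.+)", re.IGNORECASE)


def split_street(street: str) -> Tuple[str, Optional[str]]:
    """Split 'Osbourne Road' into ('osbourne', 'road'); type is None if absent."""
    words = street.lower().split()
    if len(words) > 1 and words[-1] in STREET_TYPES:
        return " ".join(words[:-1]), words[-1]
    return " ".join(words), None


def parse_corner(address: str) -> Optional[Tuple[Tuple[str, Optional[str]], ...]]:
    """Return both streets of a 'corner of X and Y' address, or None."""
    match = CORNER_PATTERN.search(address)
    if not match:
        return None
    return tuple(split_street(part.strip(" .,")) for part in match.groups())


def compare_streets(a: str, b: str) -> str:
    """Classify a pair of street strings instead of forcing a yes/no match."""
    name_a, type_a = split_street(a)
    name_b, type_b = split_street(b)
    if name_a != name_b:
        return "different"
    if type_a == type_b:
        return "same"
    if type_a is None or type_b is None:
        return "type missing"
    return "type conflict"


print(compare_streets("Osbourne Road", "Osbourne Street"))  # type conflict
print(parse_corner("Our offices are on the corner of 1st and Main Street."))
```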

The Importance of Effective Automated Match Solutions

If simplistic statistical matching is applied to these data sets, we may end up in a scenario where "1st Street" is scored as a better match to "1st Crescent" than "Church Street" is to "Kerkstraat". Similar examples could easily be found for names and other data elements.
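The effect is easy to reproduce with a generic character-based score. The sketch below uses Python's standard `difflib.SequenceMatcher` purely as a stand-in for "simplistic statistical matching"; it is not the algorithm of any particular product.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Two different streets that share most of their characters...
different_streets = similarity("1st Street", "1st Crescent")
# ...versus the same street in English and Afrikaans.
same_street = similarity("Church Street", "Kerkstraat")

print(f"1st Street vs 1st Crescent:  {different_streets:.2f}")  # 0.73
print(f"Church Street vs Kerkstraat: {same_street:.2f}")        # 0.43
```

Because the two different streets score well above the two names for the same street, any single threshold low enough to accept Church Street = Kerkstraat would also accept 1st Street = 1st Crescent.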

Implementing an automated match solution is a critical component of any master data management technology stack. However, choosing the right solution is essential to avoid making existing complexities worse. Some solutions lack the granularity needed to handle situations like these, and produce an overwhelming number of exceptions that demand manual verification.
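One way to keep exceptions manageable is to use two thresholds rather than one, splitting candidate pairs into automatic matches, a manual review queue, and non-matches. The sketch below assumes each pair already has a similarity score between 0 and 1 from some matcher; the `triage` function and the 0.9 and 0.7 thresholds are illustrative assumptions that would need to be tuned against real data.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]


@dataclass
class Decisions:
    matches: List[Pair]
    review: List[Pair]
    non_matches: List[Pair]


def triage(scored_pairs: Iterable[Tuple[str, str, float]],
           match_at: float = 0.9, review_at: float = 0.7) -> Decisions:
    """Sort candidate pairs into auto-match, manual review and non-match."""
    if not 0.0 <= review_at <= match_at <= 1.0:
        raise ValueError("expected 0 <= review_at <= match_at <= 1")
    decisions = Decisions([], [], [])
    for left, right, score in scored_pairs:
        pair = (left, right)
        if score >= match_at:
            decisions.matches.append(pair)
        elif score >= review_at:
            decisions.review.append(pair)
        else:
            decisions.non_matches.append(pair)
    return decisions


# Illustrative scores only, not output from a real matcher.
result = triage([
    ("4th Avenue", "Fourth Avenue", 0.95),
    ("Deal Street", "Dial Street", 0.80),
    ("Church Street", "Main Road", 0.20),
])
print(len(result.review))  # 1
```

Widening the review band catches more doubtful pairs but grows the manual workload; narrowing it does the opposite, which is why the granularity of the underlying score matters so much.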

Choosing the Right Solution

To ensure a successful automated matching process, partner with a supplier that has the experience to deliver the results you expect. The ideal solution creates links between actual matches, avoids false positive matches, and minimizes dependence on manual validation.

The bottom line: make sure that the tool you use does not rely on generic statistical matching, but can be tuned to your specific needs.


Responses to “Where two streets have one name”

  1. adieyal

    Unfortunately, you need a good address database to really validate your addresses. The national address database from AfriGIS is available but very expensive. Google is very good at matching addresses, but their terms of service don't allow for cleaning datasets. I have managed to clean addresses in specific urban areas by collecting geographical datasets for municipalities.

    Also – you obviously need to look at all address fields, i.e. suburb, town etc. It is unlikely that the same street name appears twice in one suburb.

    1. garymdm

      You are correct – address validation requires a good address reference data set – there are a number of commercially available data sets with varying levels of completeness – some are good in one area but not good in others etc.

      For matching purposes though you do not need a reference set as you are comparing similar data in the same set.

      Suburb helps but is not by itself enough – for example, Deal Street and Dial Street are very similar and many probabilistic matchers would treat them as the same – adding the suburb does not help. Similarly, if 1st Street is captured as 1st Avenue then suburb doesn’t help. A match approach needs to handle common issues like this correctly.

      1. Adi Eyal (@SoapSudTycoon)

        Addresses need to be treated in a class of their own. You simply can’t match addresses with any degree of accuracy without taking other fields into account.

        Say Deal Street and Dial Street – there is no way to match them. Even if you have 123b Deal Street, Small Town and 123b Dial Street, Small Town – there is still no match.

        Even if I gave you

        John Blake, 123b Deal Street, Small Town

        and

        John Blake, 123b Dial Street, Small Town

        there would still be too much uncertainty to match them.

        However, if I knew that there was no Dial Street in Small Town then I could more comfortably match using the above record.

        Finally, even if you were able to identify a positive match – how would you merge? Which is the correct street? Deal? Dial? Both?

        In my opinion, addresses can be used to help match records – but without a reference database, you can’t match addresses.

  2. chrisolder

    The element that needs to be managed is how to arrive at the decision that “very similar” records are duplicates.

    First of all, the tolerance for "very similar" can sometimes be quantified objectively in fuzzy matching. That should mean that a data tool can find and report these records.

    Once reported, the choice for processing potentially duplicated records may be either human (i.e. manual validation of the records against other references like databases, directories, direct contact with the party, etc.) or computer (if only one of the duplicates exists in a single reference database, then discard the others).

    The human route can deliver high quality, given time and ingenuity. The computer route depends on the quality of the reference, and there is an element of trust that the people who built that repository manage their data quality so well that we can accept their records as is.

  3. Jonathan Tee

    Not to mention where they do actually have 2 or more names:

    Lansdowne Road as follows:
    From Wetton to Turfhall – remains Lansdowne Road;
    From Turfhall to Palmyra – becomes Imam Haron Road;
    From Wetton to Swartklip – becomes Govan Mbeki Road;
    From Swartklip to Baden Powell – becomes Jeff Masemola Road;

  4. Stijn Goedertier (@stijngoedertier)

    Would matching improve if governments / postal services released their databases as Linked Open Data?

    We have tried this approach with Belgian address registers.
    http://location.testproject.eu/BEL/

    When published on the Web, address registers could quickly become authentic sources of address data, attributing common identifiers to address components such as streets, administrative units, postal codes, etc.

    This is for example an identifier for a “Kerkstraat” in Belgium:

    http://location.testproject.eu/so/ad/AddressRepresentation/AGIV/2000069014

    1. garymdm

      @Stijn: Of course, this approach assumes that the postal service has a street-level dataset, which is not true in South Africa and many other countries.
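Adi Eyal's point about reference data can be expressed as a rule: two similar street names in the same town are merged only when a reference list of known streets shows that exactly one of them exists there, which also answers the question of which spelling to keep. The sketch below is a hypothetical illustration; the reference data (built on the commenter's own Small Town example), the `resolve_street` helper and its 0.8 similarity cut-off are all assumptions, not part of any real address product.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Set

# Hypothetical reference data: known streets per town.
REFERENCE: Dict[str, Set[str]] = {
    "small town": {"deal street", "main road"},
}


def resolve_street(a: str, b: str, town: str,
                   reference: Dict[str, Set[str]],
                   min_similarity: float = 0.8) -> Optional[str]:
    """Return the street to keep if a and b can safely be merged, else None."""
    a, b, town = a.lower(), b.lower(), town.lower()
    if a == b:
        return a
    if SequenceMatcher(None, a, b).ratio() < min_similarity:
        return None  # too different to be a typo of each other
    known = reference.get(town)
    if known is None:
        return None  # no reference data: leave the pair for manual review
    a_known, b_known = a in known, b in known
    if a_known and not b_known:
        return a
    if b_known and not a_known:
        return b
    return None  # both exist (or neither does): genuinely ambiguous


print(resolve_street("Deal Street", "Dial Street", "Small Town", REFERENCE))
```

Without an entry for the town, the function deliberately returns None rather than guessing, which mirrors the comment's conclusion that addresses cannot be matched reliably without a reference database.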
