Where two streets have one name


A couple of recent posts from Henrik Liliendahl Sørensen focused on the problem of one street having two names – certainly an issue that creates complexity for automated matching in South Africa.

Two examples – English and Afrikaans street names such as Church Street vs Kerkstraat, as well as the inconsistent use of abbreviations such as 4th Avenue vs Fourth Avenue.
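As a rough illustration of how both variations might be reduced to one form before matching, here is a minimal Python sketch. The lookup tables and the `normalise` function are hypothetical and deliberately tiny; a real deployment would need far larger, locally maintained tables, and an alias list such as Kerkstraat / Church Street has to be curated by hand.

```python
import re

# Hypothetical, hand-maintained lookup tables (assumptions for illustration only).
ORDINALS = {"first": "1st", "second": "2nd", "third": "3rd", "fourth": "4th"}
TYPE_ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road", "dr": "drive"}
ALIASES = {"kerkstraat": "church street"}  # known alternative names for one street


def normalise(street: str) -> str:
    """Return a canonical lowercase form of a street name."""
    # Lowercase and replace punctuation with spaces, then split into words.
    tokens = re.sub(r"[^\w\s]", " ", street.lower()).split()
    if not tokens:
        return ""
    # Spelled-out ordinals become their numeric form.
    tokens = [ORDINALS.get(token, token) for token in tokens]
    # Only the last word is treated as a street type, so a leading "St" is left alone.
    tokens[-1] = TYPE_ABBREVIATIONS.get(tokens[-1], tokens[-1])
    canonical = " ".join(tokens)
    # Finally, map known alternative names onto a single preferred name.
    return ALIASES.get(canonical, canonical)
```

With these tables, `normalise("Fourth Avenue")` and `normalise("4th Ave.")` give the same string, as do `normalise("Kerkstraat")` and `normalise("Church St")`.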

A related problem is when two streets have one name. It is of course quite possible for two streets to have the same name.  Nelson Mandela Drive, for example, has become a very common name for major roads across the country. However, in reality these streets are typically far enough apart that other address elements can differentiate them.

When we factor in human error, however, then address data becomes a lot more complex, and simplistic matching strategies fail more often than is acceptable.

One example – people remember street names but may not remember the street type. When asked for the address of a place they are not really familiar with they may say “Osbourne Road” when in fact they mean “Osbourne Street”. When this kind of error is captured into corporate data it can cause real challenges.
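One way to surface this kind of error, rather than silently matching or rejecting it, is to compare the street name and the street type separately. The Python sketch below is only an illustration: the `STREET_TYPES` set, the function names and the result labels are all assumptions, not part of any particular matching product.

```python
# Hypothetical list of street types; extend it for the geography being matched.
STREET_TYPES = {"street", "road", "avenue", "drive", "crescent", "lane"}


def split_street(street: str):
    """Split a street into (name, type); type is None when none is recognised."""
    tokens = street.lower().split()
    if len(tokens) > 1 and tokens[-1] in STREET_TYPES:
        return " ".join(tokens[:-1]), tokens[-1]
    return " ".join(tokens), None


def compare(a: str, b: str) -> str:
    """Classify a pair of streets instead of returning a single score."""
    name_a, type_a = split_street(a)
    name_b, type_b = split_street(b)
    if name_a != name_b:
        return "different"
    if type_a == type_b:
        return "same"
    if type_a is None or type_b is None:
        return "type missing"   # e.g. "1st" vs "1st Avenue"
    return "type conflict"      # e.g. "Osbourne Road" vs "Osbourne Street"
```

Here `compare("Osbourne Road", "Osbourne Street")` returns `"type conflict"`, which can then be resolved using other address elements rather than being treated as either a match or a non-match.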

Another common example is a “corner of” type address – “Our offices are on the corner of 1st and Main Street.” Is that the corner of “1st Avenue”, “1st Street” or “1st Crescent”? In geographies where each of these options may exist we have now created one name for multiple streets.

As mentioned previously, if simplistic, statistical matches are now applied to these data sets we may have a scenario in which “1st Street” is a better match to “1st Crescent” than “Church Street” is to “Kerkstraat”. Similar examples could easily be applied to Name, and other data elements.
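To make this concrete, the short Python sketch below scores both pairs with `difflib.SequenceMatcher` from the standard library. This is just one generic character-level similarity measure, chosen here for illustration; it is an assumption, not the algorithm used by any specific matching tool.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity between 0.0 and 1.0, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Two different streets that share a name but not a street type.
different_streets = similarity("1st Street", "1st Crescent")

# Two names (English and Afrikaans) for the same street.
same_street = similarity("Church Street", "Kerkstraat")

print(f"1st Street vs 1st Crescent:  {different_streets:.2f}")
print(f"Church Street vs Kerkstraat: {same_street:.2f}")
```

On these inputs the two different streets score higher than the two names for the same street, so no single similarity threshold can link the true match without also linking the false one.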

An automated match solution is a critical component of any master data management technology stack – but if inappropriately applied can cause more problems than it is worth. Some solutions simply do not offer the granularity to deal with these kinds of complexities – meaning that your operational staff will be overwhelmed by exceptions that must be manually verified.

The bottom line – make sure that your supplier has the experience necessary to ensure that your automated matching solution does what you expect – creates the links between actual matches without creating false positive matches, and without depending on manual validation. And make sure that the tool you use does not rely on generic statistical matching but can be tuned to your specific needs.

7 thoughts on “Where two streets have one name”

  1. Unfortunately, you need a good address database to really validate your addresses. The national address database from AfriGIS is available but very expensive. Google is very good at matching addresses but their TOS don’t allow for cleaning datasets. I have managed to clean addresses in specific urban areas by collecting geographical datasets for municipalities.

    Also – you obviously need to look at all address fields – i.e. suburb, town etc. It is unlikely to have the same street name in a suburb.

    • You are correct – address validation requires a good address reference data set – there are a number of commercially available data sets with varying levels of completeness – some are good in one area but not good in others etc.

      For matching purposes though you do not need a reference set as you are comparing similar data in the same set.

      Suburb helps but is not by itself enough – for example, Deal Street and Dial Street are very similar and many probabilistic matchers would treat them as the same – adding the suburb does not help. Similarly, if 1st Street is captured as 1st Avenue then suburb doesn’t help. A match approach needs to handle common issues like this correctly.

      • Addresses need to be treated in a class of their own. You simply can’t match addresses with any degree of accuracy without taking other fields into account.

        Say Deal Street and Dial Street – there is no way to match them. Even if you have 123b Deal Street, Small Town and 123b Dial Street, Small Town – there is still no match.

        Even if I gave you

        John Blake, 123b Deal Street, Small Town

        and

        John Blake, 123b Dial Street, Small Town

        there would still be too much uncertainty to match.

        However, if I knew that there was no Dial Street in Small Town then I could more comfortably match using the above record.

        Finally, even if you were able to identify a positive match – how would you merge? Which is the correct street? Deal? Dial? Both?

        In my opinion, addresses can be used to help match records – but without a reference database, you can’t match addresses.

  2. The element that needs to be managed is how to arrive at the decision that “very similar” records are duplicates.

    First of all, the tolerance for “very similar” can be quantified objectively sometimes in fuzzy matching. That should mean that a data tool can find and report these records.

    Once reported, the choice for processing potentially duplicated records may be either human (i.e. manual validation of the records against other references like databases, directories, direct contact with the party, etc.) or computer (if only one of the duplicates exists in a single reference database, then discard the others).

    The human route can deliver high quality, given time and ingenuity. The computer route depends on the quality of the reference, and there is an element of trust that the people who built that repository manage their data quality so well that we can accept their records as is.

  3. Not to mention where they do actually have 2 or more names:

    Lansdowne Road as follows:
    From Wetton to Turfhall – remains Lansdowne Road;
    From Turfhall to Palmyra – becomes Imam Haron Road;
    From Wetton to Swartklip – becomes Govan Mbeki Road;
    From Swartklip to Baden Powell – becomes Jeff Masemola Road;

  4. Would matching improve if governments / postal services released their databases as Linked Open Data?

    We have tried this approach with Belgian address registers.
    http://location.testproject.eu/BEL/

    When published on the Web, address registers could quickly become authentic sources of address data, attributing common identifiers to address components such as streets, administrative units, postal codes, etc.

    This is for example an identifier for a “Kerkstraat” in Belgium:

    http://location.testproject.eu/so/ad/AddressRepresentation/AGIV/2000069014
