In his recent post, Know your foreign customer, Henrik Liliendahl points out that globalisation presents unique challenges that must be addressed by any serious master data management or data quality initiative. Very few businesses operate in a single geography – any business with a web presence is going to get hits from international prospects and clients. This means that organisations must think globally when selecting their master data and data quality partners – local knowledge and tools cannot address global data quality challenges.
One of our customers deals with clients from nearly two hundred distinct countries. They face all of the challenges described in Henrik’s post – in particular, the challenge of validating legally required client information.
Some of these challenges can be addressed by agreeing data standards – for example, should city names always be converted to English, or should you stick to the original-language spelling? Will it consistently be Florence or Firenze?
Can we agree on a single standard for reference data? This is not as easy as it may seem, as discussed in my post Bad data – when men are dressed as women.
Most challenges are much more complex. A couple of simple examples – what are the valid company types for “Liechtenstein”, “Thailand” or “Russia”? Is there a formula to validate a British passport number?
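To make the problem concrete, a per-country rule registry is one way to structure this kind of validation. The sketch below is purely illustrative – the passport pattern and company types shown are assumptions for the sake of the example, not verified reference data – and the key design point is the third outcome: when no rule exists for a country, the record is flagged for review rather than silently passed or failed.

```python
import re

# Illustrative rule registry keyed by ISO country code. The patterns and
# company types below are placeholder assumptions, NOT verified reference
# data - the structure, not the content, is the point.
COUNTRY_RULES = {
    "GB": {
        "passport": re.compile(r"^\d{9}$"),      # assumed nine-digit format
        "company_types": {"Ltd", "PLC", "LLP"},  # illustrative subset
    },
    "LI": {
        "passport": None,                        # no rule sourced - a known gap
        "company_types": {"AG", "Anstalt"},      # illustrative subset
    },
}

def validate_passport(country: str, number: str):
    """True/False when a rule exists; None when we have a gap to review."""
    rule = COUNTRY_RULES.get(country, {}).get("passport")
    if rule is None:
        return None  # unknown country or missing rule: route to manual review
    return bool(rule.fullmatch(number))
```

In practice most of the registry ends up in the `None` state, which is exactly the gap a shared, locally maintained reference source would fill.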
We investigated a number of commercial options to validate and enrich client data – in each case they were unable to provide reference data for more than about 20% of the sample set provided to them. Another client suggestion, Wikipedia, was dismissed as a source after a quick analysis of the South African-related information confirmed our suspicions that the accuracy of the data could not be trusted.
We were fortunate that our data cleansing solution has an unsurpassed knowledge of global name and address data, as this gave us a massive head start and a reference point for commonly used address formats, business terms, etc.
Ultimately, however, we had to cobble together a variety of reference sources and accept that we would have gaps.
As Henrik points out, partnerships need to be established that bring together local knowledge from various sources and amalgamate it into a global data reference source. I am sceptical as to whether the world is ready for cloud-based data cleansing – there are various legal implications and technical complexities that have to be addressed before this will become really viable.
However, a cloud-based reference set of standards such as valid company types and validation rules for each country (managed by local experts) would definitely address a gap in the market and is worth exploring further.
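As a sketch of what a single country’s entry in such an expert-maintained reference set might contain – every field name and value here is an illustrative assumption, not authoritative data:

```python
# One country's record in a hypothetical shared reference set. Values are
# placeholders chosen to show the shape of the data, not verified facts.
LIECHTENSTEIN_RECORD = {
    "country": "LI",
    "maintained_by": "local expert partner",          # provenance matters
    "company_types": ["AG", "Anstalt", "Stiftung"],   # illustrative only
    "validation_rules": {
        "postal_code": r"^\d{4}$",                    # assumed four-digit format
    },
}
```

Keeping provenance (who maintains each record) alongside the rules is what would let consumers judge how far to trust each country’s entry.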
This post was originally published on the dataqualitymatters blog