Eight insights for Fantastic Dates

candle light dinnerWhen I told my wife I was going to write this column her first question was : “What do you know about dates?”

Quite a lot actually: They are insidious little attributes that cause a number of problems in any data set.

Stop struggling with poor quality dates! [Tweet this]

Dates play four major roles within data.

1. They form part of master data – the unique defining characteristics of a person (e.g., birth date) or product (e.g., contract start and end dates).
2. They form part of transactional data – purchase date or claim date.
3. They create context for reporting – sales today, last week, last year.
4. They provide an audit capability – when was this record captured or changed?

Poor quality dates impact on our ability to operate our business, or to run accurate analyses. [Tweet This]

Common data quality issues that affect dates include:

1. Missing dates.

These are fairly easy to find but can be difficult to correct. Missing dates may reflect as NULL values or as white space.

Missing information must typically be traced from the original source.

2. Inconsistent storage calculations.

These can be more of a challenge to identify, but are typically easy to resolve programmatically. Different applications store dates in differing ways. For example, most databases store dates as a count of the days since a certain start date – but each database vendor may apply their own standard. Or the application may apply its own logic to the date field – storing it as an integer count but using a different base date to the underlying database.

To a human observer, each date may be displayed as a number. The correct formula may have to be applied to convert the data to a human-readable format. It is possible for mistakes to creep in and for a count to be converted to the wrong date, particularly when different approaches are used across various systems and applications.

The challenge here is more of a data governance issue. It is important to understand how dates are stored in each environment and apply the correct logic when calculating the dates. In some cases this may occur automatically when data is extracted. A common standard must be agreed for combining dates from multiple sources.

3. Dates stored as text.

In some environments dates may not be stored as an integer count, but rather as a text representation of the date (for example, “25/12/2007” or “5 February 1998”).

This approach creates two major problems.

Firstly, invalid dates (30 Feb 2010) may be captured. Secondly, even human readable dates can be inconsistent – particularly when amalgamating regional differences. Does “02/07/11” represent 2 July 2011, 7 February 1911, or even 11 July 2002?

It is fairly simple to put in data quality checks to ensure valid dates. The date format issue, which was a major contributor to the Y2K problem, requires more governance. A common standard must be agreed and, where necessary, convert other dates to this standard.

4. Dates ranges. Date ranges typically impact on operational issues, such as checking whether a claim was made during the period a warranty was in place.

However, the introduction of anti-spam legislation such as the South African Consumer Protection Act and Protection of Personal Information act may now require companies to be able to prove that promotions were sent during an opted in period, even if the client has subsequently opted out of receiving marketing materials.

The ability to quickly and easily identify good quality transactions (occurring within the contract period) can be of real importance and is a key data quality measure.

5. Default dates. Date of birth and similar values should show a reasonable spread. If 10% of your dates of birth, for example, fall on the first of the month then this could be an indication of a data quality problem.

In many cases, this kind of issue is caused by a default rule – if you have no date of birth, or it is invalid, then default to the first of the month.

These can be picked up by data profiling but typically require additional investigation to confirm. I would suggest that it is better to have no date that a default date as default dates can significantly skew analyses and have no relevance.

6. Unexpected use of dates.
In many applications date fields are assigned specific roles. For example, a field called “Birth Date” may reasonably be assumed to hold the date of birth of a person.

However, this may not always be the case. Off-the-shelf named fields are often repurposed if not immediately useful, as this is quicker and cheaper than creating custom fields. Maybe marketing were more interested in knowing when a customer was acquired, than in knowing when she was born. So the “Date of Birth” field is used to capture the “Customer Acquisition Date.”

This is a typical data governance problem. Downstream users (for reporting and other applications) often do not know about these changes and assume the data is being used for the labelled purpose. This can lead to invalid reports (for example segmenting customers based on incorrectly calculated ages.)

Ideally, your data governance platform should document and communicate these kind of repurposing, ensuring that everybody in the organisation understand which fields are used for which purpose, and which calculations use which fields.

7. Hidden dates. Some data – for example South African identify numbers, some barcodes or company registration numbers include a date as part of the formula. These hidden dates can be checked for sanity against other date related information in the record. If the two dates do not tie up it can be a sign of accuracy issues within the record.

8. The lack of an audit date. In Simple tips for designing for data quality I discussed how dates provide valuable inputs for root cause analysis. It is important to understand when data was captured, or changed, and by whom, in order to trace specific issues to their cause.

Be proactive about how you manage dates and save time and money [Tweet this]

By using your knowledge of these common date related issues in your planning you can reduce the time you spend in the office fixing issues down the line. Maybe you will even find time for a real date with your partner.

Image sourced from http://pixabay.com/en/candle-light-dinner-eat-table-101095/

One thought on “Eight insights for Fantastic Dates

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.