Big data quality


Ventana Research's recent Big Data Integration benchmark survey supports the growing awareness that data quality and integration are the principal time sinks for big data projects.

Their research finds that more than 50% of the time allocated to any big data project is spent reviewing the data for quality and consistency. This is not surprising given the largely manual, code-driven approaches that have historically been required.

And, as in any analytics project, poor quality input data will result in poor quality analytics.

Big Data Quality for Hadoop

Yet, until now, mainstream data quality solutions have not adapted to the complexities of big data quality.

This is why I am excited by the release of the new Trillium Big Data data quality platform from enterprise data quality vendor Harte Hanks Trillium Software.

Deploying an enterprise data quality tool directly against Hadoop has game-changing implications for both the data quality vendor and the big data platform.

From an IT architecture perspective, running data profiling and cleansing jobs as horizontally scalable MapReduce functions promises large performance gains, particularly when dealing with the large data sets common in big companies such as retailers, banks and telecommunications operators.
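To make the idea concrete, here is a minimal sketch of column profiling written in a map and reduce style, using plain Python rather than Hadoop. It is not how Trillium implements profiling; the function names (`map_record`, `merge_profiles`, `profile_records`, `summarize`) and the sample fields are assumptions made purely for illustration. The point it shows is that per-record profiles can be merged in any grouping, which is what lets the work be spread across many nodes.

```python
from collections import Counter
from functools import reduce


def _empty():
    """Neutral partial profile for a column that has not been seen yet."""
    return {"count": 0, "nulls": 0, "values": Counter()}


def map_record(record):
    """Map step: turn one record into a partial profile per column."""
    profile = {}
    for column, value in record.items():
        is_null = value is None or str(value).strip() == ""
        profile[column] = {
            "count": 1,
            "nulls": 1 if is_null else 0,
            "values": Counter() if is_null else Counter([str(value).strip()]),
        }
    return profile


def merge_profiles(left, right):
    """Reduce step: combine two partial profiles column by column."""
    merged = {}
    for column in set(left) | set(right):
        a = left.get(column, _empty())
        b = right.get(column, _empty())
        merged[column] = {
            "count": a["count"] + b["count"],
            "nulls": a["nulls"] + b["nulls"],
            "values": a["values"] + b["values"],
        }
    return merged


def profile_records(records):
    """Profile an iterable of dict records by mapping, then reducing."""
    return reduce(merge_profiles, map(map_record, records), {})


def summarize(profile):
    """Collapse a profile into row count, null count and distinct count."""
    return {
        column: {
            "count": stats["count"],
            "nulls": stats["nulls"],
            "distinct": len(stats["values"]),
        }
        for column, stats in profile.items()
    }


if __name__ == "__main__":
    # Hypothetical sample data, invented for this illustration.
    sample = [
        {"customer_id": "1001", "postcode": "2000"},
        {"customer_id": "1002", "postcode": ""},
        {"customer_id": "1002", "postcode": None},
        {"customer_id": "1003", "postcode": "2000"},
    ]
    print(summarize(profile_records(sample)))
```

Because `merge_profiles` only adds counts together, each partition of the data can be profiled separately and the partial results merged afterwards. Keeping every distinct value in a `Counter` is fine for a sketch, but a real job over very large data sets would likely need an approximate distinct count to limit memory use.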

The insights gained by profiling big data sets with Trillium Software Discovery will dramatically reduce the cost and time to insight of big data projects.

I recently wrote about how time to insight is the defining characteristic of big data.

The sound understanding of underlying data quality issues and anomalies that data profiling delivers is as critical to cutting big data delivery times as it is for any other data-intensive project. This understanding allows big data projects to be planned accurately, allows data issues to be resolved, and leads to more accurate analytics.

I have also written of the important role of business stakeholders in the big data team.

Even many technical staff struggle with the complexities of the MapReduce and Spark APIs. Tools like TS Discovery, which deliver powerful insight via simple interfaces, empower business and technical staff to collaborate and deliver value quickly.

Trillium Big Data could be the tool that data scientists and business analysts have been waiting for.

Contact us for more information about Big Data Quality

Image sourced from http://www.balivouchers.info/elephant-park-taro-voucher/