Hadoop: Quick Facts

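Why Hadoop

Hadoop is a highly scalable framework for distributed storage and processing, used to run high-speed analytics against large volumes of data. It is often grouped with NoSQL technologies, although it is not itself a database.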

Hadoop works on the principle of schema on read, not schema on write. Any data (structured or unstructured) can be stored in Hadoop without first developing a schema. This shortens development timescales, reduces risk and complexity, and limits the impact of poor-quality data that might have caused traditional ETL jobs to fail. Instead, consuming programs determine and apply structure when they read the data.
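To make schema on read concrete, here is a minimal sketch in plain Python rather than Hadoop itself. The sample records, the field names and the read_with_schema helper are all hypothetical; the point is only that raw data is stored untouched and each consumer applies its own structure when it reads.

```python
import json

# Raw records are stored exactly as they arrive; no schema is enforced on write.
# These sample lines are made up for illustration.
RAW_LINES = [
    '{"user": "alice", "amount": "12.50", "country": "UK"}',
    '{"user": "bob", "amount": "7"}',
    'not valid json at all',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time, skipping records that do not fit it."""
    for line in lines:
        try:
            record = json.loads(line)
            # Keep only the fields this consumer cares about, cast to its types.
            yield {field: cast(record[field]) for field, cast in schema.items()}
        except (ValueError, KeyError, TypeError):
            # A malformed or incomplete record is skipped instead of failing the job.
            continue


# Two consumers read the same raw data with different schemas.
billing_view = list(read_with_schema(RAW_LINES, {"user": str, "amount": float}))
geo_view = list(read_with_schema(RAW_LINES, {"user": str, "country": str}))
```

The billing consumer sees both valid records, the geographic consumer sees only the record that carries a country, and the malformed line breaks neither of them.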

Hadoop runs on commodity hardware. This can make it around 10 times cheaper to deploy than the high-end, specialised hardware used for typical enterprise data warehouse (EDW) deployments, based on the average cost per terabyte. Where the average EDW may store and analyse around 15TB of data, a typical Hadoop deployment may store and process a few hundred TB for the same cost.

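Hadoop is fault tolerant. It copes with the expected failures of commodity hardware through data replication and speculative execution. Replication keeps copies of each block of data on more than one node, so losing a node does not lose the data. Speculative execution means that Hadoop can run multiple copies of the same task (assuming resources are available) and use the result from whichever copy finishes first.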
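The idea of running duplicate attempts and keeping the first result can be sketched with Python's standard library. This is an illustration of the principle only, not Hadoop's scheduler; count_words, run_speculatively and the delay values are hypothetical stand-ins for a task and for slow or healthy nodes.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def count_words(text, delay):
    """Stand-in for one task attempt; `delay` simulates a slow or healthy node."""
    time.sleep(delay)
    return len(text.split())


def run_speculatively(text, delays):
    """Run one attempt per delay and return the result of the first to finish."""
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        attempts = [pool.submit(count_words, text, d) for d in delays]
        done, pending = wait(attempts, return_when=FIRST_COMPLETED)
        for attempt in pending:
            # Best effort only: a thread that is already running cannot be interrupted.
            attempt.cancel()
        return next(iter(done)).result()


# The first attempt simulates a struggling node, the second a healthy one.
result = run_speculatively("the quick brown fox", delays=[0.5, 0.01])
```

Every attempt computes the same answer, so it does not matter which one wins; the duplicate simply hides the slow node from the caller.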

Hadoop requires a new approach. Traditional BI and ETL tools are designed to work with predefined, structured schemas. While these tools can be made to work with Hadoop, typically through a Hive interface, this approach negates a key benefit of Hadoop – the reduction in development time and costs allowed by schema on read.

Hadoop is designed to answer different questions to those typically asked by BI tools, and new tools and methods are required to get real value.

Get the @Datameer infographic "Why Hadoop" for more insights.
