Big data! How many zeroes in a brazilian?

Discover the truth about big data! Learn why throwing big, meaningless numbers at the audience won’t cut it. Find out what really sets big data apart from traditional data and explore the six defining characteristics. Size is just one aspect – value matters more than volume.


Brazilians and Big Data

Zettabytes, exabytes, petabytes – what is up with the big numbers?

A common approach to discussions on big data is to throw big (and meaningless) numbers at the audience.

How many email messages were sent in the last second – cue BIG NUMBER!

How much data has landed on the Internet in the last two years?

Cue EVEN BIGGER NUMBER!

When it comes to big data, size is a relative concept.

It’s true that the number of data sources and the volume of information that can be stored and analyzed have grown significantly over the years. This increase coincided with the emergence of the term “big data” in popular discussions.

“Add up all the big numbers,” the approach seems to say.

Big data is not (just) about volume

It reminds me of an old joke.

During the US occupation of Iraq, the President attends his daily Iraq situation report.

The general responsible for Iraq leans forward and says, “Mr President, I have bad news. Seven Brazilian troops were killed by roadside bombs last night in Iraq.”

The President is stunned. After a few minutes he manages to compose himself sufficiently to ask, “General, that’s terrible! Tell me, exactly how many zeroes are there in a brazilian?”

Like the President, the average corporation should not be focusing on the zeroes.

It’s important to note that large datasets existed even before the term “big data” gained prominence. What sets big data apart from traditional data is not just the sheer volume, but the processes, tools, goals, and strategies employed when working with it.

Six Characteristics Defining Big Data

According to Precisely, big data is characterized by the following six features:

  1. Highly scalable analytics processes: Big data platforms like Hadoop and Spark have gained popularity due to their exceptional scalability. They can analyze massive amounts of data without performance degradation. In contrast, traditional methods like basic SQL queries lack scalability unless integrated into a larger analytics framework. (A minimal sketch of this kind of scale-out, schema-flexible query follows the list.)
  2. Flexibility: Big data is synonymous with flexible data. Unlike in the past when data was stored in specific databases with consistent structures, today’s datasets come in various forms. Effective analytics strategies are designed to handle diverse data types and enable fast data transformation, including unstructured data.
  3. Real-time results: Unlike traditional data analysis where organizations could afford to wait for results, big data demands real-time insights. Maximizing value means gaining immediate insights, especially in tasks like fraud detection, where delayed results hold little significance.
  4. Machine learning applications: While machine learning is not the only way to leverage big data, it has become increasingly important in this realm. Machine learning sets big data apart from traditional data, which rarely powered such applications.
  5. Scale-out storage systems: Traditional data storage relied on tape and disk drives. In the big data landscape, software-defined scale-out storage systems are commonly used, abstracting data from underlying hardware. However, not all big data is stored on modern platforms, so the ability to swiftly move data between traditional and next-generation storage remains crucial.
  6. Data quality: Data quality is essential in any context, but it becomes even more critical with the complexity of big data. Attention to data quality is a fundamental aspect of an effective big data workflow, considering the intricacies involved in complex datasets and analytics operations.
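
To make the scalability and flexibility points concrete, here is a minimal PySpark sketch. It assumes a hypothetical file of semi-structured JSON events (events.json) with event_type and user_id fields; none of these names come from the article. The same few lines run unchanged on a laptop sample or on a cluster holding terabytes:

```python
# A minimal PySpark sketch, not a definitive recipe: the input file and
# field names (events.json, event_type, user_id) are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("flexible-analytics").getOrCreate()

# Semi-structured input: Spark infers a schema from the JSON itself,
# so records with varying fields can still be queried together.
events = spark.read.json("events.json")

# A simple aggregation that scales out across however many executors
# the cluster provides; the query does not change as the data grows.
summary = (events
           .groupBy("event_type")
           .agg(F.count("*").alias("events"),
                F.countDistinct("user_id").alias("users")))

summary.show()
```

The point is not the specific query but that the code is identical whether the input holds a megabyte or the cluster holds petabytes: scaling is the platform’s job, not the analyst’s.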

Value is more important than volume

The value of big data analytics, for most companies, will be in the ability to gain insight into existing data that is currently not being exploited – so-called dark data. This data is of a reasonable size – probably running into the tens or hundreds of terabytes – rather than being super large.

Very few banks, retailers or insurance giants will even be thinking in terms of the enormous storage volumes touted in those presentations.

Telecommunications example

Yes, big data platforms can bring economies of scale and allow companies to store and analyse more data than is currently the case.

For example, one telecommunications company can now run analytics across 11 years of Call Data Records (CDRs) – the basic unit of measure for telecommunications billing – rather than the three months’ worth their previous enterprise data warehouse (EDW) could store for the same cost. This allows them to build more accurate fraud and churn models, simply because they have access to more history. But the volumes are defined, not meaningless.
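
As a rough illustration of the kind of long-history query this enables, here is a hedged PySpark sketch. The file layout and column names (cdrs/, caller_id, call_date, duration_sec) are invented for illustration; real CDR schemas vary by operator, and the article does not describe this company’s actual system:

```python
# A hypothetical sketch of per-subscriber features over the full CDR
# history; paths and column names are invented, not from the article.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cdr-churn-features").getOrCreate()

# 11 years of call records, e.g. stored as Parquet partitioned by date.
cdrs = spark.read.parquet("cdrs/")

# Per-subscriber behaviour across the whole history: the kind of signal
# a three-month warehouse window simply cannot see.
features = (cdrs
            .groupBy("caller_id")
            .agg(F.count("*").alias("total_calls"),
                 F.avg("duration_sec").alias("avg_duration_sec"),
                 F.min("call_date").alias("first_seen"),
                 F.max("call_date").alias("last_seen")))

features.write.mode("overwrite").parquet("churn_features/")
```

Longer history means features like subscriber tenure (first_seen to last_seen) become available at all, which is exactly why the extra volume is valuable here: it is defined data with a purpose, not a big number for its own sake.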

Conclusion

The definition of big data is constantly evolving. As businesses increasingly rely on big data projects, it’s crucial to get them right. Let’s stop throwing meaningless numbers at the world and start looking at real case studies. The numbers are far less alarming than the hype suggests!

Image sourced from http://upload.wikimedia.org/wikipedia/commons/3/38/Rio_de_Janeiro_Helicoptero_49_Feb_2006_zoom.jpg
