
Introduction
In the world of data warehouses and data catalogs, there’s a curious phenomenon. becomes the de facto “go-to” source, even if it’s not the perfect fit for every analysis. There’s a logic to this – data engineers have built pipelines, users are familiar with the format, and it offers a seemingly quick and easy solution. But this blind trust in the familiar can mask a hidden truth: the most used data might not be the most valuable.
There’s a logic to this!
Data engineers, under pressure, prioritize well-understood datasets. Building new data pipelines or integrating external data takes time and resources. So, they push users towards the familiar, the “80% solution” that gets the job done… kind of.
But this blind trust in the familiar can mask a hidden truth: the most used data might not be the most valuable.
Let’s delve deeper. Imagine a data scientist needing customer sentiment data. The readily available dataset might be website clickstream data – a good proxy, but not ideal. It captures activity, not emotions. This “80% fit” approach saves time and resources for data engineers, but it leaves data scientists thirsty for the perfect data – the missing 20%.
Over time, this cycle perpetuates. The clickstream data becomes the established source, usage numbers climb, and its perceived value inflates. This creates a feedback loop, pushing users further down a path of “good enough” instead of “truly insightful.”
The Cost of Convenience
Here’s the rub: while familiarity breeds comfort, it can also breed stagnation.
- Analysis Paralysis: Users, stuck with a dataset that only partially meets their needs, force fits and contort their analysis. This leads to questionable insights and missed opportunities.
- Data Silos Persist: Reliance on a single dataset reinforces departmental data silos. Valuable insights locked away in other parts of the data warehouse remain untapped. At
- Misperception of Value: Over time, the frequently used dataset becomes synonymous with “valuable data,” even though it may not be serving everyone’s needs effectively. This is a key selling feature of a well-known data catalog, but may well be perpetuating a problem rather than expediting a solution.
- Shadow IT: A less obvious issue – familiar datasets may be shared amongst multiple users, for different purposes, often with the knowledge or control of IT. Or users may create their own data feeds to enrich and improve popular data sets, again creating security risks and hidden costs.
At one client, for example, we were looking for a customer data set including 30 or so critical fields required regulatory reporting. For one business unit, we were forced to accept a smaller data feed, created for another purpose that gave us most of the fields we were looking for, to wait a year for the additional fields we needed. We compromised on what was possible, rather than what was needed, with an impact on the overall delivery.
The Data Marketplace Advantage
A data marketplace offers a more empowering solution. Imagine an open-air market overflowing with datasets – customer behavior data, sensor readings, social media sentiment – all accessible and cataloged. Here’s why it wins:
- Democratization of Data: Data scientists and analysts can explore a wider variety of datasets, leading to richer, more nuanced analysis.
- Data-Driven Discovery: The marketplace fosters a culture of data exploration. Users discover datasets they might not have known existed, unlocking new avenues for investigation.
- Evolving Value: As the organization’s needs change, so too can the most valuable dataset. A marketplace ensures users have access to the data that best suits their current needs.
Building a Data-Driven Oasis
Data warehouses are an invaluable asset, but we shouldn’t let familiarity cloud our judgment. By embracing data marketplaces, we can move beyond the “good enough” and unlock the true potential of our data – a treasure trove waiting to be explored.
Of course, a data marketplace isn’t a magic solution. Implementing one requires robust data governance to ensure quality and security.
But the potential benefits are undeniable.
By breaking free from the illusion of familiarity, organizations can unlock the true value of their data, empowering data science teams to make groundbreaking discoveries.
So, the next time you reach for that well-worn dataset, consider venturing deeper into the data warehouse. You might just stumble upon a hidden treasure trove of information waiting to be explored.

Leave a comment