The ever-increasing focus on data is driving demand for data management tools, including two distinct categories: data catalogues and data modelling tools.
This post attempts to define the difference between the two and help you to decide which you need.
What is data modelling?
A data model is an abstract representation of the real-world objects that interact within your business. It represents both the objects, such as customer, account and transaction, and the relationships between them. Ultimately, it defines how these objects and relationships will be represented and stored within a database.
A data modeller engages business users to understand the business requirement, which is documented as a data model. Typically, the design process will go through three phases:
- Conceptual: A conceptual data model shows a high-level view of the data an enterprise uses to support business processes. This model is typically not associated with specific systems or databases but acts as a reference showing high-level data storage and processing requirements for the business.
- Logical: A logical data model captures details of the attributes that make up each object. For example, a customer may have a name, address, telephone number, and so on.
- Physical: A physical data model documents the technical details specific to the intended database management system, for example data types and lengths.
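To make the three levels concrete, here is a minimal sketch in Python of the same customer example at each level. The relationship names, the `PhysicalColumn` class, the `unmapped_attributes` helper and the column lengths are all assumptions made for illustration; they are not taken from any particular modelling tool.

```python
from dataclasses import dataclass
from typing import List, Optional

# Conceptual: business objects and the relationships between them.
# The relationship verbs ("holds", "records") are hypothetical.
CONCEPTUAL_ENTITIES = ["Customer", "Account", "Transaction"]
CONCEPTUAL_RELATIONSHIPS = [
    ("Customer", "holds", "Account"),
    ("Account", "records", "Transaction"),
]

# Logical: the attributes that make up each object, independent of any DBMS.
LOGICAL_ATTRIBUTES = {
    "Customer": ["name", "address", "telephone_number"],
}


@dataclass(frozen=True)
class PhysicalColumn:
    """One column as it would be declared in a specific database."""
    name: str
    data_type: str
    length: Optional[int] = None


# Physical: data types and lengths. The sizes below are assumed values.
PHYSICAL_TABLES = {
    "Customer": [
        PhysicalColumn("name", "VARCHAR", 100),
        PhysicalColumn("address", "VARCHAR", 255),
        PhysicalColumn("telephone_number", "VARCHAR", 20),
    ],
}


def unmapped_attributes(entity: str) -> List[str]:
    """Return logical attributes of `entity` that have no physical column yet."""
    physical_names = {col.name for col in PHYSICAL_TABLES.get(entity, [])}
    return [a for a in LOGICAL_ATTRIBUTES.get(entity, []) if a not in physical_names]
```

A check such as `unmapped_attributes("Customer")` gives a rough idea of how a tool might confirm that every logical attribute has been carried through to the physical design.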
Data modelling tools
A data modelling tool supports a top-down approach to database design as outlined above. In this case, the data modellers will deliver a conceptual, logical and physical data model which will be provided to the developers to implement in a database. Some tools may also automatically generate code and data structures from the graphical design.
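As a rough illustration of what generating data structures from a design can involve, the sketch below renders a `CREATE TABLE` statement from a list of column definitions. The `create_table_sql` function, the table name and the column sizes are hypothetical; real modelling tools have their own formats and target specific database dialects.

```python
from typing import List, Optional, Tuple

# Each column is (name, data type, optional length).
ColumnSpec = Tuple[str, str, Optional[int]]


def create_table_sql(table: str, columns: List[ColumnSpec]) -> str:
    """Render a CREATE TABLE statement from a physical column list."""
    lines = []
    for name, data_type, length in columns:
        # Append the length only for types that declare one.
        type_sql = f"{data_type}({length})" if length is not None else data_type
        lines.append(f"    {name} {type_sql}")
    return f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n);"


# Hypothetical physical design for the customer object.
customer_columns = [
    ("customer_id", "INTEGER", None),
    ("name", "VARCHAR", 100),
    ("telephone_number", "VARCHAR", 20),
]

ddl = create_table_sql("customer", customer_columns)
print(ddl)
```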
Data modelling tools help the data modeller to manage the complexity of modelling a typical enterprise data landscape, keep a graphical record of the design, and provide a template to the developers for implementation. The users tend to be technical in nature, although models can be useful tools to communicate with business stakeholders.
Data catalogues
A data catalogue, by comparison, supports a bottom-up approach that starts from the databases that already exist. Data catalogues use algorithms to discover the data models actually implemented in those databases and present them to the data modeller and the business.
Humans can then add business context, such as the business systems, processes and reports that use or populate the data set. Where data models did not previously exist, this automated approach can save tremendous time and effort when documenting a data landscape.
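The bottom-up flow can be sketched with Python's built-in `sqlite3` module: read the structure out of an existing database, then let a person attach business context to it. This is only an illustration, assuming a small in-memory SQLite database; the `discover_schema` function, the table names and the context values are hypothetical, and commercial catalogues do far more than inspect table metadata.

```python
import sqlite3
from typing import Dict, List, Tuple


def discover_schema(conn: sqlite3.Connection) -> Dict[str, List[Tuple[str, str]]]:
    """Return {table: [(column, declared_type), ...]} read from the live database."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [(row[1], row[2]) for row in info]
    return schema


# Stand-in for an existing, undocumented database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name VARCHAR(100))")
conn.execute("CREATE TABLE account (account_id INTEGER PRIMARY KEY, customer_id INTEGER)")

# Step 1: automated discovery of the physical model.
catalogue = {
    table: {"columns": columns, "business_context": {}}
    for table, columns in discover_schema(conn).items()
}

# Step 2: a human adds business context (values here are made up).
catalogue["customer"]["business_context"] = {
    "used_by_reports": ["Monthly customer summary"],
    "owning_process": "Customer onboarding",
}
```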
Data catalogues help to make data visible within and across the enterprise, both by recording where the data physically resides and by providing the business context necessary for data scientists, analysts and decision-makers to find the best set of data for a particular requirement.
Data catalogues present a single source of truth for all things data, automate stewardship workflows and provide audit trails for decision making.
So which is needed?
In practice, most organisations will need both a data modelling and a data cataloguing capability. Ideally, the two will communicate with each other – the data catalogue updating the physical data models within the modelling tool, and the modelling tool linking conceptual and logical design changes into the catalogue and making them accessible to business stakeholders.
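One simple way to picture this two-way communication is a comparison between the model as designed and the model as discovered. The sketch below is an assumption about how such a check could work, not a description of any product's integration; `compare_models` and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

# Both inputs map table name -> {column name: declared type}.
Model = Dict[str, Dict[str, str]]


def compare_models(modelled: Model, discovered: Model) -> Dict[str, List[Tuple[str, str]]]:
    """Report columns that differ between the designed and the discovered model."""
    report = {"missing_in_database": [], "missing_in_model": [], "type_mismatch": []}
    for table in sorted(set(modelled) | set(discovered)):
        model_cols = modelled.get(table, {})
        db_cols = discovered.get(table, {})
        for column in sorted(set(model_cols) | set(db_cols)):
            if column not in db_cols:
                report["missing_in_database"].append((table, column))
            elif column not in model_cols:
                report["missing_in_model"].append((table, column))
            elif model_cols[column] != db_cols[column]:
                report["type_mismatch"].append((table, column))
    return report


# Hypothetical design from the modelling tool.
modelled = {"customer": {"name": "VARCHAR(100)", "address": "VARCHAR(255)"}}
# Hypothetical structure discovered by the catalogue.
discovered = {"customer": {"name": "VARCHAR(50)", "email": "VARCHAR(255)"}}

drift = compare_models(modelled, discovered)
```

Columns listed under `missing_in_model` are candidates for the catalogue to feed back into the physical model, while `missing_in_database` and `type_mismatch` point to design changes that have not yet reached the database.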