The bus approach does not duplicate or store all master data. Typically, these solutions maintain an index that allows them to exchange data with the various consumer systems. Business rules may, however, be used to build a consolidated record made up of the best attributes from the various linked records.
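To make the idea of a consolidated record concrete, here is a minimal Python sketch. It assumes one possible business rule (the most recently updated non-empty value wins for each attribute); real MDM products offer many other rules. The `LinkedRecord` class, the `consolidate` function, and the sample system names and identifiers are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class LinkedRecord:
    """One source record referenced by the bus index (hypothetical structure)."""
    system: str
    local_id: str
    attributes: dict
    last_updated: date


def consolidate(records):
    """Build a consolidated view without touching the source records.

    Assumed rule: for each attribute, the non-empty value from the most
    recently updated record wins.
    """
    view = {}
    # Process oldest first so that newer non-empty values overwrite older ones.
    for rec in sorted(records, key=lambda r: r.last_updated):
        for name, value in rec.attributes.items():
            if value not in (None, ""):
                view[name] = value
    return view


# Hypothetical index entry: two source records linked to one master entity.
linked = [
    LinkedRecord("crm", "C-17", {"name": "ACME Ltd", "phone": ""}, date(2023, 1, 5)),
    LinkedRecord("billing", "B-902", {"name": "Acme Limited", "phone": "555-0100"}, date(2022, 6, 1)),
]
best = consolidate(linked)
```

Here the name comes from the newer CRM record, while the phone number falls back to the billing record because the CRM value is empty. Neither source system is modified.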
As can be seen from the graphics, certain key elements, such as ETL and Data Quality services, are necessary in either approach, although these may be implemented differently. Of course, in most real-world environments a hybrid model is more likely, comprising certain aspects of each extreme approach.
So, when should you use which? The reality is that each approach has strengths and weaknesses.
In the Hub approach, data can be standardised and distributed to replace conflicting data in the subscriber systems. In theory, a single record will be kept, making it easier to manage duplicates and similar issues – assuming a decent data quality solution is built into the design. However, in many cases it is impossible to update or overwrite existing systems – for example, when these are externally maintained. It can also be difficult to define and agree changes to data standards in common use in certain systems or environments – a political and social change issue that is a strong driver for data governance in MDM projects.
The Bus approach is more complex, as no repository of master records is created. Instead, the bus maintains links between all the records that exist (in one or many applications) and does not physically update any of them. The approach is far less intrusive (data is not necessarily changed) but requires the understanding that the master record is in fact a distributed entity (and may be duplicated or have conflicting values for key attributes). Any use of the data therefore has to take this complexity into account. Once again, data governance and data quality are integral to the successful management of this complexity.
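One way a consumer of bus-linked data might take this complexity into account is to check for conflicting attribute values before using them. The sketch below is illustrative only: `find_conflicts` and the link structure (a dictionary keyed by system and local identifier) are assumptions, not features of any particular MDM product.

```python
from collections import defaultdict


def find_conflicts(linked_records):
    """Report attributes whose non-empty values differ across linked records.

    linked_records maps a hypothetical (system, local_id) pair to that
    record's attributes. Nothing is written back to the source systems.
    """
    values = defaultdict(set)
    for attrs in linked_records.values():
        for name, value in attrs.items():
            if value:
                values[name].add(value)
    return {name: sorted(vals) for name, vals in values.items() if len(vals) > 1}


# Hypothetical links held by the bus for one distributed master record.
links = {
    ("crm", "C-17"): {"name": "ACME Ltd", "country": "ZA"},
    ("billing", "B-902"): {"name": "Acme Limited", "country": "ZA"},
}
conflicts = find_conflicts(links)
```

In this example only `name` is reported, since both systems agree on `country`. How a conflict is then resolved is a matter for data governance rather than for the code.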
So, simplistically, the repository (Hub) approach is better suited to data that is used in a consistent manner and where the impact of standardising and updating the core data will be acceptable. The Bus approach can be easier to deploy in a more complex environment, or where data cannot be easily standardised due to differing uses or factors outside the team’s control.
In either case, data governance and stewardship are critical to defining and agreeing data standards, and will help you to assess which approach is most feasible in your environment – a hub may be viable for some data and a bus may be required for the rest.
Data Quality processes and tools assist with defining and enforcing data standards between the disparate systems, and with identifying duplicate records for consolidation.
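As a rough illustration of duplicate identification, the following Python sketch groups records by a normalised match key. It is a deliberately simple, exact-match technique; commercial data quality tools typically use far richer standardisation and fuzzy matching. The function names, the list of legal suffixes, and the sample records are assumptions made for this example.

```python
import re
from collections import defaultdict

# Assumed list of legal-form suffixes to ignore when comparing company names.
LEGAL_SUFFIXES = {"ltd", "limited", "inc", "pty"}


def match_key(name):
    """Standardise a company name into a simple comparison key."""
    key = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    tokens = [t for t in key.split() if t not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def candidate_duplicates(records):
    """Group record identifiers that share the same match key.

    records is an iterable of (record_id, name) pairs.
    """
    groups = defaultdict(list)
    for rec_id, name in records:
        groups[match_key(name)].append(rec_id)
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical records drawn from two systems.
sample = [
    ("crm:C-17", "ACME Ltd."),
    ("billing:B-902", "Acme Limited"),
    ("crm:C-18", "Apex Inc"),
]
duplicates = candidate_duplicates(sample)
```

The two Acme records share the key `acme` and are flagged as candidates for consolidation. A data steward would normally confirm such matches before any records are merged or linked.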