I have Power BI, do I need a data warehouse?
By Kent Teague, Managing Consultant, Melbourne
With the rapid spread of data visualisation tools like Power BI and Tableau, often outside of IT, Altis is commonly asked: is the data warehouse dead? Can’t I just grab the data and create my own reports? This article will talk to the continued importance of understanding the business context of data, and a well-proven approach to structuring the data in a flexible, intuitive and performant manner.
Best Practice Dimensional Modelling
Using Dimensional Modelling to structure your data is still considered to be the best approach for the reporting/user access layer of your data warehouse. Data will be stored in a format (Star Schemas) that are performant, easy to maintain, and simple for end users to consume. Dimensional models are described by Ralph Kimball as a means to “process measurement events, dividing data into either measurements or the “who, what, where, when, why, and how” descriptive context”.
Before jumping in and designing a Dimensional model, it is critical that the needs of the business and the realities/quality of the source data are clearly understood. This is achieved through a series of interactive workshops with business representatives to understand their key business drivers, decision making processes, and analytical requirements.
Once these key drivers have been identified and agreed upon then the design of the dimensional model can commence. It is important that business representatives are active participants throughout this phase to ensure they are aware of all key decisions reached.
The Kimball University describes four key decisions to be made during design:
1. Select the business process
- Selecting the business process is the first step in the design process, as it deﬁnes a speciﬁc design target and allows the grain, dimensions, and facts to be declared.
- Each business process corresponds to a row in the enterprise data warehouse bus matrix.
2. Declare the grain
- Declaring the grain is a critical component of a robust design as it defines what each row in the fact table represents.
- Different grains must not be mixed in the same fact table. E.g. Separate monthly and daily into two separate Fact tables.
3. Identify the dimensions
- Define the descriptive attributes that give context to a business process.
- Dimensions are the drivers of the user’s BI experience as they provide the “who, what, where, when, why, and how” context surrounding a business process event.
4. Identify the Facts
- Identify the relevant measurements that correspond to a business process event.
- A fact table must correspond to a physical observable event, and not to the demands of an individual report.
Bus Matrix – a visual representation of relationships
A key output of this design phase is the creation of the Bus Matrix. The Bus matrix provides a visual representation of the (high level) relationships between the business processes and the descriptive attributes (dimensions) that describe it. (See image below).
The Bus Matrix also plays a key role in the design, performance, and usability of BI reporting models such as SSAS Tabular or Power BI through conformed dimension (dimensions across more than one business process). The aim is to have as many conformed dimensions as possible. Ralph Kimball describes this as the “single version of the truth” view of the “who, what, where, when, why, and how” of events for end-users.
The Dimensional model is also considered one of the best approaches when designing Tabular or Power BI models. These models are highly performant and this performance increases as the model size decreases. The reverse is also true, as the model increases in size the performance decreases. In theory you could get the best performance from your model by flattening all your data into one table, this would of course lead to a poor end user experience amongst many other issues.
Key Benefits of Star Schemas
The key benefits of using a Star Schema design in your BI model are improving performance and streamlining downstream reports by:
- Decreasing the amount of redundant data stored
- Reducing the number of data refresh operations
- Ensuring consistent (single source of truth) reporting across all areas of an organisation
To wrap up, Data visualisation tools are a great way to engage users. Better yet is a visualisation tool accessing a comprehensive, accurate, accessible and fast data platform to allow users to explore and unlock valuable insights to drive behaviour.