It’s already difficult to imagine a situation where business makes a progress without data. It plays the main role in companies’ operation on the basis of which business team makes different decisions, develops strategies and forecasts etc. For every improving company it is too necessary to make appropriate relations with their data. And the most important step is data warehouse architecture.
What is data warehouse?
Data warehouse is a storage system for data collected from different sources within a company and used to run decision making process.
Currently there are 2 prominent architecture styles to build data warehouse: the Inmon architecture and the Kimball architecture. Ralph Kimball and Bill Inmon propose different concepts. The main difference is a technique of data structure modeling, loading and storage. This difference has influence on the initial delivery time of the data warehouse and the ability to accommodate ETL design changes. However, methods have general characteristics: both of them position the data warehouse as the central data repository for a company; cover all corporate reporting needs; use ETL to load data warehouse.
Let’s consider each method.
The Inmon approach
This method begins with the corporate data modelling. With the help of this main subject areas and entities (customers, product/service, vendors etc.) are identified. Consequently, on the basis of this a detailed logical model for each entity is created. Entity structure has a normalized form, data redundancy is avoided as much as possible. It is a key characteristic of this technique that allows to determinate business concept and avoid data update anomalies.
The data warehouse is the one source of the truthful information for enterprise. Such structure simplifies data loading. But it is difficult to use structure for querying because of many tables and joins.
So, B. Inmon propose to build data marts for every specific department (Finance, Sales, Business Development etc.). All data is integrated, and data warehouse is the single source of data from different marts. Such concept guarantees data completeness and consistency across an organization.
Advantages:
- data warehouse is the single truthful information source for a company and data source for marts;
- data is integrated;
- data updating anomalies are avoided because of low redundancy that simplifies ETL process and minimizes malfunction probability;
- understandable business processes because of detailed entities description;
- flexibility – it is easy to update data warehouse in response to business needs or basic data changes;
- it is possible to create different reports according to requirements.
Disadvantages:
- model and realization increasing complexity over time because of tables and joins;
- needs highly qualified specialists in both data modeling and business. This task realization is enough difficult and expensive;
- longstanding setting stage and delivery (4 – 9 months);
- needs more ETL works;
- needs a numerous specialist team.
The Kimball approach
This approach begins with main business processes and questions identifying. The operating system is a key data source. For data delivery from different sources and loading it into a staging area is used ETL software. From here data is loaded into a dimensional model. The key approach difference is not normalized dimensional model. The star schema is the main concept of dimensional modeling where is centralized store. The fact table contains all data relevant to the subject area. The dimensional table describes stored data. User can make detailing without additional connections as dimensional tables are totally not normalized. R. Kimball proposes the conformed dimensions concept to achieve an integration in the dimensional model. Key characteristics (customer, product, service) are built once and used by all facts. This guarantees identical characteristic usage by all facts.
Advantages:
- quick setting;
- the star schema is understandable and easy-to-use for business users;
- small size of data warehouse environment that simplifies system management activities;
- high efficiency of the star schema;
- needs small specialist team for data warehouse effective work;
- works well for department-wise metrics and KPI tracking.
Disadvantages:
- there is no the single truthful information source as data is integrated partially;
- there is a probability of data anomalies over time;
- reduction in effectiveness could be instigated by bars adding in the facts table;
- there is no a possibility to satisfy all company requests as it is oriented on business processes.
Both techniques have their advantages and disadvantages and depends on a situation each of them can be more efficient. The main task is to make reasonable and amenable to business needs decision for the best result achievement.