To choose the right data organization system, it helps to compare the options side by side.
Key differences between a database and a data warehouse
Data Warehouse
- stores summary data;
- is used for data analysis;
- stores historical and current data;
- draws information from various sources;
- provides information on general business operations;
Database
- stores detailed data;
- records transactions;
- stores current data;
- collects data from a single source;
- records the main day-to-day operations;
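The contrast between detailed transactional rows and summary rows can be sketched with two in-memory SQLite connections. This is only an illustration: the table names (orders, daily_sales), the columns and the sample values are assumptions made for the example, not part of any particular product.

```python
import sqlite3

# Hypothetical operational database: one detailed row per transaction.
oltp = sqlite3.connect(":memory:")
oltp.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, day TEXT, product TEXT, amount REAL)"
)
oltp.executemany(
    "INSERT INTO orders (day, product, amount) VALUES (?, ?, ?)",
    [
        ("2024-01-01", "widget", 10.0),
        ("2024-01-01", "widget", 15.0),
        ("2024-01-01", "gadget", 7.5),
        ("2024-01-02", "widget", 20.0),
    ],
)

# Hypothetical warehouse: summary rows kept for analysis.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE daily_sales ("
    "day TEXT, product TEXT, order_count INTEGER, total REAL)"
)

# Summarize the detailed transactions before loading them into the warehouse.
summary = oltp.execute(
    "SELECT day, product, COUNT(*), SUM(amount) "
    "FROM orders GROUP BY day, product ORDER BY day, product"
).fetchall()
warehouse.executemany("INSERT INTO daily_sales VALUES (?, ?, ?, ?)", summary)

daily_sales = warehouse.execute(
    "SELECT day, product, order_count, total "
    "FROM daily_sales ORDER BY day, product"
).fetchall()
for row in daily_sales:
    print(row)
```

The database keeps every order, while the warehouse table holds one aggregated row per day and product, which is the shape analysts usually query.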
Key differences between a data mart and a data warehouse
Data Mart
- provides a thematic subset of data retrieved from the data warehouse (usually less than 100 GB in size);
- is a repository of valuable data for a specific subgroup;
- enables fast data analysis;
- gets its data from the data warehouse;
Data Warehouse
- is significantly larger (a terabyte or more);
- contains all cleaned data for business units;
- gets its data from databases;
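As a rough sketch of how a data mart can be carved out of a warehouse, the example below copies one department's rows into a separate table. The names warehouse_facts and sales_mart, the columns and the sample values are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table holding cleaned data for all business units.
conn.execute(
    "CREATE TABLE warehouse_facts ("
    "department TEXT, metric TEXT, period TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?, ?)",
    [
        ("sales", "revenue", "2024-Q1", 120.0),
        ("sales", "revenue", "2024-Q2", 150.0),
        ("hr", "headcount", "2024-Q1", 42.0),
        ("finance", "costs", "2024-Q1", 80.0),
    ],
)

# Data mart: a thematic subset for one group, built from the warehouse.
conn.execute(
    "CREATE TABLE sales_mart AS "
    "SELECT metric, period, value FROM warehouse_facts "
    "WHERE department = 'sales'"
)

mart_rows = conn.execute(
    "SELECT metric, period, value FROM sales_mart ORDER BY period"
).fetchall()
warehouse_count = conn.execute(
    "SELECT COUNT(*) FROM warehouse_facts"
).fetchone()[0]
print(mart_rows, warehouse_count)
```

The mart is smaller and narrower than its source, which is what makes queries against it fast for the group it serves.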
Key differences between a data lake and a data mart
Data Lake
- contains all of the organization's raw, unfiltered data;
- is suited to wider and deeper analysis of raw data;
- is a complete solution that acts as a data warehouse, database and data mart;
- provides a central archive where data marts can be stored in different user areas;
Data Mart
- contains filtered and structured data for a specific department;
- allows relevant information to be analyzed quickly and efficiently;
- is a one-time solution without an ETL process;
Key differences between a data lake and a data warehouse
Data Warehouse
- stores cleaned data used to build structured data models and reports;
- uses an operational data store (ODS) fed by transactional systems;
- is intended for users who need to create reports for analytics;
Data Lake
- stores all of the organization's data;
- uses hardware that makes it economical to store large amounts of data (terabytes, petabytes);
- extracts data from all data types, including non-traditional ones (web service logs, social media activity, sensor data, etc.);
- is designed for deep analysis that goes beyond the scope of the data stored in the repository;
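One way to picture the difference is that a lake keeps every record in its original form, while a warehouse accepts only cleaned rows that fit a fixed structure. The following is a minimal sketch under that assumption; the source tags, payloads and the helper load_sensor_readings are invented for the example.

```python
import json

# Hypothetical raw inputs as they might arrive from different sources.
raw_inputs = [
    ("web_log", "203.0.113.5 GET /pricing 200"),
    ("sensor", '{"device": "t-17", "temp_c": 21.4}'),
    ("social", '{"user": "ann", "text": "love the new release"}'),
    ("sensor", '{"device": "t-18", "temp_c": null}'),
]

# Data lake: keep everything as-is, tagged only with its source.
data_lake = [
    {"source": source, "payload": payload} for source, payload in raw_inputs
]


def load_sensor_readings(lake):
    """Read sensor entries from the lake and return cleaned, structured rows."""
    rows = []
    for entry in lake:
        if entry["source"] != "sensor":
            continue  # the warehouse table in this sketch only models sensors
        record = json.loads(entry["payload"])
        if record.get("temp_c") is None:
            continue  # cleaning step: drop incomplete readings
        rows.append((record["device"], float(record["temp_c"])))
    return rows


# Data warehouse: only cleaned rows with a fixed (device, temp_c) structure.
warehouse_rows = load_sensor_readings(data_lake)
print(len(data_lake), warehouse_rows)
```

The lake still holds the web log line, the social media post and the incomplete reading, so they remain available for later analyses that the warehouse structure did not anticipate.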
Key differences between a database and a data mart
Database
- is a transactional data repository (OLTP);
- records all aspects and activities of one particular subject;
- contains raw data;
- users do not interact directly with the data in databases;
- is the first step in the ETL process;
Data Mart
- is a store of analytical data (OLAP);
- contains data from several subjects;
- contains processed and verified data, which simplifies report creation;
- users interact directly with the data in data marts;
- is the last step in the ETL process;
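The positions of the database and the data mart at the two ends of the ETL process can be sketched as three small functions. The record layout (customer, amount), the cleaning rule and the function names are assumptions made for this illustration only.

```python
# Hypothetical raw rows as recorded by a transactional database.
database_rows = [
    {"customer": "ann", "amount": 10.0},
    {"customer": "bob", "amount": None},  # incomplete record
    {"customer": "ann", "amount": 5.0},
    {"customer": "bob", "amount": 3.0},
]


def extract(rows):
    """First step: copy the raw rows out of the database unchanged."""
    return list(rows)


def transform(rows):
    """Middle step: drop invalid rows, then total the amounts per customer."""
    totals = {}
    for row in rows:
        amount = row["amount"]
        if amount is None or amount < 0:
            continue  # verification: skip rows that cannot be trusted
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + amount
    return totals


def load(totals):
    """Last step: write report-ready rows into the data mart."""
    return [
        {"customer": customer, "total_spent": total}
        for customer, total in sorted(totals.items())
    ]


data_mart = load(transform(extract(database_rows)))
print(data_mart)
```

Users would query data_mart directly, while the raw rows in the database stay untouched by the pipeline.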
Key differences between a database and a data lake
Database
- records transactional data related to a single topic;
- stores traditional data (text, numbers);
- does not perform data cleaning and stores raw data;
- exports its data to another process (an operational data store);
- is the first step in the ETL process;
Data Lake
- records the activity of many databases and other disparate data sources;
- can store data of any type (PDF files, images, sound files, etc.);
- stores raw data, although a data cleansing procedure is implemented;
- performs all data processing (cleansing and aggregation);
- handles all aspects of the ETL process.