Database, Data Warehouse, Data Mart, Data Lake: main characteristics and differences
Modern organizations process data daily. However, the data may differ in type, scope and manner of use. This must be considered when choosing the best...
Modern organizations process data daily. However, the data may differ in type, scope and manner of use. This must be considered when choosing the best data solution. Achieving results depends, among other things, on the selected enterprise data management system, that must fully meet business needs. It can be data mart, data warehouse, database or data lake.
A database is a place of related data storage that is used to capture a particular situation. For example, a point-of-sale (POS) database. In this case, the database collects, and stores data related to retail store transactions. Data entering the database is processed, systematized, managed, updated, and then stored in tables. The database is the target storage for raw transactional data and performs online transaction processing (OLTP).
The main database characteristics:
The data warehouse is the main analytical system of the company. It often works in conjunction with an operational data warehouse (ODS) to store data that has been retrieved from various company databases. For example, a company has databases supporting points of sale, online activity, customer and employee information. The data warehouse will take the data from these sources and make it available in one place. The method of extracting data from the database, converting it to ODS, and loading it into the data warehouse is an example of ETL and ELT processes.
The data warehouse is an excellent tool for data analysis due to the capture of transformed historical data. Business departments are involved in organizing data, using it for reporting and data analysis. The data warehouse uses SQL to query data, and use tables, indexes, keys, views, and data types to organize and ensure data integrity.
The main data warehouse features:
Like a data warehouse, a data mart maintains and stores processed and ready for analysis data. However, a visibility scope is limited. The data mart provides the subject data that is required to support each business unit. For example, a data mart to support reports and analysis of a marketing department. By defining data boundaries within a particular department, only relevant data is available.
Using a data mart increases security level. Visibility restriction prevents irresponsible use of data that is not relevant to a particular department. It should also be noted that less data in the data mart increases the speed of processing, and therefore increases the speed of query execution. Data is aggregated and prepared for a specific department, minimizing data misuse and the possibility of conflicting reports.
Key data mart features:
A data lake is designed to store structured and unstructured company data. It collects all the valuable data for later use: images, pdfs, videos, etc. Just like a data warehouse, a lake extracts data from several disparate sources and processes it. It can also be used for data analysis and reporting purposes. For processing and analysis, different applications and technologies (for example, Java) are used. Data lakes are often used in conjunction with machine learning. Machine learning test results are also stored in data lake. The level of usage complexity requires serious skills from users, as well as experience with programming languages and data processing methods. Data cleaning occurs without ODS usage.
Key data lake features: