{"id":44361,"date":"2022-11-23T18:39:18","date_gmt":"2022-11-23T15:39:18","guid":{"rendered":"http:\/\/datalabsua.com\/ua\/?p=44361"},"modified":"2024-05-22T17:08:26","modified_gmt":"2024-05-22T14:08:26","slug":"database-data-warehouse-data-mart-data-lake-main-characteristics-and-differences","status":"publish","type":"post","link":"https:\/\/datalabsua.com\/en\/database-data-warehouse-data-mart-data-lake-main-characteristics-and-differences\/","title":{"rendered":"Database, Data Warehouse, Data Mart, Data Lake: main characteristics and differences"},"content":{"rendered":"<p>Modern organizations process data daily. However, the data may differ in type, scope, and manner of use. This must be considered when choosing the best data solution. Achieving results depends, among other things, on the selected enterprise data management system, which must fully meet business needs. It can be a data mart, data warehouse, database, or data lake.<\/p>\n<p><strong>Database<\/strong><\/p>\n<p>A database is a store of related data that captures a particular area of operations. For example, a point-of-sale (POS) database collects and stores data related to retail store transactions. Data entering the database is processed, systematized, managed, updated, and then stored in tables. 
The database is the target storage for raw transactional data and performs online transaction processing (OLTP).<\/p>\n<p><em>The main database characteristics:<\/em><\/p>\n<ul>\n<li>structured around particular company operations and applications;<\/li>\n<li>strict rules for data storage and organization in relational systems (RDBMS);<\/li>\n<li>flexible data storage in NoSQL systems;<\/li>\n<li>single-purpose in nature;<\/li>\n<li>used for online transaction processing (OLTP);<\/li>\n<li>records data, capturing and storing transactions as they occur.<\/li>\n<\/ul>\n<p><strong>Data warehouse<\/strong><\/p>\n<p>The data warehouse is the main analytical system of the company. It often works in conjunction with an operational data store (ODS) to hold data that has been retrieved from various company databases. For example, a company may have databases supporting points of sale, online activity, and customer and employee information. The data warehouse will take the data from these sources and make it available in one place. Extracting data from the source databases, transforming it in the ODS, and loading it into the data warehouse is an example of an ETL (extract, transform, load) process; when the transformation happens after loading, the process is called ELT.<\/p>\n<p>The data warehouse is an excellent tool for data analysis because it captures transformed historical data. Business departments are involved in organizing the data and use it for reporting and analysis. 
The data warehouse uses SQL to query data, and uses tables, indexes, keys, views, and data types to organize data and ensure its integrity.<\/p>\n<p><em>The main data warehouse features:<\/em><\/p>\n<ul>\n<li>storage of a large amount of historical data; old data is not deleted when new data is loaded;<\/li>\n<li>capturing data from several disparate databases;<\/li>\n<li>works with an ODS to store, organize, and clean data;<\/li>\n<li>OLAP (online analytical processing) application;<\/li>\n<li>main data source for data analysis;<\/li>\n<li>reports and dashboards use data from the data warehouse.<\/li>\n<\/ul>\n<p><strong>Data mart<\/strong><\/p>\n<p>Like a data warehouse, a data mart maintains and stores processed, analysis-ready data. However, its visibility scope is limited. The data mart provides the subject-specific data required to support a particular business unit. For example, a data mart may support the reports and analysis of a marketing department. Defining data boundaries for a particular department ensures that only relevant data is available.<\/p>\n<p>Using a data mart increases the level of security. Restricting visibility prevents irresponsible use of data that is not relevant to a particular department. It should also be noted that a smaller volume of data in the data mart speeds up processing and therefore query execution. Data is aggregated and prepared for a specific department, minimizing data misuse and the possibility of conflicting reports.<\/p>\n<p><em>Key data mart features:<\/em><\/p>\n<ul>\n<li>focus on one subject or business unit;<\/li>\n<li>mini-repository of aggregated data;<\/li>\n<li>the amount of data is limited;<\/li>\n<li>reports and dashboards use data from the data mart.<\/li>\n<\/ul>\n<p><strong>Data lake<\/strong><\/p>\n<p>A data lake is designed to store structured and unstructured company data. It collects all the valuable data for later use: images, PDFs, videos, etc. 
Just like a data warehouse, a data lake extracts data from several disparate sources and processes it. It can also be used for data analysis and reporting purposes. For processing and analysis, different applications and technologies (for example, Java) are used. Data lakes are often used in conjunction with machine learning. Machine learning test results are also stored in the data lake. This level of complexity requires serious skills from users, as well as experience with programming languages and data processing methods. Data cleaning occurs without an ODS.<\/p>\n<p><em>Key data lake features:<\/em><\/p>\n<ul>\n<li>collection of all data from many disparate data sources over a long period of time;<\/li>\n<li>meeting the needs of different users within the company;<\/li>\n<li>processing and cleaning data, then saving it to the data lake.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Modern organizations process data daily. However, the data may differ in type, scope, and manner of use. This must be considered when choosing the best data solution. Achieving results depends, among other things, on the selected enterprise data management system, which must fully meet business needs. 
It can be a data mart, data warehouse, database, or data lake.<\/p>\n","protected":false},"author":2,"featured_media":44846,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[97,85,159,86],"class_list":["post-44361","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog","tag-database","tag-datalake","tag-datamart","tag-datawarehouse"],"_links":{"self":[{"href":"https:\/\/datalabsua.com\/en\/wp-json\/wp\/v2\/posts\/44361","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/datalabsua.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/datalabsua.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/datalabsua.com\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/datalabsua.com\/en\/wp-json\/wp\/v2\/comments?post=44361"}],"version-history":[{"count":3,"href":"https:\/\/datalabsua.com\/en\/wp-json\/wp\/v2\/posts\/44361\/revisions"}],"predecessor-version":[{"id":44365,"href":"https:\/\/datalabsua.com\/en\/wp-json\/wp\/v2\/posts\/44361\/revisions\/44365"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/datalabsua.com\/en\/wp-json\/wp\/v2\/media\/44846"}],"wp:attachment":[{"href":"https:\/\/datalabsua.com\/en\/wp-json\/wp\/v2\/media?parent=44361"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/datalabsua.com\/en\/wp-json\/wp\/v2\/categories?post=44361"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/datalabsua.com\/en\/wp-json\/wp\/v2\/tags?post=44361"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}