
Key differences between database, data warehouse, data mart and data lake

In order to make the right decision when choosing a data organization system, it is advisable to conduct a comparative analysis.

Key differences between databases and data warehouse:

Data Warehouse

Database

Key differences between data mart and data warehouse

Data Mart

Data Warehouse

Key differences between data lake and data mart

Data Lake

Data Mart

Key differences between data lake and data warehouse

Data Warehouse

Data Lake

Key differences between databases and data mart

Database

Data Mart

Key differences between databases and data lake

Database

Data Lake

Database, Data Warehouse, Data Mart, Data Lake: main characteristics and differences

Modern organizations process data daily. However, the data may differ in type, scope and manner of use, and this must be considered when choosing the best data solution. Achieving results depends, among other things, on the selected enterprise data management system, which must fully meet business needs. It can be a data mart, a data warehouse, a database or a data lake.

Database

A database is a store of related data that captures a particular situation, for example a point-of-sale (POS) database, which collects and stores data related to retail store transactions. Data entering the database is processed, systematized, managed, updated, and then stored in tables. The database is the target storage for raw transactional data and performs online transaction processing (OLTP).

The main database characteristics:

Data warehouse

The data warehouse is the main analytical system of the company. It often works in conjunction with an operational data store (ODS) to hold data that has been retrieved from various company databases. For example, a company has databases supporting points of sale, online activity, customer and employee information. The data warehouse takes the data from these sources and makes it available in one place. Extracting data from the databases, transforming it in the ODS, and loading it into the data warehouse is an example of an ETL (extract, transform, load) process; in the ELT variant, the data is loaded first and transformed afterwards.
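As a rough illustration of the extract, transform and load flow described above, the sketch below uses Python's built-in sqlite3 module with in-memory databases. The table names (pos_sales, web_orders, sales_fact), the columns and the sample rows are hypothetical stand-ins for a company's real sources, and the transformation step is deliberately trivial.

```python
import sqlite3

def make_source(table, rows):
    """Create a hypothetical operational database with one sales table."""
    db = sqlite3.connect(":memory:")
    db.execute(f"CREATE TABLE {table} (sold_at TEXT, amount_cents INTEGER)")
    db.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    db.commit()
    return db

def extract(db, table, channel):
    """Extract: read raw rows from one source and tag them with their origin."""
    cur = db.execute(f"SELECT sold_at, amount_cents FROM {table}")
    return [(channel, sold_at, cents) for sold_at, cents in cur]

def transform(rows):
    """Transform: keep only the date part and convert cents to currency units."""
    return [(channel, sold_at[:10], cents / 100) for channel, sold_at, cents in rows]

def load(warehouse, rows):
    """Load: append the cleaned rows to the warehouse fact table."""
    warehouse.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
    warehouse.commit()

# Two separate operational sources (sample data, not real figures).
pos = make_source("pos_sales", [("2024-01-05T10:15:00", 1250),
                                ("2024-01-05T11:00:00", 800)])
web = make_source("web_orders", [("2024-01-05T23:59:00", 4999)])

# The warehouse makes data from both sources available in one place.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales_fact (channel TEXT, sale_date TEXT, amount REAL)")

staged = extract(pos, "pos_sales", "pos") + extract(web, "web_orders", "web")
load(warehouse, transform(staged))

# An analytical query that neither source could answer on its own.
totals = dict(warehouse.execute(
    "SELECT channel, SUM(amount) FROM sales_fact GROUP BY channel"))
```

In a real pipeline the staging list would typically be replaced by ODS tables, and the load step would run on a schedule.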

The data warehouse is an excellent tool for data analysis because it captures transformed historical data. Business departments are involved in organizing the data and use it for reporting and analysis. The data warehouse uses SQL to query data, and uses tables, indexes, keys, views, and data types to organize data and ensure its integrity.

The main data warehouse features:

Data mart

Like a data warehouse, a data mart stores processed data that is ready for analysis. However, its scope of visibility is limited. The data mart provides the subject-specific data required to support a particular business unit, for example a data mart that supports the reports and analysis of a marketing department. Because data boundaries are defined per department, only relevant data is available.

Using a data mart increases the level of security. Restricting visibility prevents irresponsible use of data that is not relevant to a particular department. A data mart also holds less data, which speeds up processing and therefore query execution. Data is aggregated and prepared for a specific department, minimizing data misuse and the possibility of conflicting reports.
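The idea of a department-scoped, pre-aggregated subset can be sketched with a SQL view, again using Python's sqlite3 module. The facts table, the marketing_mart view and the sample values are hypothetical. A view in the same database does not by itself enforce access control; a production data mart is usually a separate schema or database with its own permissions.

```python
import sqlite3

wh = sqlite3.connect(":memory:")

# Hypothetical warehouse table holding data for several departments.
wh.execute("CREATE TABLE facts (department TEXT, metric TEXT, value REAL)")
wh.executemany("INSERT INTO facts VALUES (?, ?, ?)", [
    ("marketing", "campaign_clicks", 1200),
    ("marketing", "campaign_clicks", 800),
    ("finance", "invoices_paid", 310),
    ("hr", "open_positions", 7),
])

# The "mart": only marketing rows, already aggregated for reporting.
wh.execute("""
    CREATE VIEW marketing_mart AS
    SELECT metric, SUM(value) AS total
    FROM facts
    WHERE department = 'marketing'
    GROUP BY metric
""")

mart_rows = wh.execute("SELECT metric, total FROM marketing_mart").fetchall()
```

Queries against marketing_mart never see finance or HR rows, and they read one aggregated row instead of scanning every raw record.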

Key data mart features:

Data lake

A data lake is designed to store both structured and unstructured company data. It collects all valuable data for later use: images, PDFs, videos, etc. Just like a data warehouse, a lake extracts data from several disparate sources and processes it, and it can also be used for data analysis and reporting. Different applications and technologies (for example, Java) are used for processing and analysis. Data lakes are often used in conjunction with machine learning, and machine learning test results are also stored in the data lake. Working with a data lake is complex and requires serious skills from users, as well as experience with programming languages and data processing methods. Data cleaning occurs without the use of an ODS.

Key data lake features:

The main database security practices

The number of attempts to breach the security systems of companies and organizations is alarming. At the moment, the most frequently attacked organizations are those in healthcare, finance, retail, government, manufacturing and energy.

Cybercrime is developing rapidly alongside new technologies, and cybercriminals’ methods are becoming more sophisticated. As a result, even large enterprises with reliable cyber protection can become victims. Small businesses tend to be more relaxed about this, mistakenly believing that they are of no interest to cybercriminals. However, any information or data has value and can become a prize for cybercriminals, regardless of which company it belongs to.

According to forecasts, by 2025 cybercrime will cost the global economy 10.5 trillion dollars. This once again shows how important it is to pay attention to cybersecurity.

Database security measures differ from web security measures. Below are 10 basic methods for securing databases and protecting corporate information.

  1. Physical database security

Data centers and proprietary servers may be vulnerable to physical attacks from a third party or an internal source. Having gained access to a physical database server, a cybercriminal can steal data, corrupt it, or inject malware to gain remote access. Such attacks can bypass digital security protocols, so it’s worth taking extra security measures to detect them.

When choosing a hosting and storage provider, make sure that the company takes security seriously. It is worth avoiding free services, as they may lack a security system. To secure your own servers, introduce additional physical security measures: cameras, locks, security personnel. Also, to reduce the risk of unauthorized activity, access to the servers should be logged and limited to certain users.

  2. Separate database servers

Protecting databases from cyberattacks involves special security measures. Placing the data and the site on the same server exposes the data to attacks that target the site. For example, an online store owner keeps the website together with confidential and non-confidential data on the same server. To protect against cyberattacks and fraud, the owner may rely on the site security provided by the hosting service and on the security features of the e-commerce platform. However, the sensitive data then becomes vulnerable to attacks made through the website and the e-commerce platform, and a cybercriminal who compromises either can gain access to the database.

To mitigate these risks, separate database servers from everything else. It also makes sense to use security information and event management (SIEM) to monitor events in real time. This allows organizations to respond quickly and take immediate action when a breach is attempted.

  3. HTTPS server setup

A proxy server acts as an intermediary between the user and the target server. Before the database server is accessed, the proxy evaluates requests sent from the workstation and blocks unauthorized ones. When sensitive data such as passwords, payment information or personal information is involved, set up an HTTPS server: data passing through the proxy is then also encrypted, providing an additional layer of protection.

  4. Don’t use default network ports

The TCP and UDP protocols are used when transferring data between servers, and they automatically use default network ports. Default ports are often targeted in brute-force attacks, in which the attacker searches for a password by exhaustively trying all possible values. If you do not use the default ports, the cybercriminal first has to find the right port by trial and error, which is a long and possibly unsuccessful process. To make sure the new port isn’t already assigned to another service, check the Internet Assigned Numbers Authority (IANA) registry when choosing it.
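A small helper can flag obviously poor port choices before the registry is consulted. The sketch below is only an illustration: the function names are hypothetical, the table lists the default ports of a few widely used database servers, and the range boundaries follow the IANA port ranges (system, user, dynamic). It does not query the IANA registry, so that check remains a manual step.

```python
# Default ports of some widely used database servers.
DEFAULT_DB_PORTS = {
    1433: "Microsoft SQL Server",
    1521: "Oracle Database",
    3306: "MySQL",
    5432: "PostgreSQL",
    27017: "MongoDB",
}

def port_range(port):
    """Classify a port number using the IANA port ranges."""
    if not 0 <= port <= 65535:
        raise ValueError(f"{port} is not a valid port number")
    if port <= 1023:
        return "system"
    if port <= 49151:
        return "user"
    return "dynamic"

def review_port(port):
    """Return human-readable warnings for a candidate database port."""
    warnings = []
    if port in DEFAULT_DB_PORTS:
        warnings.append(f"{port} is the default port of {DEFAULT_DB_PORTS[port]}")
    if port_range(port) == "system":
        warnings.append(f"{port} is in the system range (0-1023)")
    return warnings
```

For example, `review_port(5432)` reports that the port is the PostgreSQL default, while a port such as 50123 produces no warnings from this helper.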

  5. Real-time database monitoring

Regularly scanning the database for hacking attempts enhances security and makes it possible to respond quickly to potential attacks. Monitoring software such as Tripwire can be used to log the activity that occurs on the database server.

Regular audits and testing should also be carried out. They allow vulnerabilities in database security to be detected and fixed in time.

  6. Database and application firewall

A firewall is the first level of protection against unauthorized access attempts. It must be installed to protect both the site and the database.

Three types of firewalls are commonly used:

  1. Packet filter firewall
  2. Stateful packet inspection (SPI)
  3. Proxy server firewall

  7. Data encryption protocols

Data encryption is necessary to protect trade secrets, as well as to protect confidential user information when it is moved and stored. Encryption significantly reduces the possibility of a successful data breach: even if a cybercriminal gets hold of the data, the information remains secure.

  8. Create backups

To reduce the risk of losing sensitive information due to malicious attacks or data corruption, back up the database regularly. The copy must be encrypted and stored on a separate server. This approach makes it possible to recover the data if the primary database server is compromised or unavailable.

  9. Application update

Research has shown that 9 out of 10 applications contain outdated software components. According to an analysis of WordPress plugins, 17,383 plugins have not been updated for 2 years, 13,655 for 3 years, and 3,990 for 7 years. Together, this poses a serious security risk. To manage databases, use reliable software, keep it up to date and install new patches. This also applies to widgets, plugins, third-party applications, etc.

  10. User authentication

According to studies, compromised passwords are responsible for 80% of data breaches. This shows that passwords by themselves are not a strong security measure, primarily because of the human factor in creating them. To solve this problem, add another security layer by setting up multi-factor authentication. Recent trends make this method less than ideal, but it still makes the security protocol difficult for cybercriminals to bypass. Also, to reduce the risk of hacking, access to the database should be limited to verified IP addresses. An IP address can be spoofed, but doing so requires additional effort from the cybercriminal.
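The two measures above, an IP allowlist and a second authentication factor, can be sketched with the Python standard library alone. Everything named here is an assumption for illustration: the allowed networks are documentation and private-range examples, and the second factor is a time-based one-time password (TOTP, as specified in RFC 6238), which is one common choice and not necessarily the one a given system uses.

```python
import hashlib
import hmac
import ipaddress
import struct
import time

# Hypothetical allowlist: an internal subnet and one fixed admin host.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.5.0/24"),
    ipaddress.ip_network("192.0.2.10/32"),
]

def ip_is_allowed(client_ip):
    """Return True if the client address falls inside an allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)

def totp(secret, unix_time, step=30, digits=6):
    """Compute a time-based one-time password (HMAC-SHA1 variant)."""
    counter = unix_time // step
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def verify_totp(secret, submitted, now=None, step=30):
    """Compare the submitted code with the expected one in constant time."""
    now = int(time.time()) if now is None else now
    return hmac.compare_digest(totp(secret, now, step), submitted)

def may_connect(client_ip, secret, submitted_code, now=None):
    """Grant access only if both the network check and the second factor pass."""
    return ip_is_allowed(client_ip) and verify_totp(secret, submitted_code, now)
```

A real deployment would also need rate limiting, tolerance for clock drift, and secure storage of the shared secret, all of which are omitted here.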

Database Security

A database is a structure for storing, modifying and processing a large amount of interdependent information. Storing large amounts of data in a single database makes it possible to group information in many ways: personal data, customer data, corporate data, order history, product catalog, etc. Undoubtedly, one of the main requirements for databases is security.

Database security is a set of measures used to protect database management systems from cyberattacks and unauthorized use, and to establish and maintain their confidentiality, integrity and availability. Database security programs are designed to protect the data in the database, the entire data management system and each application that accesses it against unauthorized use, damage and intrusion.

Database security protection includes:

Database security is a complex and extensive undertaking that involves all aspects of information security technologies and practices. It is also at odds with usability: the more accessible and usable a database is, the more vulnerable it is to cyberattacks.

A data leak is simply a failure to keep the data in the database confidential. The degree of damage to the enterprise depends on the following factors:

  1. Compromised intellectual property

The intellectual property of an organization includes trade secrets, inventions of various kinds, and property rights. All of these are critical to the ability to run a business and maintain a competitive edge in the marketplace. If intellectual property is stolen, recovery can be difficult or impossible.

  2. Damage to reputation

The trust of customers and partners is very valuable. They need to know and feel that their data is protected. Otherwise, they may refuse to purchase goods or services or to cooperate.

  3. Business continuity

Some companies cannot continue their activities until the problem is fully resolved.

  4. Penalties for non-compliance

Financial penalties can be devastating to a business. In some cases, fines exceed several million dollars.

  5. Costs of repairing breaches and notifying customers

In addition to the costs of communicating with customers, the affected company must organize and pay for legal and investigative activities, crisis management, recovery, etc.

Incorrect settings, vulnerabilities and misuse of software can lead to serious breaches. The most common causes and threats to database security include:

Database as a service

The active growth of the database sector was fueled by the emergence of COVID-19. To survive in the new environment, many companies have had to revise and improve their applications and digital services. As a result, more database instances were deployed, as well as tools to ensure quality data manipulation. Since no business plans to reduce data usage, new products for data processing and new ways to create value will continue to appear.

One of the main trends of 2022 is related to the management of company databases. Not so long ago, all database instances had to be run in a local data center. Now developers and IT teams have more options. In addition to on-premises deployment, databases can be run by a service provider, deployed in the cloud, used as a service from a cloud provider, or run as serverless instances.

Database-as-a-Service (DBaaS)

Database-as-a-service supports more applications and digital services, so more companies will be migrating to this option in 2022. DBaaS is a cloud computing model that provides access to a database without the need to install hardware or software. Configuration and maintenance are the provider’s tasks, so the user can start using the database immediately. DBaaS is a fully managed service that includes software and hardware, backup, administration, networking, and security. Implementing it ensures a fast, safe and cost-effective workflow that helps optimize business processes.

Database-as-a-service benefits:

Database-as-a-service disadvantages:

The main DBaaS disadvantage is the lack of direct access to corporate information: the storage and processing of databases is transferred to a third party, which makes it impossible to influence security and recovery measures. However, the level of reliability, security and efficiency of DBaaS far exceeds that of standard databases.

Enterprises’ need for stand-alone cloud databases is growing. Cost reduction, availability, renewal, flexibility and efficiency are the main factors that drive companies to use DBaaS. The global cloud database and DBaaS market is projected to reach $399.5 billion by 2027.

DBA: proactivity or “fighting fires”?

Proactivity is one of the most common requirements for a specialist, and therefore one of the most valuable characteristics. A proactive employee is characterized by independence, a responsible approach, the ability to influence particular situations, and the ability to anticipate the possible outcomes of events. A proactive specialist doesn’t "put out fires" but does their best to prevent them from starting.

Does a DBA need to be a proactive specialist? The answer is a definite yes. A proactive approach to database performance and maintenance helps the DBA avoid problems and prevent small bugs from turning into a full-blown disaster.

Below we’ll talk about the DBA’s routine work and give some recommendations for improving database performance.

Database maintenance

Every administrator knows that backups are necessary, but not all of them take this task seriously and pay due attention to it. Many specialists who back up regularly forget to take into account the most important thing: the consequences. The backup schedule also determines how much data can be lost in a recovery (the data that was created after the last backup but before the recovery).

Backup

No executive is willing to lose data. That’s why it is so important to define a metric such as the RPO (recovery point objective) before planning backups and database recovery.

The RPO is the maximum period of time over which data may be lost in the event of a failure. The age of the data recovered from backup storage must not exceed this metric. The RPO should dictate when (how often) and how (with which technologies) backups are made.
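The relationship between the backup schedule and the RPO comes down to simple arithmetic: the data lost in a failure is whatever was written since the most recent backup. The sketch below illustrates this with hypothetical function names and a made-up schedule; it ignores the time a backup takes to complete.

```python
from datetime import datetime, timedelta

def data_loss_window(backup_times, failure_time):
    """Return how much recent data is lost if a failure happens at failure_time."""
    earlier = [t for t in backup_times if t <= failure_time]
    if not earlier:
        raise ValueError("no backup exists before the failure")
    return failure_time - max(earlier)

def meets_rpo(backup_times, failure_time, rpo):
    """True if restoring the latest backup loses no more data than the RPO allows."""
    return data_loss_window(backup_times, failure_time) <= rpo

# Hypothetical schedule: a backup every six hours.
schedule = [datetime(2024, 3, 6, hour) for hour in (0, 6, 12, 18)]
failure = datetime(2024, 3, 6, 11, 30)

lost = data_loss_window(schedule, failure)           # 5 hours 30 minutes
ok_with_4h_rpo = meets_rpo(schedule, failure, timedelta(hours=4))
ok_with_6h_rpo = meets_rpo(schedule, failure, timedelta(hours=6))
```

With a six-hour interval, a four-hour RPO can be violated, so the interval between backups has to be no longer than the RPO itself.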

Ensuring that backups succeed is one of the DBA’s most important tasks. When creating a backup in SQL Server, the administrator should use the CHECKSUM option, which makes it possible to detect defects. Another way to check a backup is the RESTORE VERIFYONLY statement.

A DBA can make different types of backups: full (all data), differential (only the data that has changed since the last full backup), and incremental (only the data that has changed since the previous backup).

Some administrators believe that all differential backups must be restored in chronological order to reach the recovery point, but this is a misconception. It is only necessary to restore the last full backup taken before the recovery point, followed by the most recent differential backup.
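The restore rule for differential backups can be written down as a short selection function. This is a sketch under simplifying assumptions: the Backup class and restore_chain function are hypothetical, only full and differential backups are modeled, and log or incremental backups are left out.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Backup:
    kind: str            # "full" or "differential"
    taken_at: datetime

def restore_chain(backups, recovery_point):
    """Pick the backups to restore, in order, to get as close as possible
    to recovery_point: the last full backup before it, then at most one
    differential (the most recent one based on that full backup)."""
    fulls = [b for b in backups
             if b.kind == "full" and b.taken_at <= recovery_point]
    if not fulls:
        raise ValueError("no full backup exists before the recovery point")
    base = max(fulls, key=lambda b: b.taken_at)

    diffs = [b for b in backups
             if b.kind == "differential"
             and base.taken_at < b.taken_at <= recovery_point]
    chain = [base]
    if diffs:
        chain.append(max(diffs, key=lambda b: b.taken_at))
    return chain

# Hypothetical week: full backup on day 3, differentials on days 4 to 6,
# next full backup on day 10.
backups = [
    Backup("full", datetime(2024, 3, 3)),
    Backup("differential", datetime(2024, 3, 4)),
    Backup("differential", datetime(2024, 3, 5)),
    Backup("differential", datetime(2024, 3, 6)),
    Backup("full", datetime(2024, 3, 10)),
]
chain = restore_chain(backups, datetime(2024, 3, 6, 18, 0))
```

Here the chain contains only the full backup from day 3 and the differential from day 6; the differentials from days 4 and 5 are skipped because each differential already contains every change since the full backup.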

Recovery

A DBA has to be prepared in advance for database failures that require restoring from backups. In such a situation, the manager’s first question will be about recovery time. That’s why the administrator should know how long it takes to retrieve the backup files and restore them, and be able to give the manager an accurate estimate. To do this, test backups and possible recovery scenarios in advance.

Integrity check

A database integrity check is also an important part of database maintenance. Some database servers will back up a corrupt database, but the resulting backup cannot be restored. To avoid this, the DBA can run a full check for database corruption with DBCC CHECKDB before the backup. The main disadvantage of this command is that it is resource-intensive: the check could take days.
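The "check first, then back up" order can be illustrated with SQLite, which ships with Python. This is an analogy rather than the SQL Server procedure: `PRAGMA integrity_check` plays the role of DBCC CHECKDB, `Connection.backup` stands in for the server's backup command, and the check_then_backup function and customers table are hypothetical.

```python
import sqlite3

def check_then_backup(source, target):
    """Back up `source` into `target` only if the integrity check passes."""
    findings = [row[0] for row in source.execute("PRAGMA integrity_check")]
    if findings != ["ok"]:
        # Refuse to produce a backup that may not be restorable.
        raise RuntimeError(f"integrity check failed: {findings}")
    source.backup(target)

# A small sample database to back up.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
source.execute("INSERT INTO customers (name) VALUES ('Ada')")
source.commit()

target = sqlite3.connect(":memory:")
check_then_backup(source, target)

# Reading from the copy is the simplest proof that the backup is usable.
restored = target.execute("SELECT name FROM customers").fetchall()
```

Reading the data back from the copy mirrors the advice to test backups and recovery scenarios in advance.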

Agent Alerts

To respond quickly to database failures, the DBA should set up alerts.

Microsoft addresses this with SQL Server Agent, which can send notifications about SQL Server errors with a severity between 17 and 25 (these include database engine and resource errors) and about errors 823, 824, 825 and 829.
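The alerting rule itself is easy to express in code. The sketch below is not the SQL Server Agent API; it is a hypothetical filter that applies the same thresholds to error records, for example when post-processing an exported error log. The specific error numbers are listed separately because an error may be logged with a severity outside the 17 to 25 range.

```python
# Thresholds taken from the rule above; names are hypothetical.
ALERT_SEVERITIES = range(17, 26)            # severities 17 through 25
ALERT_ERROR_NUMBERS = {823, 824, 825, 829}  # errors alerted on by number

def should_alert(severity, error_number):
    """Return True if an error record should trigger a notification."""
    return severity in ALERT_SEVERITIES or error_number in ALERT_ERROR_NUMBERS

def errors_to_report(records):
    """Filter (severity, error_number, message) records down to alertable ones."""
    return [r for r in records if should_alert(r[0], r[1])]

# Sample records with made-up messages.
sample = [
    (10, 825, "read succeeded after retry"),
    (16, 208, "invalid object name"),
    (24, 823, "I/O error"),
    (17, 1105, "could not allocate space"),
]
reported = errors_to_report(sample)
```

Only the severity 16 record is dropped; the other three match either the severity range or the error-number list.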

For PostgreSQL, tools such as pganalyze and pgwatch2 are available. pganalyze is software designed to improve query visibility. It can be used to find the causes of slow queries and to monitor databases to get a picture of their current state. pgwatch2 is a flexible monitoring solution that uses Grafana dashboards.

Indexing

Proper indexing is one of the best ways to improve database performance. Correctly built indexes speed up data lookups and the execution of user queries. Building indexes properly requires expertise from the DBA; an incorrect index can have the opposite effect and significantly slow down query processing.
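The effect of an index on how a query is executed can be observed with SQLite's `EXPLAIN QUERY PLAN`, shown here through Python's sqlite3 module. The orders table, the idx_orders_customer index and the generated rows are hypothetical, and other database servers expose the same information through their own plan viewers.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
db.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)])

def query_plan(conn, sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text

QUERY = "SELECT total FROM orders WHERE customer_id = ?"

# Without an index the planner has to scan the whole table.
plan_before = query_plan(db, QUERY, (42,))

# With an index on the filtered column it can look rows up directly.
db.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(db, QUERY, (42,))
```

Comparing `plan_before` and `plan_after` shows the switch from a full table scan to a search that uses idx_orders_customer. The exact wording of the plan text varies between SQLite versions.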
