Database vs Data Warehouse vs Data Lake
If you are a database developer or data engineer, you have probably heard at least one of these terms, maybe all of them, or maybe none.
Either way, anyone working with data should know what they mean. So, this month's article is about databases, data warehouses, and data lakes.
This is also an important topic if you are preparing for an Azure certification. So, let's go through them one by one.
- A database is a system that stores data, and it mainly deals with transactional data.
- It follows the OLTP (Online Transaction Processing) model.
- The database holds structured data.
- OLTP systems typically keep the most recent (current) data rather than a long history.
- Storing data in a database is comparatively expensive.
- Examples of databases are MySQL, PostgreSQL, and Oracle.
- A data warehouse is a system that stores much more data than a typical database.
- It follows the OLAP (Online Analytical Processing) model, which needs a lot of historical data to find insights.
- The data warehouse also holds structured data.
- OLAP systems keep far more historical data, because that history is what the analysis runs on.
- Storing data in a data warehouse is still expensive, but generally cheaper than in a database.
- A data warehouse is loaded through ETL (extract, transform, load), using batch or stream processing.
- Examples of data warehouses are Teradata and Snowflake.
- A data warehouse can process data in real time, but it is not meant for day-to-day transactions: run alongside complex analytical queries, they become slow.
- A data lake is a storehouse for huge amounts of structured, semi-structured, and unstructured data.
- The data is kept in its raw form.
- Log files can be added directly to the data lake.
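- A data lake is a cost-effective way to store large volumes of data.
- A data lake processes data through ELT (extract, load, transform) or batch processing.
- One of the advantages of a data lake is its flexibility: structure is applied when the data is read, not when it is stored.
- Examples of data lake storage are HDFS and Amazon S3.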
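To make the three ideas a bit more concrete, here is a minimal Python sketch. It is only an illustration under some assumptions: the log lines, the `orders` table, and all function names are hypothetical, and an in-memory SQLite database stands in for both the OLTP database and the warehouse (a real warehouse such as Snowflake or Teradata works very differently under the hood). The raw lines play the role of data lake files, the single-row update is an OLTP-style transaction, and the monthly aggregation is an OLAP-style query.

```python
import json
import sqlite3

# Hypothetical raw log lines, as they might land in a data lake (made-up data).
RAW_LOG_LINES = [
    '{"order_id": 1, "customer": "asha", "amount": 120.0, "day": "2024-01-05"}',
    '{"order_id": 2, "customer": "ravi", "amount": 80.0, "day": "2024-01-05"}',
    '{"order_id": 3, "customer": "asha", "amount": 50.0, "day": "2024-02-10"}',
    "not valid json, but the lake stores it anyway",
]


def parse_raw_lines(lines):
    """ELT-style transform: apply structure only when the raw data is read."""
    rows = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # the raw line stays in the lake; it is just skipped here
        if isinstance(record, dict):
            rows.append(
                (record["order_id"], record["customer"], record["amount"], record["day"])
            )
    return rows


def build_store(rows):
    """Load structured rows into an in-memory SQLite table (a stand-in only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, day TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def oltp_update_amount(conn, order_id, new_amount):
    """OLTP-style work: a short transaction that touches a single current row."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE orders SET amount = ? WHERE order_id = ?",
            (new_amount, order_id),
        )


def olap_revenue_by_month(conn):
    """OLAP-style work: scan the whole history and aggregate it."""
    cursor = conn.execute(
        "SELECT substr(day, 1, 7), SUM(amount) "
        "FROM orders GROUP BY substr(day, 1, 7) ORDER BY 1"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = build_store(parse_raw_lines(RAW_LOG_LINES))
    oltp_update_amount(conn, 2, 90.0)
    print(olap_revenue_by_month(conn))
```

Notice that the malformed line is never rejected at storage time. It is simply ignored when the data is read, which is the kind of flexibility a data lake offers.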
Thanks for reading. Do clap if you find it useful 🙂.
“Keep learning and keep sharing knowledge✌️.”