In simpler terms, a data lake is like a real river or lake in its natural state. Just as multiple tributaries flow into a lake, all kinds of data flow into a data lake in real time.
What is a Data Warehouse?
Data warehousing is the collection of data from varied sources to produce meaningful business insights. A data warehouse is electronic storage for a massive amount of information: a blend of technologies that enable the strategic use of data!
Let’s walk through a diagram of a typical data warehouse architecture, starting on the left.
Source – https://en.wikipedia.org/wiki/File:Data_warehouse_overview.JPG
You have the operational systems within the organization, such as marketing, sales, and so on. You take their data and put it into the staging area.
Next, we need to work out how to bring all of this data into one logical framework. The data passes through the integration layer and then goes into the data warehouse in a single format that is standard across all the data stored there.
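The staging and integration steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the two source systems, their field names, and the `WarehouseFact` schema are all hypothetical, and the "warehouse" is just a Python list rather than a real database. The point is that each source arrives in its own shape, sits in staging as-is, and only the integration layer maps it onto one standard format.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical raw rows from two operational systems. Each system uses
# its own field names, casing, and date format.
MARKETING_ROWS = [
    {"campaign_region": "north", "spend_usd": "1200.50", "day": "2024-03-01"},
    {"campaign_region": "south", "spend_usd": "800", "day": "2024-03-02"},
]
SALES_ROWS = [
    {"Region": "NORTH", "Revenue": 5400.0, "Date": "01/03/2024"},  # DD/MM/YYYY
    {"Region": "South", "Revenue": 3100.0, "Date": "02/03/2024"},
]


@dataclass(frozen=True)
class WarehouseFact:
    """One row in the warehouse's standard format (hypothetical schema)."""
    source: str
    region: str
    metric: str
    amount: float
    day: date


def stage(systems):
    """Staging area: copy raw rows unchanged, tagged with their source system."""
    return [(name, dict(row)) for name, rows in systems.items() for row in rows]


def integrate(name, row):
    """Integration layer: map one source-specific row onto the standard schema."""
    if name == "marketing":
        return WarehouseFact(
            source=name,
            region=row["campaign_region"].lower(),
            metric="spend",
            amount=float(row["spend_usd"]),
            day=date.fromisoformat(row["day"]),
        )
    if name == "sales":
        dd, mm, yyyy = row["Date"].split("/")
        return WarehouseFact(
            source=name,
            region=row["Region"].lower(),
            metric="revenue",
            amount=float(row["Revenue"]),
            day=date(int(yyyy), int(mm), int(dd)),
        )
    raise ValueError(f"no integration rule for source {name!r}")


def load_warehouse(systems):
    """Stage every source, then integrate each staged row into the warehouse."""
    return [integrate(name, row) for name, row in stage(systems)]


warehouse = load_warehouse({"marketing": MARKETING_ROWS, "sales": SALES_ROWS})
for fact in warehouse:
    print(fact)
```

After integration, every row shares the same fields, lowercase region names, and real `date` values, which is what lets later queries treat marketing and sales data uniformly.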
The resulting data warehouse will be huge, since we have taken data from across the organization and put it into one large database. However, the director of marketing or the managing director might only need answers to a specific set of questions. For them, we create dedicated data marts, which are much smaller than the data warehouse and can therefore answer those questions more quickly.
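A data mart can be sketched as a smaller, purpose-built table derived from the warehouse. The example below is a minimal illustration that assumes an in-memory SQLite database (Python's standard `sqlite3` module); the `warehouse` table, its columns, and the `marketing_mart` table are hypothetical. The mart keeps only the marketing department's rows, pre-aggregated by region, so the marketing director queries a handful of rows instead of the whole warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for the organization-wide warehouse.
conn.execute(
    "CREATE TABLE warehouse (department TEXT, region TEXT, metric TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse VALUES (?, ?, ?, ?)",
    [
        ("marketing", "north", "spend", 1200.0),
        ("marketing", "south", "spend", 800.0),
        ("marketing", "north", "spend", 300.0),
        ("sales", "north", "revenue", 5400.0),
        ("sales", "south", "revenue", 3100.0),
        ("hr", "north", "headcount", 42.0),
    ],
)

# Hypothetical data mart for the director of marketing: only marketing rows,
# summed per region, so it is much smaller than the warehouse it came from.
conn.execute(
    """
    CREATE TABLE marketing_mart AS
    SELECT region, SUM(amount) AS total_spend
    FROM warehouse
    WHERE department = 'marketing'
    GROUP BY region
    """
)

warehouse_rows = conn.execute("SELECT COUNT(*) FROM warehouse").fetchone()[0]
mart_rows = conn.execute(
    "SELECT region, total_spend FROM marketing_mart ORDER BY region"
).fetchall()
print(warehouse_rows, mart_rows)
```

Precomputing the aggregate once is the design choice that makes the mart fast: the director's question ("how much did we spend per region?") becomes a lookup on a tiny table rather than a scan of the full warehouse.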
Now that we have an idea of both the data lake and the data warehouse, let’s compare the two!