IT leaders’ experiences with data lakes have been a roller coaster ride since the concept emerged in 2010. To some, that ride might resemble the canonical Hype Cycle graphic, trademarked by Gartner, which shows the maturity curve of technologies in a given category over time.
What have been the game-changing factors allowing data lakes to persist past the plateau? Is it possible to optimize such a vast and seemingly chaotic variety of storage platforms? Read on for our take on how we’ve gotten here, or download your copy of the latest Hype Cycle for Data Management here.
The First Data Lakes Failed
The first data lakes were built on Apache Hadoop, which allowed users to store unstructured and multi-structured datasets at scale, and run application workloads on clusters of on-premise commodity hardware.
This initial concept sounded great in theory — enter the “Innovation Trigger” stage of the Hype Cycle — but struggled to deliver real value in practice.
Organizations were dumping large volumes of raw data into their lake but struggled to give end users access to the datasets they needed, or to integrate known front-end tools. Without any organization, governance, or integration with known ETL or analytics tools, the data lake often became a “data swamp” where data would sit stagnant because users didn’t know how to effectively access or glean insights from it.
The Data Lake Comeback
All the while, the macro problem has persisted: how to access and analyze multiple data types without constraints inherent to storage and infrastructure.
As a result, there’s been tremendous innovation around solving this problem since those first implementations built on Hadoop. Modern approaches that activate the data lake and deliver on its promise are driving this market into maturity.
Today, data lakes are built on cloud object storage and can be activated to support multi-dimensional analytics use cases such as full text search, relational queries, and machine learning. Data lakes complement data warehouses with an open philosophy, offering schema-on-read, loosely coupled storage/compute and flexible use cases that combine to drive innovation by reducing the time, cost, and complexity of data management.
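To make schema-on-read concrete, here is a minimal Python sketch. It is an illustration only, not how any particular product works: the `RAW_OBJECTS` list is a hypothetical stand-in for objects sitting in cloud storage, and the function names and sample records are invented for this example. The point is that the raw data is stored untouched, and each use case (a relational-style aggregate, a full text search) applies its own schema when it reads.

```python
import json
from collections import Counter

# Hypothetical raw events, stored exactly as they arrived (one JSON document
# per line). In a real data lake these would be objects in cloud storage.
RAW_OBJECTS = [
    '{"ts": "2021-08-01T10:00:00Z", "service": "checkout", "status": 500, "msg": "payment gateway timeout"}',
    '{"ts": "2021-08-01T10:00:05Z", "service": "search", "status": 200, "msg": "query ok"}',
    '{"ts": "2021-08-01T10:00:09Z", "service": "checkout", "status": 200, "msg": "order placed", "order_id": 42}',
    'not json at all',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time.

    `schema` maps field name -> cast function. Records that cannot be parsed
    or that lack a requested field are skipped by this reader; nothing was
    rejected when the data was written.
    """
    for line in raw_lines:
        try:
            doc = json.loads(line)
            row = {field: cast(doc[field]) for field, cast in schema.items()}
        except (ValueError, KeyError, TypeError):
            continue
        yield row


def errors_by_service(raw_lines):
    """Relational-style use case: count server errors grouped by service."""
    rows = read_with_schema(raw_lines, {"service": str, "status": int})
    return Counter(r["service"] for r in rows if r["status"] >= 500)


def search_messages(raw_lines, term):
    """Full text search use case: case-insensitive match on the message."""
    rows = read_with_schema(raw_lines, {"ts": str, "msg": str})
    return [r for r in rows if term.lower() in r["msg"].lower()]


if __name__ == "__main__":
    print(errors_by_service(RAW_OBJECTS))
    print(search_messages(RAW_OBJECTS, "timeout"))
```

Both functions read the same raw lines but project different fields, which is the loose coupling between storage and use case described above. A schema-on-write system would instead have had to decide up front what to do with the extra `order_id` field and the malformed line.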
With these changes, data lakes are seeing renewed excitement and successful implementations. And as the Hype Cycle graphic shows, they’re getting ready to enter a new phase of maturity in the next year: the Slope of Enlightenment.
Data Lakes Are Here to Stay
Based on the analysis by Phillip Russom and Henry Cook in the Gartner Hype Cycle for Data Management, 2021, we believe the top drivers for success with data lakes are:
- Organizations continue to be driven by data and analytics
- There is an increasing demand for the expansion of analytics programs
- Data exploration has become a common practice
- Data warehouses will continue to be relevant, but only when modernized
That being said, challenges will continue to present themselves. Best practices are still changing and developing. If your organization is implementing a data lake today, make sure it follows modern standards.
As the Gartner report describes, “The first data lakes were built on Hadoop, for data science only, and they lacked metadata, relational functionality and governance. If you build that kind of data lake today, it will fail. Today’s data lake is on cloud, and it supports multiple analytics techniques (not just data science).”
It further adds, “A data lake, when designed properly, can provision data for the diverse exploration requirements of multiple user types and use cases.”
Looking Beyond the Lake
The 2021 Gartner Hype Cycle for Data Management plots more than 30 different categories of data management technologies — including data lakes, data lakehouses, multi-model DBMS, and logical data warehouses — based on their business benefit and years to mainstream adoption and maturity.
Download the full report, Gartner Hype Cycle for Data Management, 2021.
At ChaosSearch, our goal is to help customers prepare for the future state of enterprise data management by bridging the gap between data lakes and data warehouses. ChaosSearch helps modern organizations Know Better™ by activating the data lake for analytics. The ChaosSearch Data Lake Platform indexes customers’ cloud data, rendering it fully searchable and enabling analytics at scale with massive reductions of time, cost and complexity.
ChaosSearch was purpose-built for cost-effective, highly scalable analytics encompassing full text search, SQL and machine learning capabilities in one unified offering. The patented ChaosSearch technology instantly transforms your cloud object storage (Amazon S3, Google Cloud Storage) into a hot, analytical data lake.
- Read the blog: Data Lake vs. Data Warehouse
- Read the blog: The Multi-model Database Dilemma
- Read the blog: Think You Need a Data Lakehouse?
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, express or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from ChaosSearch.
Gartner and Hype Cycle are registered trademarks of Gartner, Inc. and/or its affiliates in the U.S. and internationally and are used herein with permission. All rights reserved.