Solving the Search & Analytics Challenge on Cloud Storage at Scale
I have been super fortunate to work with incredibly innovative, talented teams that create powerful technology to help manage the world's data. When I met with Thomas Hazel, CHAOSSEARCH founder and CTO, and Les Yetton, CHAOSSEARCH co-founder and CEO, to talk about CHAOSSEARCH (scalable, performant text search on your object storage without having to move any data), I knew it was special and was eager to join.
It reminded me of when Andy Palmer told me about Vertica: column storage instead of rows to optimize compression, deployed on cost-effective commodity hardware to replace clunky, row-based analytic systems. It also reminded me of when I first met Adam, Alan, and Mike at Cloudant: a managed DBaaS that would allow developers to build their mobile and web applications without having to worry about uptime, performance, and availability. The common denominator is great talent coming together to solve real problems in massive markets.
Today, IDC estimates that digital data will grow from 40 zettabytes in 2019 to 175 zettabytes in 2025. We can attribute this to the emergence of cloud computing, the internet, and ever-smarter, ubiquitous mobile devices. To meet this ever-growing, dynamic storage challenge, Cloud Object Storage (COS) has emerged as the most popular, cost-effective destination for enterprise data. Many use cases, such as data lakes and log data, are ideally suited for COS. But, and there is always a but, you can’t effectively search or analyze the data in your COS without moving the data first.
As you can imagine, the hypergrowth of COS has exposed some of the inherent challenges of using and consuming the data stored there. We are clearly very good at creating an enormous amount of data. However, the systems and processes required to process, enrich, and transform that data so that it can be searched introduce unintended changes in meaning and, most importantly, security risks. Our current model for information management forces data through a “pipeline” or “ETL process,” and then the data sprawls across the organization, which creates a host of challenges.
Enter CHAOSSEARCH. We allow users to keep their data resident within COS and to search it at scale and with incredible performance. With CHAOSSEARCH, there is no data movement, transformation, or schema management. (Anyone who has ever built or used a log management system knows how much headache and frustration this eliminates.) The platform streamlines and automates data management in Amazon S3, dynamically and seamlessly discovering, cataloging, and indexing data, regardless of size and type. The process of converting raw data into actionable analytical insights is not only accelerated but done for you. And since CHAOSSEARCH is a managed service, all the complexity of rolling your own stack or ETL-ing to a paid service is eliminated.