What is a data lake?
A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale. In Big Data terms, a data lake lets you store an unlimited “variety” of information, at high “volume”, with a high ingest rate (“velocity”). You can store your data as-is, without first having to structure it, and run different types of analytics, from dashboards and visualizations to big data processing, real-time analytics, and machine learning, to guide better decisions.
Key features of a CHAOSSEARCH-enabled data lake on Amazon S3:
- Supports structured and unstructured data
- Schema on read
- Backed by low-cost Amazon S3 storage
- Provides agile search analytics and decision making
- Fast, reliable queries over terabytes of data
- No data movement
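To make "schema on read" concrete, here is a minimal sketch in plain Python. It is not how CHAOSSEARCH works internally; the sample records and the `infer_schema` helper are hypothetical and only illustrate the idea that data is stored as-is and its structure is worked out when it is read.

```python
import json

# Raw records are kept exactly as they arrived; no schema is declared up front.
# These sample lines are made up for illustration.
RAW_LINES = [
    '{"status": 200, "path": "/index.html", "latency_ms": 12.5}',
    '{"status": 404, "path": "/missing", "user": "alice"}',
    'not json at all',
]


def infer_schema(lines):
    """Scan raw lines at read time and collect the value types seen per field."""
    schema = {}
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparseable lines stay in storage; this read skips them
        if not isinstance(record, dict):
            continue
        for field, value in record.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


schema = infer_schema(RAW_LINES)
for field in sorted(schema):
    print(field, sorted(schema[field]))
```

Because nothing is enforced at write time, a new field such as `user` simply shows up the next time the data is read.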
CHAOSSEARCH publishes Elasticsearch APIs, but we do not run any Elasticsearch software under the hood. CHAOSSEARCH is not an overlay or an add-on to Elasticsearch, but rather a full replacement for it. Only the API is the same. One huge benefit is that existing Elasticsearch users do not have to port their implementations. In other words, you get all of the power of CHAOSSEARCH without any heavy lifting.
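As an illustration of what "only the API is the same" means for a client, the sketch below builds a standard Elasticsearch-style `_search` request using only the Python standard library. The endpoint URL, the index name, and the `message` field are placeholders we made up, and authentication is left out, so treat this as a sketch under those assumptions rather than a reference for any particular deployment.

```python
import json
import urllib.request

# Hypothetical values: substitute your own endpoint and index name.
ENDPOINT = "https://search.example.com"
INDEX = "app-logs"


def build_search_request(endpoint, index, text, size=10):
    """Build (but do not send) an Elasticsearch-style _search request."""
    body = {
        "size": size,
        # "message" is an assumed field name in the indexed log data.
        "query": {"match": {"message": text}},
    }
    return urllib.request.Request(
        url=f"{endpoint}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request(ENDPOINT, INDEX, "timeout")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it against a real endpoint: urllib.request.urlopen(request)
```

A client that already issues requests of this shape should only need its endpoint changed, which is the point made above about not having to port existing implementations.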
Data lake challenges
The differentiating challenge of a data lake is the large variety of information it contains. It’s hard to analyze data when you don’t know what it is. Typically, the data to be queried and analyzed must first be identified and then normalized. Both of these processes are time-consuming and labor-intensive.
Streamline data discovery and management
Dump as much data as you want into your Amazon S3 infrastructure. Let CHAOSSEARCH discover and organize it for you — regardless of size and type. We automate your data cataloging and indexing process without ever moving your data.
Understand what’s in your data
If you’re paying to store data, you should be able to use that data. However, understanding what’s in your Amazon S3 buckets is painful. We organize the data you catalog and create virtual folders (or buckets) that further define and filter the files within those buckets. CHAOSSEARCH gives you the keys to unlock your data.
Centralize, organize, and query terabytes of data
Automatically collect, organize, and standardize your log and event data for easy and cost-effective storage, retrieval, and analysis. Don’t worry about the cost or performance of your growing data. CHAOSSEARCH will manage it all at a cost that won’t break your budget. Automatically discover and normalize log data streamed into Amazon S3 from Logstash, AWS, or your application and make it available to search or query from the Elasticsearch API or Kibana. Stop spending hours collecting, organizing, and formatting log and event data from multiple locations. Simply stream your data into Amazon S3 and let CHAOSSEARCH do the rest — never move it again.
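For readers wondering what "stream your data into Amazon S3" can look like from an application, here is a minimal sketch that batches log events into one gzip-compressed, newline-delimited JSON object. The `build_log_object` helper, the key layout, and the bucket name in the comment are all assumptions made for illustration; none of them is a CHAOSSEARCH requirement.

```python
import gzip
import json
from datetime import datetime, timezone


def build_log_object(events, prefix, now):
    """Return (key, body) for one batch of log events.

    The key layout (prefix/YYYY/MM/DD/HHMMSS.json.gz) is an assumption chosen
    for illustration, not a required convention.
    """
    key = f"{prefix}/{now:%Y/%m/%d}/{now:%H%M%S}.json.gz"
    # One JSON document per line (NDJSON), then gzip the whole batch.
    ndjson = "\n".join(json.dumps(event, sort_keys=True) for event in events) + "\n"
    return key, gzip.compress(ndjson.encode("utf-8"))


events = [
    {"level": "info", "message": "service started"},
    {"level": "error", "message": "upstream timeout"},
]
batch_time = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
key, body = build_log_object(events, "logs/my-app", batch_time)
print(key)

# The upload itself is left out so the sketch runs offline. With boto3
# installed and credentials configured, it would look like:
#   boto3.client("s3").put_object(Bucket="my-log-bucket", Key=key, Body=body)
```

Once objects like this land in the bucket, nothing else has to be moved or reformatted on the application side.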
A data lake in a box
The CHAOSSEARCH platform is a complete data-lake-in-a-box SaaS offering, collapsing single-purpose data and analytics services like Amazon Redshift, Amazon Athena, AWS Glue, Amazon EMR, Amazon QuickSight, and Amazon SageMaker into a single cost-effective solution.
You no longer have to limit your data lake to a simple storage bucket. Instead, CHAOSSEARCH enables you to discover, manage, and analyze your highly variable information without moving it. CHAOSSEARCH dramatically increases the value you and your team can gain from your data lake.