Modern, data-driven enterprise SecOps teams use Security Information and Event Management (SIEM) software solutions to aggregate security logs, detect anomalies, hunt for security threats, and enable rapid response to security incidents.
SIEMs enable accurate, near real-time detection of security threats, but today's SIEM solutions were never designed to handle the large amounts of security log data generated by modern organizations on a daily basis.
As daily log ingestion grows, the cost of ingesting, processing, analyzing, and retaining security logs in the SIEM rises steeply. SIEM query performance can also deteriorate, delaying critical insights and increasing the organization’s vulnerability to cyber attacks.
SecOps teams may try to salvage SIEM performance by reducing the data retention window for security logs, but doing so negatively impacts long-term use cases for security data and can further damage the organization's security posture.
Thankfully, there’s another way to store and manage security data that can help enterprise SecOps teams overcome these challenges, augment or replace existing SIEM solutions, and bolster their security posture: a Security Data Lake.
In this blog, we’ll explain the unique features and benefits of security data lakes, show how a security data lake can complement your existing SIEM solution, and offer practical guidance for getting started with one.
A Security Data Lake is a centralized repository for aggregating, storing, and analyzing enterprise security data. Security Data Lakes leverage the inherent advantages of data lake architecture to deliver cost-effective data storage and log analytics for SecOps teams at scale, including:
The unique characteristics of data lake architecture make Security Data Lakes an excellent tool for cost-effectively storing and retaining security data at scale. Enterprise SecOps teams can continue using a trusted SIEM tool for near real-time anomaly detection and threat hunting, while the addition of a Security Data Lake allows for cost-effective long-term retention of security data and enables long-term security log analytics use cases like advanced persistent threat (APT) detection and root cause analysis.
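To make this division of labor concrete, here is a minimal Python sketch of how an investigation tool might decide which backend to search based on how far back a query reaches. The 30-day SIEM retention window, the backend labels, and the `route_query` function are all assumptions made for illustration, not part of any SIEM or data lake product.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical hot-retention window for the SIEM; use whatever your deployment keeps.
SIEM_RETENTION = timedelta(days=30)

def route_query(start: datetime, end: datetime, now: datetime) -> list[str]:
    """Return the backends that must be searched to cover the range [start, end]."""
    if start > end:
        raise ValueError("start must not be after end")
    cutoff = now - SIEM_RETENTION
    backends = []
    if start < cutoff:
        backends.append("data_lake")  # part of the range is older than the SIEM retains
    if end >= cutoff:
        backends.append("siem")       # part of the range is still inside the SIEM window
    return backends

# Example: a year-long APT investigation touches both tiers.
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
print(route_query(now - timedelta(days=365), now, now))
```

Recent, alert-driven queries stay in the SIEM, while long look-back investigations such as APT hunting or root cause analysis fall through to the data lake.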
Read: Integrating Observability into Your Security Data Lake Workflows
Getting started with a security data lake involves choosing the right data lake architecture and software components based on your organization’s unique needs, capabilities, and circumstances.
Below, we share some of the key decision points you’ll encounter and our best practical advice for getting started with a security data lake.
Modern security data lakes are typically deployed in the cloud, as public cloud infrastructure offers durable, scalable, and cost-effective storage backing for your security data lake. Enterprise SecOps teams can choose from data lake solutions offered by public cloud providers (e.g., AWS Data Lake, Amazon Security Lake, Google Data Lake) or by third-party SaaS vendors like Snowflake or ChaosSearch.
Amazon Security Lake Reference Architecture
When choosing a data lake solution, enterprise SecOps teams should compare solutions in terms of overall complexity and management overhead, total cost of ownership (TCO), and ease of integration with existing systems and data sources.
Once you’ve chosen a data lake solution, the next step is to identify sources of data for your security data lake. SecOps teams will want to capture security logs from cloud-based applications and services (including IAM services and network security tools), along with web servers, endpoint devices, and any on-prem network infrastructure. Threat intelligence from public/private feeds or cooperating organizations may also be ingested into the security data lake.
Next, you’ll need to set up and configure a process for ingesting data from the various data sources into your data lake. Cloud-based data ingestion tools include open-source options like Fluentd, Logstash, and Apache Kafka, third-party SaaS solutions like Wavefront, and public cloud services like Amazon Kinesis.
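Whichever tool you pick, the ingestion step usually ends with batches of events landing in object storage under a predictable key layout. The Python sketch below illustrates that idea using only the standard library. It is not how Fluentd, Logstash, Kafka, or Kinesis work internally; the `source=/dt=/hour=` key layout, the event field names, and both function names are assumptions for illustration, and the actual upload to object storage is left out.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone

def partition_key(source: str, ts: datetime) -> str:
    """Build an object key prefix such as 'logs/source=firewall/dt=2024-05-01/hour=13'."""
    ts = ts.astimezone(timezone.utc)  # normalize so every partition is in UTC
    return f"logs/source={source}/dt={ts:%Y-%m-%d}/hour={ts:%H}"

def batch_events(events):
    """Group events by partition and serialize each group as newline-delimited JSON.

    Each event is assumed to be a dict with a 'source' field and an ISO 8601
    'timestamp' field that carries an explicit UTC offset.
    """
    groups = defaultdict(list)
    for event in events:
        ts = datetime.fromisoformat(event["timestamp"])
        key = partition_key(event["source"], ts)
        groups[key].append(json.dumps(event, sort_keys=True))
    return {key: "\n".join(lines) + "\n" for key, lines in groups.items()}

events = [
    {"timestamp": "2024-05-01T13:05:00+00:00", "source": "firewall", "msg": "deny"},
    {"timestamp": "2024-05-01T13:45:00+00:00", "source": "firewall", "msg": "allow"},
    {"timestamp": "2024-05-01T15:10:00+02:00", "source": "iam", "msg": "login"},
]
batches = batch_events(events)
for key in sorted(batches):
    print(key)
```

Partitioning by source and time keeps later queries cheap, because an analyst looking at one day of firewall logs only needs to read the objects under that prefix.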
It’s important to choose a data lake solution that allows you to ingest data in its raw, unstructured format and apply schema at query time. Otherwise, you’ll have to deal with the up-front cost and complexity of transforming your data before it enters your data lake.
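Applying schema at query time (often called schema-on-read) can be sketched in a few lines of Python. In this simplified example, raw lines are stored exactly as received and a schema is only applied when an analyst reads them. The sample records, the `LOGIN_SCHEMA` mapping, and the `read_with_schema` helper are hypothetical and only meant to show the idea.

```python
import json

# Raw events stored exactly as received; the records do not share one layout.
RAW_LINES = [
    '{"ts": "2024-05-01T13:05:00Z", "user": "alice", "action": "login", "ok": true}',
    '{"ts": "2024-05-01T13:06:10Z", "src_ip": "10.0.0.7", "action": "port_scan"}',
    "not json at all: kernel: eth0 link up",
]

# Hypothetical schema chosen by the analyst at query time: field -> (type, default).
LOGIN_SCHEMA = {
    "ts": (str, ""),
    "user": (str, "unknown"),
    "action": (str, ""),
    "ok": (bool, False),
}

def read_with_schema(lines, schema):
    """Yield one dict per parseable line, projected onto the requested schema."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # the raw line stays in storage untouched; this view just skips it
        if not isinstance(record, dict):
            continue
        # Missing fields fall back to the schema default instead of failing ingestion.
        yield {name: cast(record.get(name, default)) for name, (cast, default) in schema.items()}

rows = list(read_with_schema(RAW_LINES, LOGIN_SCHEMA))
for row in rows:
    print(row)
```

Because nothing was transformed on the way in, a different team can read the same raw lines tomorrow with a different schema, without re-ingesting anything.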
When you ingest large volumes of data into your data lake, you run the risk of creating a data swamp: a disorganized, poorly maintained data lake that’s difficult to navigate and analyze. Cataloging or indexing security data as it enters your data lake helps you stay organized and keep track of the valuable data you’re storing.
AWS Glue and Google Cloud Data Catalog are public cloud services that deliver data cataloging capabilities on their respective platforms. SecOps teams can also implement an open-source tool like Apache Atlas, or take advantage of proprietary data indexing technology offered by third-party vendors like ChaosSearch.
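To show what a catalog entry captures, here is a small Python sketch that summarizes one batch of newline-delimited JSON logs: which fields appear, which value types they hold, how many records there are, and the time range covered. This is not the AWS Glue, Google Cloud Data Catalog, or Apache Atlas API; the `catalog_entry` function and the `timestamp` field name are assumptions for illustration.

```python
import json

def catalog_entry(dataset: str, ndjson_text: str) -> dict:
    """Summarize one batch of newline-delimited JSON so it can be found later."""
    fields = {}      # field name -> set of observed value type names
    timestamps = []
    count = 0
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        count += 1
        for name, value in record.items():
            fields.setdefault(name, set()).add(type(value).__name__)
        if "timestamp" in record:
            timestamps.append(record["timestamp"])
    return {
        "dataset": dataset,
        "records": count,
        "fields": {name: sorted(types) for name, types in sorted(fields.items())},
        # ISO 8601 strings that share one UTC offset sort chronologically as plain text.
        "first_seen": min(timestamps) if timestamps else None,
        "last_seen": max(timestamps) if timestamps else None,
    }

sample = (
    '{"timestamp": "2024-05-01T13:05:00+00:00", "user": "alice", "bytes": 120}\n'
    '{"timestamp": "2024-05-01T13:45:00+00:00", "user": "bob", "bytes": "n/a"}\n'
)
entry = catalog_entry("firewall", sample)
print(json.dumps(entry, indent=2))
```

Even a summary this simple flags problems early: the `bytes` field above shows up with two different types, which is exactly the kind of drift that turns a lake into a swamp if nobody notices.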
At this point, you should have security logs and other data streaming from your chosen data sources into your data lake platform. The next step is to connect your data lake to analytics, BI, and data visualization tools that allow SecOps teams to explore, transform, filter, and analyze the data to gain insights into your organization’s security posture.
The most sophisticated data lake solutions offer multi-model analytics capabilities, allowing SecOps teams to run full-text search, SQL queries, or ML workloads on their data.
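As a rough illustration of multi-model access, the Python sketch below runs a search-style lookup and a SQL aggregation against the same stored events. An in-memory SQLite database stands in for the data lake query engine purely for demonstration; the table layout, sample events, and function names are assumptions, and a real platform would expose its own search and SQL interfaces.

```python
import sqlite3

# Sample events; in practice these would already live in the data lake.
EVENTS = [
    ("2024-05-01T13:05:00Z", "alice", "10.0.0.7", "failed login for alice from 10.0.0.7"),
    ("2024-05-01T13:05:30Z", "alice", "10.0.0.7", "failed login for alice from 10.0.0.7"),
    ("2024-05-01T13:07:00Z", "bob", "10.0.0.9", "password changed for bob"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts TEXT, username TEXT, src_ip TEXT, message TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", EVENTS)

def text_search(term: str):
    """Search-style access: rows whose message contains the term, ignoring ASCII case."""
    cursor = conn.execute(
        "SELECT ts, message FROM events "
        "WHERE instr(lower(message), lower(?)) > 0 ORDER BY ts",
        (term,),
    )
    return cursor.fetchall()

def failed_logins_by_user():
    """SQL-style access: count failed logins per user over the same rows."""
    cursor = conn.execute(
        "SELECT username, COUNT(*) FROM events "
        "WHERE message LIKE 'failed login%' GROUP BY username ORDER BY username"
    )
    return cursor.fetchall()

print(text_search("PASSWORD"))
print(failed_logins_by_user())
```

The point is that both questions are answered from one copy of the data, so a threat hunter can start with a free-text search and pivot to an aggregate query without moving logs between systems.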
Ready to build your security data lake?
With ChaosSearch, it takes just minutes to stand up a cost-effective data lake that reduces your SIEM costs and enhances visibility of your enterprise security environment with unlimited data retention to support long-term security analytics use cases.
ChaosSearch Security Data Lake Reference Architecture
ChaosSearch attaches directly to AWS or GCP, transforming your Amazon S3 or GCS storage backing into a hot security data lake. Once your security logs are ingested into cloud object storage, our proprietary Chaos Index® technology indexes the data 60X faster than Elasticsearch and with up to 20X data compression.
From there, you can use our Chaos Refinery® tool to virtually filter, transform, and query security logs to hunt for APTs or investigate the root cause of a security incident. Building your security data lake with ChaosSearch can help you reduce SIEM costs and increase visibility of your enterprise security posture with low management overhead and TCO.
Want to see just how easy it is to stand up a security data lake using ChaosSearch? Read our ChaosSearch for SecOps Solution Brief to learn more about how ChaosSearch enables scalable log analytics for security operations and threat hunting.