ChaosSearch Blog - Tips for Wrestling Your Data Chaos

Integrating Observability into Your Security Data Lake Workflows

Written by Dave Armlin | Oct 20, 2022

Today’s enterprise networks are complex. Potential attackers have a wide variety of access points, particularly in cloud-based or multi-cloud environments. Modern threat hunters face the challenge of wading through vast amounts of data to separate the signal from the noise. That’s where a security data lake can come into play. Having the right data observability tools and integrations could mean the difference between identifying the root cause of a threat and missing an attacker who has already infiltrated your infrastructure or applications.

Some dedicated security observability solutions, like Security Information and Event Management (SIEM) platforms, can be effective “smoke alarms,” alerting SecOps teams to potential threats as they emerge in real time. However, given the volume of logs and system event data, and the prevalence of advanced persistent threats, it may be more effective to complement a SIEM with an additional observability platform targeted at identifying the root cause rather than alerting alone.

 

 

As threats and attack vectors multiply and increase in complexity, it is essential to store data longer and to bring in more data sources. A security data lake can help teams sift through the noise, investigate, respond, and mitigate real threats as they emerge, as well as look at the entire lifecycle of an incident comprehensively. On top of that, a flexible layer of automation can drive analysis of the many data sources in a security data lake, assess risk, and engage security teams when necessary to provide human review of conditions.

Here’s how to build data observability using many of the tools you may already have.

 

Building a Security Data Lake Workflow in S3

If you use Amazon S3, it’s easier than you think to create an effective security data lake workflow. By using S3 and integrating observability capabilities through ChaosSearch, you can build a security data lake that indexes data directly in cloud object storage. This can be done at petabyte scale with disruptive economics and performance.

In case you’re unfamiliar, ChaosSearch indexes data directly in Amazon S3 and writes a highly optimized index back into your S3. The platform indexes and queries your data from a stateless compute fabric located close to the centralized source bucket (in the same AWS region/availability zone). This produces two key benefits: reduced latency for reads/writes of data to S3 and no egress/networking costs. The ChaosSearch architecture is enabled via an IAM Policy that grants read-only access to the source data bucket(s) and write access to a bucket where ChaosSearch stores its index data. ChaosSearch does not store any of your data and employs a highly scalable stateless compute fabric to meet indexing and query demand at any scale.
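
To make the access model above more concrete, here is a minimal Python sketch that builds an IAM policy document with read-only access to a source bucket and read/write access to a separate index bucket. The bucket names, statement IDs, and exact action lists are assumptions for illustration only; the policy ChaosSearch actually requires may differ, so check its documentation before using anything like this.

```python
import json


def build_policy(source_bucket: str, index_bucket: str) -> dict:
    """Build an IAM policy: read-only on the source bucket, read/write on the index bucket.

    The action lists are illustrative assumptions, not ChaosSearch's official policy.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read-only: list the bucket and fetch objects, nothing else.
                "Sid": "ReadSourceData",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{source_bucket}",
                    f"arn:aws:s3:::{source_bucket}/*",
                ],
            },
            {
                # Read/write: the index is written to (and read from) this bucket.
                "Sid": "WriteIndexData",
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:ListBucket",
                ],
                "Resource": [
                    f"arn:aws:s3:::{index_bucket}",
                    f"arn:aws:s3:::{index_bucket}/*",
                ],
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket names.
    print(json.dumps(build_policy("my-security-logs", "my-chaossearch-index"), indent=2))
```

Keeping the two statements separate makes it easy to audit that nothing in the policy can modify the source log data.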

The ChaosSearch index is a complete representation of the source data. Once data has been indexed, the source data can be sent to Glacier or other lower-cost storage tiers, or even deleted. The ChaosSearch index can be up to 90% smaller than the source data. The index is accessible from the OpenSearch Kibana interface in ChaosSearch, remotely via API (OpenSearch, SQL/JDBC), or from Grafana via the OpenSearch plug-in.

Let’s take a look at automating your response workflow using AWS Step Functions and ChaosSearch.

 

 

What is the Best Cloud Monitoring Solution for Security?

Many organizations use a best-of-breed approach to cloud monitoring. For example, security teams may use a SIEM or a security orchestration (SOAR) solution for monitoring. On top of that, they may layer in a security data lake setup like the one described above for deeper observability and investigation capabilities. This combination provides the best of both worlds and can deliver true observability rather than monitoring alone.

AWS Step Functions, Amazon’s serverless visual workflow service for distributed applications, can automate portions of your security team’s response workflow and use other best-of-breed tools to communicate, track, and manage these investigations.

AWS Step Functions can build on the value of the data streaming into the ChaosSearch data platform. The goal is to automate and integrate with other complementary systems, which is what makes the data lake philosophy so powerful to begin with. An incident management workflow should be simple for busy SecOps teams and use the tools they already rely on whenever possible.

ChaosSearch has embedded alerting through an OpenSearch Kibana integration, which can query the data and alert when certain conditions are met. In addition, ChaosSearch can integrate via webhook with any system that has an accessible RESTful API (Slack, Microsoft Teams, Amazon Chime, Amazon SNS, PagerDuty, ServiceNow, Jira, Salesforce, Zendesk, etc.) to deliver alerts where your team is already working.

ChaosSearch can monitor your data and alert or trigger via webhook, using Monitor queries based on thresholds or patterns that you define. Here are some example Monitors:
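
As one illustration, a threshold-based Monitor might be sketched in Python as below. Because ChaosSearch alerting is described as an OpenSearch Kibana integration, the sketch follows the general shape of an OpenSearch alerting monitor; whether ChaosSearch accepts exactly this structure is an assumption. The index name, field names (`event.outcome`, `@timestamp`), threshold, and destination ID are all hypothetical.

```python
import json


def failed_login_monitor(index: str, threshold: int, destination_id: str) -> dict:
    """Build an OpenSearch-style monitor that fires when failed logins exceed a threshold.

    All names and values here are hypothetical placeholders.
    """
    return {
        "type": "monitor",
        "name": "excessive-failed-logins",
        "enabled": True,
        # Run the query every 5 minutes.
        "schedule": {"period": {"interval": 5, "unit": "MINUTES"}},
        "inputs": [
            {
                "search": {
                    "indices": [index],
                    "query": {
                        "size": 0,  # only the hit count matters
                        "query": {
                            "bool": {
                                "filter": [
                                    {"term": {"event.outcome": "failure"}},
                                    {"range": {"@timestamp": {"gte": "now-5m"}}},
                                ]
                            }
                        },
                    },
                }
            }
        ],
        "triggers": [
            {
                "name": "failed-login-threshold",
                "severity": "1",
                # Fire when the number of matching events exceeds the threshold.
                "condition": {
                    "script": {
                        "lang": "painless",
                        "source": f"ctx.results[0].hits.total.value > {threshold}",
                    }
                },
                "actions": [
                    {
                        "name": "notify-webhook",
                        "destination_id": destination_id,  # hypothetical webhook destination
                        "message_template": {
                            "source": "Monitor {{ctx.monitor.name}} fired"
                        },
                    }
                ],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(failed_login_monitor("security-logs-view", 50, "dest-123"), indent=2))
```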

 

 

Consider the data lake architecture you might employ to automate and assess risk based on multiple streams of data and trigger an investigation based on a specific set of events. When the risk threshold has been exceeded, a ChaosSearch Alert will trigger Step Functions to kick off the incident response workflow.
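
The hand-off from alert to workflow could look roughly like the following Python sketch. It assumes a small webhook receiver (for example, an AWS Lambda function) that combines alerts from several data streams into a risk score and starts a Step Functions execution once a threshold is exceeded. The stream names, weights, threshold, and alert payload shape are hypothetical; only `start_execution` with `stateMachineArn` and `input` comes from the AWS SDK.

```python
import json
from typing import Any, Iterable, Mapping, Optional

# Hypothetical risk threshold and per-stream weights, for illustration only.
RISK_THRESHOLD = 70
STREAM_WEIGHTS = {"cloudtrail": 40, "auth": 35, "vpc_flow": 25, "waf": 20}


def risk_score(alerts: Iterable[Mapping[str, Any]]) -> int:
    """Sum the weights of the distinct streams that raised an alert."""
    streams = {alert["stream"] for alert in alerts}
    return sum(STREAM_WEIGHTS.get(stream, 0) for stream in streams)


def handle_alerts(
    alerts: Iterable[Mapping[str, Any]],
    sfn_client: Any,
    state_machine_arn: str,
) -> Optional[str]:
    """Start the incident response workflow if the risk threshold is exceeded.

    Returns the execution ARN, or None if no workflow was started.
    `sfn_client` is expected to behave like boto3.client("stepfunctions").
    """
    alerts = list(alerts)
    score = risk_score(alerts)
    if score <= RISK_THRESHOLD:
        return None  # below threshold: nothing to investigate yet
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps({"risk_score": score, "alerts": alerts}),
    )
    return response["executionArn"]
```

In a real deployment, `sfn_client` would typically be `boto3.client("stepfunctions")`. Passing the client in as a parameter keeps the scoring logic testable without AWS credentials.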

A starting point for the AWS Step Functions workflow would include:

  1. Notifications to the SecOps team (Slack, Microsoft Teams, SNS, or Amazon Chime chat)
  2. Creation of an incident in your Incident Management Platform (Jira, OpsGenie, ServiceNow, etc.), which would also kick off any workflow that exists in these platforms

Additions to the security workflow could include:

  1. Automated remediation steps on anything that is actionable from the data and updates to systems and entities involved
  2. Scheduled automated checks on incident and systems status
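
The workflow steps listed above can be sketched as an Amazon States Language definition, built here as a Python dictionary. This is a minimal sketch under stated assumptions: the SNS topic ARN and the three Lambda function names (incident creation, remediation, status check) are hypothetical, and the status-check function is assumed to return a payload with a boolean `resolved` field.

```python
import json
from typing import Optional


def _lambda_task(function_name: str, result_path: Optional[str], next_state: str) -> dict:
    """Task state that invokes a Lambda function with the full state input."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {"FunctionName": function_name, "Payload.$": "$"},
        "ResultPath": result_path,
        "Next": next_state,
    }


def incident_workflow(
    topic_arn: str,
    create_incident_fn: str,
    remediate_fn: str,
    check_status_fn: str,
    recheck_seconds: int = 900,
) -> dict:
    """Build a state machine: notify, open an incident, remediate, then re-check on a schedule."""
    return {
        "Comment": "Sketch of a SecOps incident response workflow",
        "StartAt": "NotifySecOps",
        "States": {
            # 1. Notify the SecOps team via SNS (which can fan out to chat tools).
            "NotifySecOps": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sns:publish",
                "Parameters": {
                    "TopicArn": topic_arn,
                    "Message.$": "States.JsonToString($)",
                },
                "ResultPath": None,  # discard the SNS response, keep the original input
                "Next": "CreateIncident",
            },
            # 2. Create an incident in the incident management platform.
            "CreateIncident": _lambda_task(create_incident_fn, "$.incident", "Remediate"),
            # Optional addition: automated remediation.
            "Remediate": _lambda_task(remediate_fn, "$.remediation", "WaitBeforeCheck"),
            # Optional addition: scheduled checks on incident and system status.
            "WaitBeforeCheck": {
                "Type": "Wait",
                "Seconds": recheck_seconds,
                "Next": "CheckStatus",
            },
            "CheckStatus": _lambda_task(check_status_fn, "$.status", "IsResolved"),
            "IsResolved": {
                "Type": "Choice",
                "Choices": [
                    {
                        "Variable": "$.status.Payload.resolved",
                        "BooleanEquals": True,
                        "Next": "Done",
                    }
                ],
                "Default": "WaitBeforeCheck",  # not resolved yet: wait and check again
            },
            "Done": {"Type": "Succeed"},
        },
    }


if __name__ == "__main__":
    definition = incident_workflow(
        "arn:aws:sns:us-east-1:123456789012:secops-alerts",  # hypothetical ARN
        "create-incident",
        "remediate",
        "check-status",
    )
    print(json.dumps(definition, indent=2))
```

The resulting JSON could be supplied as the definition when creating a state machine. A production workflow would likely also need retry and error-handling states, which are omitted here for brevity.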

 

 

Maximize Your Log Data with Integrated Observability

ChaosSearch lets you centralize all security log and event data in low-cost cloud object storage. The platform supports powerful, flexible automation and integration, including AWS Step Functions and other systems. In the end, ChaosSearch enables SecOps teams to use the tools they already know and love while deepening their investigation capabilities across larger volumes of data. This style of proactive threat hunting can ultimately save teams from the cost and reputational damage resulting from a data breach or advanced persistent threat.

 

Additional Resources

Read the Blog: Tutorial: How to Use ChaosSearch with Grafana for Observability

Learn More About: ChaosSearch’s Integrations

Check out the eBook: Beyond Observability - The Hidden Value of Log Analytics