How to Overcome Datadog Log Management Challenges
Datadog has made a name for itself as a popular cloud-native application performance monitoring tool, measuring a system’s health and status based on the telemetry data that system generates. This telemetry includes machine-generated data such as logs, metrics and traces. Cloud-based applications and infrastructure generate millions (even billions) of logs, and analyzing them can yield a wealth of insights for DevOps, security, product teams and more.
While Datadog is a great smoke alarm for detecting issues, deeper troubleshooting and root cause analysis become a massive challenge as organizations scale. The most common challenges include log retention schedules, data transformation and rehydration processes, and cost. Let’s dive into why log analytics matter, the top log management challenges with Datadog, and how to navigate them.
The importance of log analytics for cloud-native companies
According to Eckerson Group, there are four primary use cases for analyzing log data: ITOps, DevOps, security and customer analytics. Vice President of Research Kevin Petrie elaborates on each use case:
- ITOps Analysis: Platform and site reliability engineers analyze IT logs from applications, devices, systems and network infrastructure. This helps them monitor and adjust IT service delivery to improve performance and reliability.
- DevOps Analysis: DevOps engineers analyze IT logs to track and optimize the software development lifecycle. This helps them speed up releases, reduce bugs, and improve collaboration between development and operations teams.
- Security Analysis: Security administrators analyze logs of events such as user authentication attempts, firewall blocking actions, file integrity checks, and the detection of malware. They use their findings to predict, assess and respond to threats, and to support compliance efforts.
- Customer Analytics: Marketing managers and BI analysts study customers’ digital footprints, including website clicks, asset downloads, service requests and purchases. Analyzing these logs helps describe customer behavior, identify intent, predict behavior and prescribe action.
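To make the security use case concrete, here is a minimal Python sketch that counts failed login attempts per source IP across JSON-formatted log lines and flags IPs above a threshold. The field names (`event`, `outcome`, `src_ip`), the sample lines, and the threshold are hypothetical and do not reflect any vendor's log schema; a real investigation would read from a log store rather than an inline string.

```python
import json
from collections import Counter

# Hypothetical sample of JSON-formatted authentication logs. The field names
# ("event", "outcome", "src_ip") are assumptions, not a Datadog or vendor schema.
SAMPLE_LOGS = """\
{"event": "user.login", "outcome": "failure", "src_ip": "10.0.0.5"}
{"event": "user.login", "outcome": "success", "src_ip": "10.0.0.7"}
{"event": "user.login", "outcome": "failure", "src_ip": "10.0.0.5"}
{"event": "file.read", "outcome": "success", "src_ip": "10.0.0.9"}
{"event": "user.login", "outcome": "failure", "src_ip": "10.0.0.8"}
{"event": "user.login", "outcome": "failure", "src_ip": "10.0.0.5"}
"""


def failed_logins_by_ip(lines):
    """Count failed login events per source IP, skipping blank or malformed lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate garbled lines instead of aborting the whole scan
        if record.get("event") == "user.login" and record.get("outcome") == "failure":
            counts[record.get("src_ip", "unknown")] += 1
    return counts


def suspicious_ips(counts, threshold=3):
    """Return IPs whose failed-login count meets the (hypothetical) threshold, busiest first."""
    return [ip for ip, n in counts.most_common() if n >= threshold]


if __name__ == "__main__":
    counts = failed_logins_by_ip(SAMPLE_LOGS.splitlines())
    print(counts)
    print(suspicious_ips(counts))
```

The same pattern (filter, group, count, threshold) underlies many log-based detections; the hard part at scale is keeping enough history searchable to run it over weeks or months of data.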
Petrie adds that these use cases have a common problem: processing data at scale.
This issue crops up across a wide variety of observability, monitoring and dashboarding tools such as Elasticsearch, CloudWatch and Datadog. Fortunately, organizations can supplement these tools with alternatives that enable powerful log analytics at scale with unlimited data retention.
Datadog challenges: Costly log retention
To execute properly on these use cases, organizations must be able to access large volumes of log data for analysis. One of the biggest challenges with tools like Datadog is the high cost of log storage and retention. As a result, many teams choose not to retain data past a 30-day retention period, which becomes a problem when they need to dig deeper into root cause analysis or investigate advanced persistent threats.
Centralized telemetry and an intuitive user interface (UI) are certainly Datadog pros, but those strengths get expensive at scale. Metrics and traces are priced by host, so their cost grows only as new hosts and services are added, while logs are priced by volume. That difference hits organizations with microservices architectures especially hard, because they generate high volumes of logs.
Datadog prices logs at two stages: ingestion and retention. Ingestion means shipping logs from their source into Datadog. Retaining (indexing) logs in Datadog allows customers to search and analyze them performantly. Logs that are ingested but not retained must go through a process called log rehydration, which reloads them from archive storage into Datadog before they can be searched. This process can take hours and requires a dedicated resource, and teams typically pay the retention price for the volume of logs rehydrated.
Ingestion is priced at $0.10/GB, while retention is priced at $1.06 to $2.50 per million log events for 3-day to 30-day retention, respectively. To optimize costs, organizations try all sorts of workarounds: shortening Datadog retention periods, ingesting only a subset of logs and dealing with rehydration when needed, or reducing the volume of logs ingested altogether. None of these workarounds is ideal. Besides hindering the team’s ability to generate insights, limiting data retention can also create problems for regulatory compliance and legal requirements.
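The arithmetic behind volume-based pricing is easy to sketch. The Python below estimates one month of log costs using the rates quoted in this post, treating indexed retention as billed per million log events, which is how Datadog's public price list has expressed it. The average event size that converts GB into events, the use of decimal gigabytes, and the `indexed_fraction` knob (the share of ingested logs you also retain) are assumptions for illustration; check current pricing before relying on any figure.

```python
from dataclasses import dataclass

# Illustrative rates from this post; verify current list prices before use.
INGEST_USD_PER_GB = 0.10
# Indexed retention, USD per million log events, keyed by retention days.
RETENTION_USD_PER_MILLION_EVENTS = {3: 1.06, 30: 2.50}
BYTES_PER_GB = 10**9  # assumption: decimal gigabytes


@dataclass
class CostEstimate:
    ingestion: float
    retention: float

    @property
    def total(self) -> float:
        return round(self.ingestion + self.retention, 2)


def estimate_monthly_cost(gb_ingested: float, retention_days: int,
                          indexed_fraction: float = 1.0,
                          avg_event_bytes: int = 1_000) -> CostEstimate:
    """Estimate one month of log costs under volume-based pricing.

    gb_ingested: total log volume shipped in the month.
    indexed_fraction: share of ingested logs that are also retained (indexed);
        the remainder would need rehydration before it can be searched.
    avg_event_bytes: hypothetical average size of a single log event.
    """
    if retention_days not in RETENTION_USD_PER_MILLION_EVENTS:
        raise ValueError(f"no rate configured for {retention_days}-day retention")
    if not 0.0 <= indexed_fraction <= 1.0:
        raise ValueError("indexed_fraction must be between 0 and 1")
    ingestion = gb_ingested * INGEST_USD_PER_GB
    indexed_events = gb_ingested * BYTES_PER_GB * indexed_fraction / avg_event_bytes
    retention = indexed_events / 1e6 * RETENTION_USD_PER_MILLION_EVENTS[retention_days]
    return CostEstimate(round(ingestion, 2), round(retention, 2))


if __name__ == "__main__":
    # 1 TB/month with 30-day retention: index everything vs. index 20%.
    for fraction in (1.0, 0.2):
        est = estimate_monthly_cost(1_000, 30, indexed_fraction=fraction)
        print(f"index {fraction:.0%}: ingest ${est.ingestion:,.2f} "
              f"+ retention ${est.retention:,.2f} = ${est.total:,.2f}")
```

Under these assumptions, retention rather than ingestion dominates the bill, which is why teams reach for the workarounds described above: lowering `indexed_fraction` or `retention_days` cuts cost but leaves more data unsearchable without rehydration.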
How to improve Datadog log workflows with ChaosSearch
As Petrie points out, new cloud-based platforms can alleviate some of the bottlenecks with log data:
“They rapidly and dramatically compress indexed data, which is critical given the high number and small content of logs. Users can automatically discover, normalize, and catalog all those log files, and assemble metadata to improve query planning—all with a smaller footprint than predecessors such as Lucene. The log data remains in place, but presents a single logical view with familiar visualization tools the user already knows (such as Kibana via open APIs).”
Datadog users can move their log workloads to a solution like ChaosSearch while keeping Datadog's other monitoring capabilities intact. Getting started takes three steps:
- Send logs directly to Amazon S3 or Google Cloud Storage (GCS): Send log data directly from the source, or ingest it into Datadog and use S3/GCS as the archive destination.
- Connect to ChaosSearch: Grant ChaosSearch read-only access to the raw log buckets. From there, teams can create a new bucket for Chaos Index to make their data fully searchable, or create a few object groups and views.
- Analyze logs via Elastic or SQL APIs: Investigate infrastructure and application issues in the ChaosSearch console via Kibana (for troubleshooting) or Superset (for relational analytics), or programmatically through the Elastic or SQL APIs.
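The second and third steps can be sketched in Python as two plain JSON documents. The first function builds an S3 bucket policy that grants only list and read access (`s3:ListBucket` on the bucket, `s3:GetObject` on its objects); this is one common way to express read-only access in AWS, but ChaosSearch's onboarding may ask for a different mechanism (for example an IAM role), so treat it as an illustration. The second builds an Elasticsearch-style query DSL body for recent errors from one service, assuming the Elastic-compatible API accepts standard `bool`/`term`/`range` queries. The bucket name, principal ARN, and log field names (`service`, `status`, `@timestamp`) are all hypothetical.

```python
import json

# Hypothetical placeholders: substitute your own bucket and the principal
# your analytics vendor actually asks you to trust.
LOG_BUCKET = "example-raw-logs"
READER_PRINCIPAL_ARN = "arn:aws:iam::123456789012:role/example-log-reader"


def read_only_bucket_policy(bucket: str, principal_arn: str) -> dict:
    """Build an S3 bucket policy granting list and read access, and nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowListBucket",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",  # bucket-level action
            },
            {
                "Sid": "AllowReadObjects",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",  # object-level action
            },
        ],
    }


def error_search_body(service: str, hours: int = 24, size: int = 50) -> dict:
    """Build an Elasticsearch query DSL body for one service's recent error logs.

    Field names are assumptions about the log schema.
    """
    return {
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                "filter": [
                    {"term": {"service": service}},
                    {"term": {"status": "error"}},
                    {"range": {"@timestamp": {"gte": f"now-{hours}h"}}},
                ]
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(read_only_bucket_policy(LOG_BUCKET, READER_PRINCIPAL_ARN), indent=2))
    print(json.dumps(error_search_body("checkout"), indent=2))
```

Keeping the grant to read-only actions means the raw log bucket stays the system of record: the analytics layer can index and query it but cannot modify or delete the source data.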
Using ChaosSearch, organizations can take advantage of unlimited data retention with a starting price of $0.80/GB and significant discounts at scale. In addition, analyzing data in place avoids processes like data pipelines and rehydration – helping teams achieve much faster time to insights. Combining Datadog with ChaosSearch can help organizations achieve best-in-class monitoring as well as a deeper understanding of the systems that drive business growth.
Read the Blog: Best Practices for Effective Log Management
Listen to the Podcast: Differentiate or Drown: Managing Modern-Day Data
Check out the eBook: Beyond Observability: The Hidden Value of Log Analytics