Observability is a key pillar for cloud-native companies. Cloud elasticity and the emergence of microservices help cloud-native companies build massively scalable architectures, yet they also exponentially increase the complexity of IT systems.
The volumes of machine-generated data – including logs, metrics and traces – created by these systems are crucial for a wide range of stakeholders across the organization.
Datadog has emerged as an application performance monitoring tool to measure a system’s health and status based on the telemetry data it generates. While centralizing all telemetry in a single platform works well in the beginning, significant Datadog log management challenges arise at scale.
That’s because the underlying data technologies for monitoring, trend analysis/reporting and troubleshooting in Datadog are fundamentally different. That leads to unexpected complexity when it comes to ingested logs, retention and log rehydration. Fortunately, there are alternatives that complement and optimize this overly complex Datadog log management process.
While centralization of telemetry and an intuitive user interface (UI) make Datadog a popular application for fast-growing startups, its scaling challenges can quickly become costly. This problem is most prominent with log retention schedules. While the metrics and traces Datadog monitors are priced by host, and scale only with the number of new services, logs are priced by volume. That means costs scale more directly with usage … and they add up – especially in microservices architectures.
Datadog prices logs along two dimensions: ingestion and retention. Ingestion allows customers to ship logs from their source and store them in Datadog. Retention allows customers to analyze their logs performantly. Logs that are ingested but not retained must be transformed back into a queryable state before they can be analyzed, via a process called rehydration. Rehydration takes hours and requires a dedicated resource, and teams typically pay the retention price for the volume of logs rehydrated.
Ingestion is priced at $0.10/GB while retention is priced at $1.06/GB to $2.50/GB for 3-day to 30-day retention, respectively.
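To make those list prices concrete, here is a rough monthly cost sketch. The 1 TB/month log volume is a hypothetical example chosen for illustration, not a benchmark; only the per-GB prices above come from the article.

```python
# Rough monthly cost sketch using the list prices above.
# The 1 TB/month volume is a hypothetical example, not a benchmark.
INGEST_PER_GB = 0.10       # Datadog log ingestion, $/GB
RETAIN_3D_PER_GB = 1.06    # Datadog 3-day retention, $/GB
RETAIN_30D_PER_GB = 2.50   # Datadog 30-day retention, $/GB

volume_gb = 1_000  # ~1 TB of logs per month (hypothetical)

ingest_cost = volume_gb * INGEST_PER_GB
retain_3d_cost = volume_gb * RETAIN_3D_PER_GB
retain_30d_cost = volume_gb * RETAIN_30D_PER_GB

print(f"Ingest only:     ${ingest_cost:,.2f}")
print(f"Ingest + 3-day:  ${ingest_cost + retain_3d_cost:,.2f}")
print(f"Ingest + 30-day: ${ingest_cost + retain_30d_cost:,.2f}")
```

Even at this modest volume, the gap between ingest-only and full 30-day retention is roughly 25x, which is why retention decisions dominate the bill.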
To optimize their costs, organizations typically try to adjust their retention practices. They start by reducing Datadog retention periods, or by ingesting only a subset of their logs and dealing with the rehydration process when needed. Some reduce the amount of logs ingested altogether. Limiting data retention, however, can create problems for regulatory compliance and legal teams, and can hamper the investigation of issues such as data breaches.
Beyond costs alone, Datadog’s log management challenges can lead to suboptimal observability. Over the long term, reduced retention can complicate troubleshooting of cloud services, investigation of enterprise cybersecurity threats and more.
Not to mention, SRE teams spend unnecessary time on data movement through rehydration – time that could be used to improve system reliability. That leads some teams to seek out Datadog alternatives for use cases that extend beyond monitoring to troubleshooting and root cause analysis.
For teams that already use cloud object storage like Amazon S3 or Google Cloud Storage (GCS), moving Datadog workloads to a dedicated log analytics solution like ChaosSearch might be easier than you think. The ChaosSearch cloud data platform seamlessly complements Datadog.
ChaosSearch is purpose-built for cost-performant log management and analytics at scale. The solution allows teams to centralize large volumes of logs with unlimited data retention, and analyze them via an Elastic or SQL API at a fraction of the cost.
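Because the platform exposes an Elasticsearch-compatible search API, standard Elasticsearch query DSL applies. The sketch below builds a typical query for recent error logs; the endpoint URL and index name are hypothetical placeholders, and the request pattern assumes standard Elasticsearch `_search` semantics.

```python
import json

# A standard Elasticsearch query DSL body: ERROR-level logs
# from the last 24 hours, against any Elasticsearch-compatible
# _search endpoint.
query = {
    "query": {
        "bool": {
            "must": [{"match": {"level": "ERROR"}}],
            "filter": [{"range": {"@timestamp": {"gte": "now-24h"}}}],
        }
    },
    "size": 50,
}

# Hypothetical endpoint and index name -- substitute your own.
url = "https://example.chaossearch.io/elastic/app-logs-view/_search"

body = json.dumps(query)
print(body)

# To actually run the search (requires credentials and the
# `requests` library):
# import requests
# resp = requests.post(
#     url, data=body, headers={"Content-Type": "application/json"}
# )
# hits = resp.json()["hits"]["hits"]
```

Since the query body is plain Elasticsearch DSL, existing Kibana-style dashboards and saved searches built around the Elastic API carry over with minimal changes.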
Getting started is easy.
ChaosSearch is a fully managed service that provides unlimited data retention with a starting price of $0.30/GB and significant discounts at scale. There are no rehydration processes and no debates over retention periods. With ChaosSearch, teams can index it all and let users analyze data in their tool of choice.
By combining Datadog as a monitoring service with ChaosSearch as a forensics tool, teams can achieve true observability at scale. Meet service level objectives and key performance metrics – while freeing up teams to increase efficiency and fuel growth.