3 Straightforward Pros and Cons of Datadog for Log Analytics
Observability is a key pillar for today’s cloud-native companies. Cloud elasticity and the emergence of microservices architectures allow these companies to build massively scalable systems, but they also greatly increase the complexity of those systems.
At the same time, the vast reams of machine-generated data these systems create are crucial for an ever-growing set of stakeholders - from SRE/DevOps and developers for monitoring and troubleshooting, to SecOps for threat hunting, to product & data science for A/B testing and growth.
This has led to the emergence of observability - the ability to measure a system’s internal state and health based on the telemetry data it generates (i.e., logs, metrics and traces), with Datadog being a tool of choice.
However, while centralizing all telemetry in a single platform works well initially, it creates significant challenges at scale. This is because the underlying data technologies for monitoring, troubleshooting and trend analysis/reporting are fundamentally different, often leading to ballooning costs, reduced data retention, increased operational burden and limited ability to answer relevant analytics questions.
ChaosSearch complements Datadog’s best-in-class infrastructure monitoring and APM application with a data platform built for cost-performant, centralized log analytics at scale. It has no data retention limits and is accessible via Elastic & SQL APIs at a fraction of the cost. Together, they give all your internal users the best smoke alarm (Datadog) and the best forensics tool (ChaosSearch).
This blog post explores the most important pros and cons of leveraging Datadog for log analytics. We’ll highlight the key features and benefits that have driven Datadog adoption, along with the critical drawbacks that lead organizations to choose additional log management and log analytics solutions.
What is Datadog?
Datadog is an infrastructure monitoring and observability platform primarily used by cloud-native companies.
The company started with infrastructure monitoring but now offers a variety of features including real user monitoring, application performance monitoring (APM), security monitoring and log management.
Datadog provides an easy-to-start, fully featured application that allows customers to ingest all metrics, traces and logs across applications, infrastructure and third-party services. It also lets teams monitor all of these systems in a single platform, which is why it is popular among fast-growing companies.
Common use cases for Datadog include:
- Monitoring automation for DevOps
- Shift-left testing
- Real-time business intelligence
- Security analytics
- Digital experience monitoring
- And more.
Now that we’ve reviewed the primary ways to use Datadog, let’s focus on the pros and cons of fully depending on Datadog for log analysis in cloud-based environments.
Datadog Pros and Cons
1. Cloud-native startups love it
Datadog made a name for itself as an infrastructure and application monitoring platform for cloud-native startups. As these startups grew, many stuck with the service, yet they ran into data ingestion and retention limitations. Datadog is excellent at detecting issues (the smoke alarm), but finding the root cause with it is far more complex.
2. Powerful and configurable UI
Many Datadog users love the clean user interface and the out-of-the-box dashboards within the platform. This single pane of glass is useful for visualizing the entire system. Drag-and-drop widgets let you create custom views without having to code. An array of visualization tools allows you to see data in a variety of formats and easily generate reports.
3. Easy to get up and running
Datadog is simple to get up and running. You can install and configure the Datadog agent quickly and connect external services via API integrations. However, once you dive deeper into the log analytics use case, you may find that the ingestion, retention and “rehydration” processes become far more costly and complex than you want to manage.
While the centralization of telemetry & the intuitive UI make Datadog a very popular application for fast-growing startups, its cost becomes a key challenge as they scale, especially in the current market environment. Nowhere is this more prominent than with logs. Metrics and traces are priced by host and hence scale only with the number of new services, whereas logs are priced by volume, so they scale more directly with usage, especially in microservices architectures. That leads to the cons.
1. Complex Log Ingestion, Indexing and Retention Process
The log analytics process within Datadog is more complex than it needs to be. You can send logs to Datadog, but you can’t analyze them until they are indexed and retained, and ingestion and retention are priced separately (more on that next). Because of the complexity and cost structure, some organizations choose not to retain as many logs as they need or want. That causes problems during troubleshooting and root cause analysis, especially for persistent issues that outlast the retention period for your logs.
To index and analyze logs that are no longer retained, you need to pull them out of your cloud object storage (e.g., Amazon S3) and rehydrate them. This process can take hours and requires someone to manage it. With persistent talent shortages and an overabundance of work for DevOps and site reliability teams, many organizations can’t afford to manage this level of complexity.
2. Costly Log Analytics Workflow
When it comes to logs, Datadog charges $0.10 to ingest data but $1.06 (3-day retention) to $2.50 (30-day retention) to retain it. To retain logs for longer, you need to contact Datadog to arrange custom pricing, and these costs add up quickly as companies scale. While Datadog is helpful for monitoring and detection, the cost of retaining enough logs for root cause analysis and troubleshooting can quickly balloon out of control.
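To make the arithmetic concrete, here is a minimal Python sketch of a two-part log bill. It uses the ingest and retention rates quoted above, but the text does not state the billing unit for either rate, so the sketch keeps the two quantities separate and leaves the units to the caller. The names `LogPricing` and `monthly_log_cost` and the example volumes are hypothetical, chosen only for illustration. This is not Datadog’s actual billing formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogPricing:
    """Two-part pricing: one rate per ingested unit, one per retained (indexed) unit."""
    ingest_rate: float      # e.g. 0.10, as quoted in the text
    retention_rate: float   # e.g. 1.06 (3-day) or 2.50 (30-day), as quoted in the text


def monthly_log_cost(pricing: LogPricing, ingested_units: float, indexed_units: float) -> float:
    """Return ingest cost plus retention cost.

    Assumption: the billing units are whatever the vendor defines for each rate;
    the caller must supply quantities in matching units.
    """
    if ingested_units < 0 or indexed_units < 0:
        raise ValueError("quantities must be non-negative")
    return pricing.ingest_rate * ingested_units + pricing.retention_rate * indexed_units


if __name__ == "__main__":
    three_day = LogPricing(ingest_rate=0.10, retention_rate=1.06)
    thirty_day = LogPricing(ingest_rate=0.10, retention_rate=2.50)

    # Hypothetical volumes: everything is ingested, only part of it is indexed.
    ingested, indexed = 1_000, 400
    for label, plan in (("3-day", three_day), ("30-day", thirty_day)):
        print(f"{label} retention: ${monthly_log_cost(plan, ingested, indexed):,.2f}")
```

Even in this simplified model, the retention term dominates the bill as soon as a meaningful share of logs is indexed, which is why teams are tempted to shorten retention windows.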
3. Scaling Challenges
Shortening log retention windows can become a significant tradeoff and result in a loss of visibility into more complex issues – ranging from lingering application and infrastructure performance problems to advanced persistent security threats. Many startups that begin with Datadog find that as they scale, they end up spending absurd amounts of money to retain their logs. With scale, Datadog becomes both more expensive and harder to use.
Datadog + ChaosSearch = Best-of-breed observability at scale
ChaosSearch can seamlessly complement Datadog. Our data platform, purpose-built for cost-performant analytics at scale, allows customers to centralize all their logs with unlimited data retention and analyze them efficiently via Elastic or SQL APIs at a fraction of the cost.
It’s simple to get started. All you have to do is:
- Send your logs to Amazon S3 or GCS: you can send them directly from the source, or ingest them into Datadog and use Amazon S3/GCS as the destination.
- Connect to ChaosSearch: give ChaosSearch read-only access to the raw log buckets, create a new bucket for Chaos Index and create a few object groups and views.
- Analyze your logs via Elastic or SQL APIs: analyze all your logs in our console via Kibana (for troubleshooting) or Superset (for relational analytics), or programmatically via the Elastic or SQL API.
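As an illustration of the last step, the sketch below builds a standard Elasticsearch query DSL body in Python for "recent error logs from one service". It assumes the Elastic API accepts this common `bool`/`filter` form, which the text does not confirm. The field names `service`, `level` and `@timestamp`, and the helper `build_error_query`, are hypothetical and depend on how your logs are indexed. The sketch only constructs the request body and does not call any endpoint.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_error_query(service: str, hours_back: int,
                      now: Optional[datetime] = None, size: int = 100) -> dict:
    """Build an Elasticsearch-style query body for error logs in a time window.

    Field names are assumptions for illustration; adjust them to your log schema.
    """
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(hours=hours_back)
    return {
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],  # newest first
        "query": {
            "bool": {
                # Filter context: exact matches and ranges, no relevance scoring.
                "filter": [
                    {"term": {"service": service}},
                    {"term": {"level": "error"}},
                    {"range": {"@timestamp": {
                        "gte": start.isoformat(),
                        "lte": now.isoformat(),
                    }}},
                ]
            }
        },
    }


if __name__ == "__main__":
    body = build_error_query("checkout", hours_back=24)
    print(json.dumps(body, indent=2))
```

Because retention is not a constraint in this setup, `hours_back` can be widened to weeks or months when chasing a persistent issue, without rehydrating anything first.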
Furthermore, all of this is done in a fully managed service at a fraction of the cost. ChaosSearch provides unlimited data retention with a starting price of $0.80/GB and significant discounts at scale. No more rehydration processes or discussions about retention. You can index it all and let your users analyze data in their tool of choice.
Set yourself up with the best smoke alarm (Datadog) and forensics tool (ChaosSearch) for all of your internal users at a fraction of the cost. You can have true observability at scale across your systems, free your teams from toil and improve your efficiency to further fuel your growth.