The Business Case for Switching from the ELK Stack
Last year we published a popular paper on how to calculate the true cost of an Elasticsearch or ELK (Elasticsearch, Logstash, Kibana) stack environment. The paper helps readers calculate the overall annual cost of ownership of their ELK environment, and shows that the cost burden of ELK is much higher than most customers anticipate.
That paper clearly hit a nerve: it’s been, by far, our most downloaded piece of content. But while the paper has been very valuable in helping readers understand the actual cost of their ELK environment, it stopped short of offering guidance on switching to an alternative platform to reduce Elasticsearch costs.
That’s why we’ve just released a new paper that complements the original one: The Business Case for Switching from ELK to ChaosSearch. The new paper summarizes the total cost of ownership (TCO) calculation for ELK that our original paper explained, and then provides a detailed comparison showing how ChaosSearch can provide massive annual cost savings for most customers.
The paper includes detailed apples-to-apples comparisons between ChaosSearch and ELK across three typical customer scenarios. Finally, it addresses the process (and costs) of switching from ELK to ChaosSearch to help readers build a comprehensive business case for making the switch.
The Importance of Log Analysis and Management at Scale
If you’re reading this post, chances are you already understand the importance of log management systems. Log data contains the insights an organization needs to run more effectively and more securely. Taken together, an organization’s log data describes the entire IT environment in real time, or at any point in its history. Server logs often contain details on machine and network traffic, user access, changes to applications and services, and countless other signals used to monitor the health and security of the IT landscape.
Log analytics systems allow users to extract intelligence from the data by running simple searches and complex queries, conducting trend analyses, and building data visualizations.
Key functions, including cybersecurity, infrastructure monitoring, customer support, and business intelligence, rely on log data analytics in their day-to-day operations.
Given how critical many of these use cases are to the daily operations of the business, organizations need to ensure that their log analytics platform provides access to all relevant log data, including historical data. This requirement is where many organizations hit problems today.
The underlying architectural limits of the ELK stack make scaling cumbersome and costly, and constrain the IT team’s ability to provide access to all the log data that the various groups within their organization require.
Calculating the Total Cost of Ownership of an ELK Stack Environment
In the original paper, we discuss what many see as an unanticipated “trap”. Because ELK is open source, getting started is simple and relatively inexpensive. However, as ELK stack environments scale up to incorporate more data sources, and as customers look to retain data beyond a few days, the overall deployment quickly becomes complex and costs rise rapidly.
At the root of the problem is ELK’s distributed architecture, which requires data to be partitioned and stored across numerous shards, with separate servers deployed, each responsible for its portion of the data. Thus, although it is easy to deploy and begin using the ELK stack with a low initial investment, most organizations quickly face ELK stack cluster sprawl, in which they are managing, and paying for, significant compute and storage resources.
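To make the sprawl concrete, the sketch below estimates cluster size from daily ingest and retention. The replica count, index overhead, per-node usable disk, and target shard size are hypothetical placeholders, not figures from the paper or from Elastic; the only point is that stored volume, shard count, and server count all grow in proportion to daily ingest × retention.

```python
import math

def estimate_cluster(daily_ingest_gb: float, retention_days: int,
                     replicas: int = 1,
                     index_overhead: float = 1.1,
                     usable_disk_per_node_gb: float = 2_000,
                     target_shard_size_gb: float = 40) -> dict:
    """Back-of-the-envelope sizing for a sharded, replicated log cluster.

    Every default is a hypothetical placeholder, not vendor sizing guidance.
    """
    # Indexed data that must stay on the cluster for the whole retention window.
    primary_gb = daily_ingest_gb * retention_days * index_overhead
    # Each replica keeps a full extra copy of every primary shard.
    stored_gb = primary_gb * (1 + replicas)
    # Data is split into shards; more data means more shards to place and manage.
    primary_shards = math.ceil(primary_gb / target_shard_size_gb)
    # Servers are added until all stored copies fit on their disks.
    data_nodes = math.ceil(stored_gb / usable_disk_per_node_gb)
    return {"stored_gb": stored_gb,
            "primary_shards": primary_shards,
            "data_nodes": data_nodes}

print(estimate_cluster(500, retention_days=60))
```

For 500 GB per day kept for 60 days (the year-one profile of the small scenario described below), these placeholder settings give 825 primary shards and 33 data nodes; doubling either ingest or retention roughly doubles both.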
Our paper and the accompanying TCO tool help customers consider all of the costs, including compute and storage, operations, and support, and calculate the overall total cost of ownership of their ELK environment. When the full costs are bundled and assessed, the results are often surprising. For example, the paper demonstrates how the TCO of a relatively small environment can easily exceed $2 million over a three-year period.
Small ELK Stack Environment Overview:
- 500 GB of daily log data ingest in year one
- 75% annual data growth rate
- 60-day active data retention
Chart: 3-Year ELK Stack TCO (small environment)
From this, it’s easy to understand how customers with larger environments face exorbitant costs, with 3-year TCOs in the tens of millions of dollars. Indeed, the paper shows that a customer ingesting 20 TB of log data per day in year one will face a 3-year TCO of over $65 million!
Large ELK Stack Environment Overview:
- 20 TB of daily log data ingest in year one
- 35% annual data growth rate
- 60-day active data retention
Chart: 3-Year ELK Stack TCO (large environment)
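The two scenarios differ mainly in starting volume and growth rate. The projection below uses only the parameters listed above, plus two assumptions of ours: growth compounds once a year, and ingest is flat within each year. It shows how much data each environment must keep searchable in its 60-day window. It covers data volume only; turning volume into dollars is what the paper’s TCO tool does.

```python
def project_retained(year_one_daily: float, growth_rate: float,
                     retention_days: int, years: int = 3):
    """Yield (year, daily ingest, data held in the active retention window).

    Assumes growth compounds once per year and ingest is flat within a year.
    """
    daily = year_one_daily
    for year in range(1, years + 1):
        yield year, daily, daily * retention_days
        daily *= 1 + growth_rate  # apply the annual growth rate for next year

# Parameters taken from the two scenario overviews above.
scenarios = {
    "Small (GB)": (500, 0.75, 60),
    "Large (TB)": (20, 0.35, 60),
}
for name, (start, growth, retention) in scenarios.items():
    for year, daily, retained in project_retained(start, growth, retention):
        print(f"{name} year {year}: {daily:,.2f}/day, {retained:,.0f} retained")
```

Under those assumptions, the small environment’s retained data grows from 30,000 GB (30 TB) in year one to 91,875 GB (about 92 TB) in year three, and the large environment’s grows from 1,200 TB to 2,187 TB, all of which the cluster’s shards and replicas have to hold.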
Once the cost of your ELK stack environment is well understood, the next question is: now what?
More than Cutting Elasticsearch Costs: Motivations to Switch from ELK
As ELK customers know, the TCO problem is more than simply a budgeting issue: it has a ripple effect. While it might seem easiest to “throw money at the problem” by continually expanding the budget to cover the increasingly complex infrastructure and the personnel needed to manage it all, customers inevitably face painful tradeoffs. In practice, the team responsible for the ELK environment must find ways to curb growth by limiting the amount of data ingested per day and shortening the data retention period. Because the centralized log management system provides the “single source of truth” for a given IT environment, any decision to limit the data captured and available for analysis creates insight gaps.
Given how vital log data is to all of the use cases that depend on it, these gaps can have devastating effects. Whether a team is investigating a data breach or analyzing trends to plan infrastructure requirements, gaps in the data can lead to faulty analyses. And the problem with log data is that it’s hard to know what you’ll need until you need it, so when making tradeoff decisions, the team managing the ELK environment is flying blind.
Moreover, as ELK environments grow in size and complexity, they become unstable. When pushed beyond their architectural scalability limits, ELK deployments frequently experience outages, which can severely disrupt the operations that rely on them.
Finally, a Viable Alternative to the Elasticsearch Stack
Given the problems above, organizations are increasingly looking for alternatives. Until recently, however, there were no great options. While there are many log management platforms on the market, the traditional alternatives all have limited scalability, and therefore run into the same budgetary and related problems as ELK.
The ChaosSearch Cloud Data Platform is the first viable ELK alternative that solves the underlying architectural problems inherent in traditional systems, thereby enabling both massive scalability and extraordinary TCO savings.
ChaosSearch takes a fundamentally different approach to search and analytics and therefore represents a new generation of log management platforms.
Whereas ELK and other traditional log management platforms are “closed” systems, in which data is transformed during ingest and stored in an internal database with its own data format, ChaosSearch simply connects to and indexes data that the customer already stores in its existing cloud storage. With read-only access to this customer data in the cloud, ChaosSearch builds a separate index without manipulating or taking custody of the underlying original data.
Although that distinction might sound simple, it makes all the difference in the world. On ingest, ChaosSearch introduces no bottlenecks: the data can stream directly into a customer’s cloud storage in its native format. And because it avoids the burden of “data custody”, ChaosSearch has no internal database size constraint; it simply leverages the performance, scale, and economics of the public cloud. This is the key that allows ChaosSearch to deliver unlimited scalability, industry-leading resiliency, and massive time and cost savings.
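As a toy illustration of the general idea of indexing data in place (this is not ChaosSearch’s actual indexing technology, which this post does not describe), the sketch below builds an inverted index over log objects held in a read-only mapping. The mapping is a hypothetical stand-in for a customer’s cloud storage bucket, and `build_index` is an illustrative name. The index is a separate structure that points back to object keys and line numbers; the source objects are never rewritten or copied into a database format.

```python
import re
from collections import defaultdict
from types import MappingProxyType

def build_index(objects):
    """Map each lowercase token to the (object key, line number) pairs containing it.

    Only reads from `objects`; the index is stored in a separate structure.
    """
    index = defaultdict(set)
    for key, body in objects.items():
        for line_no, line in enumerate(body.splitlines(), start=1):
            for token in re.findall(r"[a-z0-9_]+", line.lower()):
                index[token].add((key, line_no))
    return index

# Hypothetical stand-in for raw log files in a customer-owned storage bucket.
# MappingProxyType makes the view read-only, so indexing cannot alter the source.
bucket = MappingProxyType({
    "logs/app-01.log": "GET /login 200\nPOST /login 401",
    "logs/app-02.log": "GET /health 200",
})

index = build_index(bucket)
print(sorted(index["401"]))  # [('logs/app-01.log', 2)]
print(sorted(index["200"]))  # [('logs/app-01.log', 1), ('logs/app-02.log', 1)]
```

A search consults only the index and then reads the referenced lines back from the original objects, which stay in their native format. A production system would also have to handle compression, schemas, and incremental updates as new objects arrive; the sketch ignores all of that.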
Cost Savings Explained
ChaosSearch delivers massive TCO savings in two primary ways. First, its simplified architecture dramatically reduces infrastructure requirements. On the infrastructure side, customers pay only for their cloud object storage, and can eliminate all spending on the compute and block storage associated with the ELK stack environment. Second, ChaosSearch is delivered as a managed service, with a single monthly fee based on the daily ingest rate. This SaaS approach reduces the operations staffing needed to run the environment to a fraction of one full-time employee (FTE).
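The two savings levers can be expressed as a simple annual cost model. This is a sketch under stated assumptions: the fee per GB of daily ingest, the object storage price, the FTE fraction, and the salary are all hypothetical inputs, and `annual_cost` is an illustrative name. What the model does take from the text is the shape of the bill. It includes a subscription tied to daily ingest, the customer’s own object storage, and a fraction of one FTE, with no compute or block storage line.

```python
def annual_cost(daily_ingest_gb: float, stored_gb: float,
                fee_per_daily_gb_month: float, storage_per_gb_month: float,
                fte_fraction: float, fte_annual_cost: float) -> dict:
    """Yearly cost of a managed service whose data lives in object storage.

    All rates are hypothetical inputs; real prices come from the vendor
    and the cloud provider.
    """
    # Monthly subscription tied to the daily ingest rate.
    subscription = 12 * daily_ingest_gb * fee_per_daily_gb_month
    # Customer-owned object storage holding the raw logs.
    storage = 12 * stored_gb * storage_per_gb_month
    # Part of one engineer's time instead of a dedicated cluster team.
    people = fte_fraction * fte_annual_cost
    # Deliberately no compute or block-storage line item.
    return {"subscription": subscription, "storage": storage,
            "people": people, "total": subscription + storage + people}

# Hypothetical inputs for illustration only.
print(annual_cost(daily_ingest_gb=100, stored_gb=10_000,
                  fee_per_daily_gb_month=10, storage_per_gb_month=0.02,
                  fte_fraction=0.25, fte_annual_cost=100_000))
```

Summing this over three years, and comparing it with the ELK figure for the same ingest and retention, gives the kind of side-by-side comparison the paper builds.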
Cost Savings Quantified
The best way to understand the cost advantages of ChaosSearch is to conduct an apples-to-apples comparison, demonstrating the TCO of ChaosSearch vs. ELK for various customer scenarios. The paper analyzes three customer scenarios, demonstrating how ChaosSearch delivers significant savings compared to the ELK stack across all three.
The summary table below shows the 3-year savings that ChaosSearch delivers over the ELK stack for each scenario.
Assessing Switching Costs
No business case is complete if it ignores switching costs.
As every IT veteran knows, IT systems are often “sticky”. Once an IT solution is deployed and used in production for daily operations, it can be very difficult to switch to a new one, even if the team judges the alternative to be superior.
Switching to a new solution can mean system unavailability, and any new technology has the potential to introduce new risks. In today’s cloud-enabled environment, IT systems are expected to be available 24/7, and disruptive “rip and replace” projects are simply not viable.
Thus, any move from one platform to another requires a value proposition strong enough that the case for switching is overwhelming. Just as importantly, the team must be confident that the migration path from the legacy system to the new one is smooth and can be completed with minimal disruption.
ChaosSearch fits the bill: it not only delivers massive TCO savings, but also makes the move from ELK to ChaosSearch seamless. Because ChaosSearch natively includes Kibana, the ELK stack visualization tool, and supports the Elasticsearch API, customers can move their operations over to ChaosSearch quickly and easily. ChaosSearch’s new SQL API also lets customers plug in other visualization tools (e.g., Looker, Tableau). Customers can keep the same dashboards, visualizations, and pre-staged queries they use in their ELK environment, simply by exporting them from ELK and importing them into the ChaosSearch deployment.
Customers typically deploy and run ChaosSearch in parallel with their existing ELK stack environment, allowing for an orderly process of testing and moving workloads to ChaosSearch over time. The duration of the transition depends on the number and variety of the workloads to be migrated. While some customers pursue rapid transitions, moving all workloads to ChaosSearch within two weeks, the average transition time we see is 30 to 60 days. The cost of this migration period can be quantified: it is simply the cost of maintaining the existing system as adoption of ChaosSearch ramps up. These costs should be added to the business case as part of the Year One costs of the ChaosSearch TCO. They can be considered the initial investment required to realize the overall TCO savings over the three-year period.
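The migration cost described above reduces to a one-line calculation. The sketch assumes the existing ELK environment keeps running at its full annual cost for the whole transition. That matches the description in the text, but it may overstate the cost if you decommission ELK capacity gradually. The function names and dollar figures are hypothetical.

```python
def migration_cost(elk_annual_cost: float, transition_days: int) -> float:
    """Cost of keeping the existing ELK stack running during the parallel run.

    Assumes ELK costs stay flat until cutover (no ramp-down while workloads move).
    """
    return elk_annual_cost * transition_days / 365

def chaossearch_year_one(chaossearch_year_one_cost: float,
                         elk_annual_cost: float, transition_days: int) -> float:
    """Year One ChaosSearch TCO with the parallel-run period folded in."""
    return chaossearch_year_one_cost + migration_cost(elk_annual_cost, transition_days)

# 14 days is the rapid case; 30 to 60 days is the typical range cited above.
# The $1,000,000 annual ELK cost is a hypothetical figure.
for days in (14, 30, 60):
    print(days, round(migration_cost(1_000_000, days)))
```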
Putting it all Together: The Business Case for Switching from ELK to ChaosSearch
Unlike many IT projects, in which the benefits of the new solution can be difficult to quantify, the business case for ChaosSearch is built on clear-cut, easy-to-quantify deltas between the cost of ChaosSearch and that of the equivalent ELK stack environment. Calculating the difference in TCOs shows the total cost savings over the 3-year period, and allows you to demonstrate the 3-year rate of return. The chart below shows both the ROI and rate of return for the medium-size customer scenario:
The paper and the accompanying TCO tool allow you to construct a similar business case for your specific scenario.
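The arithmetic behind such a business case is short. The sketch below is a minimal version that assumes ROI is defined as net 3-year savings divided by the migration cost, with that cost treated as the initial investment. The post does not give the paper’s exact ROI or rate-of-return formulas, so treat that definition, the function name, and the sample inputs as hypothetical.

```python
def business_case(elk_tco_3yr: float, chaossearch_tco_3yr: float,
                  migration_cost: float) -> dict:
    """3-year savings and a simple ROI on the cost of switching.

    chaossearch_tco_3yr should already include migration_cost in Year One.
    ROI here = net 3-year savings / migration cost (one common simple-ROI
    definition; the paper's exact formula may differ).
    """
    if migration_cost <= 0:
        raise ValueError("migration_cost must be positive to compute ROI")
    savings = elk_tco_3yr - chaossearch_tco_3yr
    return {
        "savings": savings,
        "savings_pct": savings / elk_tco_3yr,
        "roi": savings / migration_cost,
    }

# Hypothetical inputs for illustration only, not results from the paper.
print(business_case(elk_tco_3yr=3_000_000, chaossearch_tco_3yr=1_000_000,
                    migration_cost=50_000))
```

Because the migration cost sits both inside the ChaosSearch TCO and in the ROI denominator, a longer parallel run lowers the savings and the ROI at the same time.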
Search, Analyze, and Visualize Better
Customers managing medium-to-large ELK environments today face ongoing cost increases while struggling to maintain uptime in an overly complex environment. Meanwhile, they must continually make tradeoffs that limit access to data for the groups that rely on log analytics in their daily operations. ChaosSearch provides the ideal replacement for the ELK stack because it delivers massive reductions in cost and complexity, solves the scalability problem, and enables a seamless transition. Quantifying the benefits of ChaosSearch allows you to build a rock-solid business case for making the switch.
If you’re managing an ELK environment today, or are involved in log analytics that rely on the underlying ELK stack, now is a good time to consider a change. The ChaosSearch team can assist you in using the TCO tool to develop a customized business case for your environment. We’d like to hear from you about your plans and any unique challenges you are facing.