Data Lake Opportunities: Rethinking Data Analytics Optimization [VIDEO]
Data lakes come with challenges. And until you solve them, efficient, cost-effective data analytics will remain out of reach.
That’s why ChaosSearch is rethinking the way businesses manage and analyze their data.
As Mike Leone, Senior Analyst for Data Platforms, Analytics and AI at ESG Global, and Thomas Hazel, ChaosSearch’s founder and CTO, explained in a recent webinar, ChaosSearch offers a data analytics optimization solution that makes data faster and cheaper to store and analyze. At the same time, we’re broadening the use cases that data analytics can support.
Here’s a look at ChaosSearch’s approach to data management and data analytics, and how it solves the problems of conventional data management.
5 Keys to Data Analytics Optimization
We discussed the challenges posed by traditional data lakes and data management strategies in a previous blog post, so we won’t rehash them here. Instead, let’s talk about solutions that can optimize your business’s data analytics outcomes.
1. De-silo Your Data
Effective data analytics requires the ability to de-silo your data. That means the ability to ingest any data from any source, then analyze it centrally. Whether dealing with sales records, application logs or even support tickets, you should be able to move them to a data analytics platform in a consistent way.
Equally important is a simple data ingestion process. You shouldn’t need to move your data across multiple cloud services in order to get it into your analytics platform. Ideally, you should be able to ingest it directly, no matter what the data source is.
2. Optimize Analytics Costs
Relatedly, the more often you move and transform data in order to analyze it, the more you’ll pay for data analytics.
So strive to keep ingestion, storage and analytics architectures simple. Not only does simplicity translate to easier management, but it also reduces expenses.
3. Be Structure-agnostic
Data structures come in multiple forms. Some data, like that found in a relational database, is highly structured. Other data is semi-structured: It’s not rigidly organized, but it may at least be tagged or categorized. And still other data has no structure whatsoever.
Your data analytics stack should be able to accommodate data in all three of these forms (structured, semi-structured and unstructured) equally well. And it should not require you to transform or restructure data in order to analyze it.
4. Be Use-case Agnostic
Data analytics can support a wide variety of use cases, from business intelligence, to software development, to software deployment and reliability management and beyond.
In order to deliver the greatest value, a data analytics solution should be able to operationalize every potential use case. Even if you don’t need data analytics for a particular workflow today, who knows what the future will bring? Don’t get tied down by a data analytics stack that can only work with some kinds of data, or that can only deliver insights for certain types of use cases.
5. Optimize Data Performance
Last but not least, be smart about the way you move, transform, and analyze data. Conventional approaches to data analytics essentially adopt a 1970s-era mindset. They rely excessively on indexing, they don’t take full advantage of the cloud, they require inflexible data structures and so on.
A modern data analytics approach works differently. It makes full use of compression, avoids unnecessary sharding and leverages high-performance cloud storage to ensure rapid data movement, access and analytics.
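To make the compression point concrete, here is a minimal sketch using Python's standard gzip module. The log lines are synthetic and their format is an assumption made for illustration; this is not ChaosSearch's indexing or compression method, only a demonstration that repetitive log data tends to compress well.

```python
import gzip


def make_log_lines(n: int) -> bytes:
    """Build n synthetic, CDN-style access-log lines (hypothetical format)."""
    lines = (
        f"203.0.113.{i % 256} - - [01/Jan/2024:00:00:{i % 60:02d} +0000] "
        f'"GET /assets/app-{i % 20}.js HTTP/1.1" 200 {1000 + i % 500}\n'
        for i in range(n)
    )
    return "".join(lines).encode("utf-8")


raw = make_log_lines(10_000)
compressed = gzip.compress(raw)

# Ratio of raw size to compressed size; the exact value depends on the data.
ratio = len(raw) / len(compressed)
print(f"raw: {len(raw):,} bytes, gzip: {len(compressed):,} bytes, ratio: {ratio:.1f}x")
```

Real logs will compress more or less than this synthetic sample, so treat the printed ratio as illustrative rather than as a benchmark.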
All of the above are at the core of ChaosSearch’s approach to data analytics. They’re what make ChaosSearch different from the outmoded analytics solutions that require you to build a data lake, move data around a bunch of times, force it into rigid structures, shard it and only then analyze it.
Optimizing Data Analytics in Real Life: HubSpot’s Experience
To illustrate what data analytics optimization looks like in practice, consider the experience of HubSpot, which Mike and Thomas detailed in the webinar.
Initially, HubSpot relied on a self-hosted ELK environment to analyze the 20 terabytes of CDN log data that it produced each day. The solution required three full-time employees to manage and ran up $300,000 per month in AWS hosting costs alone. Under this approach, HubSpot was also able to retain data for only ten days.
Unhappy with this state of affairs, HubSpot moved to ChaosSearch, a fully managed SaaS solution that requires only the part-time effort of a single employee. The analytics stack now costs HubSpot about $70,000 per month, delivers 99.999 percent uptime and supports 90-day data retention.
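The figures above can be put side by side with some quick arithmetic. The sketch below uses only the numbers reported in the webinar. It assumes, for illustration, that daily log volume stayed at 20 TB after the migration and that the full retention window is stored. The function and field names are hypothetical.

```python
# Figures reported in the case study above.
BEFORE = {"monthly_cost_usd": 300_000, "retention_days": 10}
AFTER = {"monthly_cost_usd": 70_000, "retention_days": 90}

# Assumption: daily volume is unchanged after the migration.
DAILY_VOLUME_TB = 20


def summarize(before: dict, after: dict, daily_tb: float) -> dict:
    """Compare monthly cost and retention between the two setups."""
    savings = before["monthly_cost_usd"] - after["monthly_cost_usd"]
    retained_before_tb = daily_tb * before["retention_days"]
    retained_after_tb = daily_tb * after["retention_days"]
    return {
        "monthly_savings_usd": savings,
        "cost_reduction_pct": round(100 * savings / before["monthly_cost_usd"], 1),
        "retention_multiple": after["retention_days"] / before["retention_days"],
        # Monthly cost divided by the data volume kept searchable.
        "usd_per_retained_tb_before": round(before["monthly_cost_usd"] / retained_before_tb, 2),
        "usd_per_retained_tb_after": round(after["monthly_cost_usd"] / retained_after_tb, 2),
    }


summary = summarize(BEFORE, AFTER, DAILY_VOLUME_TB)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Under these assumptions, the monthly bill drops by roughly three quarters while the retention window grows ninefold, so the cost per retained terabyte falls much further than the headline cost does.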
Conclusion
There are lots of possible approaches to data management and analytics.
Some deliver much better flexibility and performance than others. To make the most of data analytics, you should strive for a solution that is data-agnostic, cost-effective and optimized for speed.
Read the Series
Part 1A: Data Lake Challenges: Or, Why Your Data Lake Isn’t Working Out
Part 1B: Data Lake Opportunities: Rethinking Data Analytics Optimization
Part 2: AWS Monitoring Challenges: How to Approach AWS Management