Managing the Mess of Modern IT: Log Analytics and Operations Engineering
IT is messy stuff.
Enterprise applications and devices rely on a web of interdependent clouds, networks, and containers. IT operations (ITOps), development operations (DevOps), and cloud operations (CloudOps) engineers work hard to manage this mess. If they succeed, they create a stable, agile IT environment that makes their enterprise more productive. If they fail, their enterprise becomes less productive.
These operations engineers can increase their odds of success by analyzing the millions of logs that describe the performance characteristics of various IT components. Log analytics, meaning the ingestion, transformation, searching, and querying of all those logs, helps them manage IT infrastructure, streamline application releases, and optimize virtual cloud resources.
But you need to select the right log analytics product to achieve these productivity goals. Eckerson Group’s new whitepaper, Ultimate Guide to Log Analytics for ITOps, DevOps, and CloudOps, recommends five criteria that ops engineers can use to evaluate log analytics products: ease of use, analytical flexibility, performance and scalability, support for an open architecture, and governance capabilities. In this excerpt, we’ll explore analytical flexibility.
Analytical Flexibility Is Essential
Your product should enable you to explore, discover, transform, analyze, and monitor the torrents of logs that describe events in your IT environment—as well as the messages IT components send to one another. To gauge how well a log analytics product does this, ask the following questions.
Evaluating Log Analytics Solutions: Key Questions
Does this product organize logs in a granular way?
Your log analytics product needs to assemble high volumes of logs and present granular views of them to many users, while supporting growth on both sides. For example, ChaosSearch enables ops engineers to create and join object groups, perhaps defined by log source or category. It creates a compressed physical index of all object groups, capturing metadata such as log source, format, and attributes. Users can view simple visualizations—such as tables, charts, or text clouds—for each object group. Using this physical index, ops engineers can start to transform and analyze the data.
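To make the idea of object groups and an index of their metadata concrete, here is a minimal sketch in plain Python. It is not ChaosSearch's API or index format; the sample records and the names `build_object_groups` and `summarize_group` are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical sample records; a real deployment would read logs from storage.
LOGS = [
    {"source": "nginx", "format": "json", "status": 200, "path": "/login"},
    {"source": "nginx", "format": "json", "status": 500, "path": "/cart"},
    {"source": "syslog", "format": "text", "message": "disk usage at 91%"},
]


def build_object_groups(logs, key="source"):
    """Group log records by one field (here, the log source)."""
    groups = defaultdict(list)
    for record in logs:
        groups[record[key]].append(record)
    return dict(groups)


def summarize_group(name, records):
    """Collect the kind of metadata an index might keep for one group."""
    attributes = sorted({field for record in records for field in record})
    formats = sorted({record.get("format", "unknown") for record in records})
    return {
        "group": name,
        "count": len(records),
        "formats": formats,
        "attributes": attributes,
    }


groups = build_object_groups(LOGS)
index = {name: summarize_group(name, records) for name, records in groups.items()}

for entry in index.values():
    print(entry)
```

Grouping by a different field, such as a log category, only requires passing another `key`. A real product would also compress and persist the index, which this sketch leaves out.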
Does it search and query log data?
Most log analytics scenarios today center on search. Much as Google does for web pages, the Elasticsearch engine performs full-text searches on high volumes of indexed logs and returns results ranked by relevance. But your log analytics product should also support Structured Query Language (SQL) queries, which broaden the ways in which ops engineers interact with log data. They need to search all their logs by keyword, then query those results and correlate their findings.
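The search-then-query pattern can be sketched with Python's built-in `sqlite3` module. This is an illustration only: a substring match stands in for a real full-text engine, and the table layout, sample rows, and the helper names `search` and `count_by_service` are assumptions, not part of any product.

```python
import sqlite3

# Hypothetical sample log rows: (timestamp, service, level, message).
ROWS = [
    ("2021-03-01T10:00:00", "api", "ERROR", "timeout calling billing service"),
    ("2021-03-01T10:00:05", "api", "INFO", "request completed"),
    ("2021-03-01T10:00:09", "web", "ERROR", "timeout rendering dashboard"),
    ("2021-03-01T10:00:12", "api", "ERROR", "timeout calling billing service"),
    ("2021-03-01T10:00:20", "web", "WARN", "slow response"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (ts TEXT, service TEXT, level TEXT, message TEXT)")
conn.executemany("INSERT INTO logs VALUES (?, ?, ?, ?)", ROWS)


def search(keyword):
    """Step 1: keyword search (substring match stands in for full-text search)."""
    cur = conn.execute(
        "SELECT ts, service, level, message FROM logs WHERE message LIKE ?",
        (f"%{keyword}%",),
    )
    return cur.fetchall()


def count_by_service(keyword):
    """Step 2: a SQL aggregate over the same matches, grouped by service."""
    cur = conn.execute(
        "SELECT service, COUNT(*) FROM logs WHERE message LIKE ? "
        "GROUP BY service ORDER BY COUNT(*) DESC, service",
        (f"%{keyword}%",),
    )
    return cur.fetchall()


hits = search("timeout")
summary = count_by_service("timeout")
print(len(hits), "matching logs")
print(summary)
```

The search step answers "which logs mention this?", while the SQL step answers "how are those matches distributed?", which is the kind of follow-up question keyword search alone does not handle well.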
For example, a CloudOps engineer might need to search through resource utilization logs for a slow SaaS application, then query and correlate the results to understand how compute, storage, and network resource utilization levels relate to one another. They identify the root cause—such as network congestion—and reroute network traffic to remediate the problem. They start monitoring network utilization levels and set threshold-based alerts to stay ahead of future issues.
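The correlate-then-alert workflow in this scenario might look like the sketch below. All numbers are made up for illustration, the names `pearson` and `breaches` are hypothetical helpers, and the 85 percent threshold is an arbitrary assumption. A high correlation only suggests where to look; it does not prove a root cause by itself.

```python
from math import sqrt

# Hypothetical per-minute samples for a slow SaaS application.
latency_ms = [120, 135, 150, 400, 650, 700]
utilization = {
    "compute": [50, 52, 49, 51, 50, 53],
    "storage": [60, 61, 60, 62, 61, 60],
    "network": [35, 38, 42, 80, 93, 96],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def breaches(series, threshold):
    """Indices of samples at or above the alert threshold."""
    return [i for i, value in enumerate(series) if value >= threshold]


# Correlate application latency with each resource's utilization.
scores = {name: pearson(latency_ms, series) for name, series in utilization.items()}
suspect = max(scores, key=scores.get)

# Threshold-based alert on the suspect resource (85 percent is an assumed limit).
alert_minutes = breaches(utilization[suspect], threshold=85)

print(scores)
print("most correlated resource:", suspect)
print("minutes over threshold:", alert_minutes)
```

In this sample data, latency tracks network utilization far more closely than compute or storage, so network becomes the resource to investigate and to monitor with the alert.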
Does this product support all your log analytics use cases?
Your product should support all of the ways in which ITOps, DevOps, and CloudOps engineers need to analyze logs. For stability, these ops engineers might need to manage performance by fixing a SaaS application issue; control costs by identifying the data scientist who consumes a disproportionate share of Amazon EC2 compute; or improve compliance by shutting down usage of unmasked personally identifiable information (PII). For agility, they might need to manage application releases by checking how much memory a software update consumes, or help scale resources by setting threshold alerts for server utilization.
Build an inventory of your specific use cases and assess whether your candidate product addresses them all. It will need to provide comprehensive support, because ops engineers don’t have time to learn multiple log analytics products.
Does it support both batch and streaming data workloads?
Many use cases depend on real-time analytics of logs, i.e., within seconds or minutes, depending on enterprise requirements. Your log analytics product should ingest logs from streaming platforms such as Amazon Simple Queue Service (SQS), Google Pub/Sub, and Apache Kafka, and make those logs accessible for alerting and analytics within those latency windows. This might entail live indexing and transformation of incoming log streams.
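Live indexing of an incoming stream can be sketched as follows. The generator `log_stream` is a hypothetical stand-in for a consumer reading from a queue or topic; no real SQS, Pub/Sub, or Kafka client is used here. The `LiveIndex` class and the alert rule (two or more "failed" events) are likewise assumptions made for illustration.

```python
from collections import defaultdict


def log_stream():
    """Hypothetical stand-in for a consumer reading from a queue or topic."""
    yield {"offset": 0, "message": "user login ok"}
    yield {"offset": 1, "message": "payment failed: card declined"}
    yield {"offset": 2, "message": "payment failed: gateway timeout"}


class LiveIndex:
    """A tiny inverted index that is updated one event at a time."""

    def __init__(self):
        self.postings = defaultdict(set)  # token -> offsets containing it
        self.events = {}                  # offset -> original event

    def add(self, event):
        self.events[event["offset"]] = event
        tokens = event["message"].lower().replace(":", " ").split()
        for token in tokens:
            self.postings[token].add(event["offset"])

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


index = LiveIndex()
alerts = []

for event in log_stream():
    index.add(event)  # the event is searchable as soon as it is indexed
    # Assumed alert rule: fire once when two or more "failed" events are seen.
    if len(index.search("failed")) >= 2 and not alerts:
        alerts.append(event["offset"])

print(index.search("failed"))
print(alerts)
```

Because each event is indexed as it arrives, the alert check runs against fresh data rather than waiting for a batch load. That is the property the latency windows above depend on.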
Your log analytics product should empower operations engineers to improve the stability and agility of their IT environment, and thereby help their enterprise become more productive.
To make this happen, they need to define their minimum requirements and deal-breakers, test a range of scenarios, and anticipate what they’ll need to support down the road.
Enterprise operations engineers who follow these guidelines and ask the hard questions can use log analytics to help make IT simpler. When this happens, IT is worth all the trouble.
To learn more, get your copy of the Eckerson Ultimate Guide to Log Analytics White Paper today.