Logging Blindspots: Top 7 Mistakes that are Hindering Your Log Management Strategy
Today, virtually everyone who manages infrastructure or applications relies on logging to understand what is happening within their environments. But some teams do logging better than others. Although there is no one right – or wrong – approach to log management, there are a variety of logging mistakes that engineers commonly make when deciding what to log, how to log it, and how to work with their log data.
With that reality in mind, here’s a look at seven of the most common mistakes that may be undercutting the efficiency and effectiveness of your logging strategy.
Common Logging Mistakes
- Ignoring important logs
- Ignoring ephemeral logs
- Setting log levels poorly
- Looking at logs only via the CLI
- Retaining logs for arbitrary periods
- Storing logs inefficiently
- Refusing to adapt your log management strategy
Logging mistake 1: Ignoring important logs
Modern IT environments contain more log sources – and, hence, more logs – than ever. Instead of just having server and application logs to contend with, you now typically have a log (or more) for each container, API server, cloud service, orchestrator, and so on that runs within your software stack.
Faced with so many logs, it’s easy to fall into the trap of ignoring some. You may, for example, choose to collect only application logs because you believe that they matter more than logs for the infrastructure that hosts your environment.
The fact is, however, that ignoring logs leads to observability blindspots. In general, you should collect and analyze any and all logs in your environment. Even if a given log doesn’t seem important now, it may turn out to be essential for providing the context you need for understanding a complex problem in the future.
Logging mistake 2: Ignoring ephemeral logs
Along similar lines, teams in modern environments often fail to collect ephemeral logs: logs that are deleted permanently when the resource that hosts them shuts down, unless you aggregate them to an external location first.
The most common example of such logs is logs generated by containers, but serverless functions and some cloud services can pose the same challenge.
To ensure you don’t lose this log data, it’s critical to collect it in real time from its source, then move it to an independent location where you can analyze and retain it for as long as you need. You never know when a container will shut down and take crucial log data with it.
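As a rough illustration of that collect-then-move pattern, here is a minimal Python sketch. It assumes the container's output is available as a line-oriented stream and that the external location can be treated as a writable file-like object. The function name `ship_lines` and the in-memory stand-ins are hypothetical; real deployments usually rely on a dedicated log shipper rather than hand-written code.

```python
import io
from typing import IO, Iterable


def ship_lines(source: Iterable[str], sink: IO[str]) -> int:
    """Copy each log line to the sink as soon as it is read.

    Flushing after every line keeps the external copy current, so little
    is lost if the source disappears mid-stream.
    """
    shipped = 0
    for line in source:
        sink.write(line if line.endswith("\n") else line + "\n")
        sink.flush()
        shipped += 1
    return shipped


# Stand-ins for a container's stdout and for durable external storage.
container_stdout = io.StringIO("starting worker\njob 17 failed\n")
external_store = io.StringIO()
count = ship_lines(container_stdout, external_store)
```

The point of the sketch is the ordering: each line leaves the ephemeral resource as it is produced, instead of being copied in bulk after the fact.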
Logging mistake 3: Log level mistakes
Although collecting and analyzing all of your logs is typically a good thing, you shouldn’t necessarily configure your logs to record every bit of data possible.
Instead, be strategic about setting log levels for your various resources. Log levels that result in too much data inside each log file can leave you drowning in information and make it more difficult to see through the noise. On the other hand, log levels that aren’t verbose enough undercut visibility.
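One way to be strategic is to set a quiet default and raise verbosity only where you need the detail. The sketch below uses Python's standard `logging` module. The `shop`, `shop.catalog` and `shop.payments` logger names are made up for illustration, and the in-memory handler exists only so the result is easy to inspect.

```python
import logging


class ListHandler(logging.Handler):
    """Collects records in memory so the effect of each level is visible."""

    def __init__(self) -> None:
        super().__init__()
        self.lines = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(f"{record.name}:{record.levelname}:{record.getMessage()}")


handler = ListHandler()
app_logger = logging.getLogger("shop")      # hypothetical application namespace
app_logger.addHandler(handler)
app_logger.setLevel(logging.WARNING)        # default: warnings and errors only
app_logger.propagate = False                # keep the example isolated from the root logger

# Turn up verbosity only for the component under investigation.
logging.getLogger("shop.payments").setLevel(logging.DEBUG)

logging.getLogger("shop.catalog").debug("cache hit for item 42")   # dropped
logging.getLogger("shop.catalog").warning("slow query")            # kept
logging.getLogger("shop.payments").debug("retrying charge")        # kept
```

Here the catalog component inherits the quieter default, while the payments component records debug detail without adding noise everywhere else.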
Logging mistake 4: Only looking at logs via the CLI
Most command line environments provide tools that come in handy when you want to take a quick look at a log file. On Linux, for instance, you can pull out recent log data with a convenient “tail” command or page and search through a log file with “less.” You can also, of course, use command line interface (CLI) tools to move, merge and rotate log files.
CLI tools like these are great when you need to glance at log data quickly. But they’re no replacement for a full-scale log management strategy. Efficient log management at scale requires tools that can automatically aggregate and analyze logs from multiple sources and allow you to identify the most important logging insights quickly – tasks for which generic CLI tools come up short.
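To make the aggregation step concrete, here is a small Python sketch that merges logs from two sources into one timeline and then pulls out the lines of interest. It assumes each source is already in time order and that every line begins with an ISO 8601 UTC timestamp. The `merge_logs` helper and the sample lines are invented for illustration, and a log management platform does far more than this.

```python
import heapq
from typing import Iterable, Iterator


def merge_logs(*sources: Iterable[str]) -> Iterator[str]:
    """Merge already time-ordered log streams into one time-ordered stream.

    Assumes every line starts with an ISO 8601 UTC timestamp, which sorts
    correctly as plain text.
    """
    return heapq.merge(*sources, key=lambda line: line.split(" ", 1)[0])


app_log = [
    "2024-05-01T10:00:01Z app request started",
    "2024-05-01T10:00:04Z app request failed",
]
db_log = [
    "2024-05-01T10:00:02Z db connection pool exhausted",
]

timeline = list(merge_logs(app_log, db_log))
problems = [line for line in timeline if "failed" in line or "exhausted" in line]
```

Seen side by side, the database warning lands just before the application failure, which is the kind of cross-source context that is hard to get by reading one file at a time.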
Logging mistake 5: Log retention errors
How long to retain logs is a judgment call, and every team’s needs will vary.
But whatever you do, don’t make the mistake of choosing not to retain logs at all just because it’s too complicated or expensive. Nor should you commit to log retention policies that are based on arbitrary time periods; don’t, in other words, choose to retain your logs for six months or one year just because those are round numbers that seem reasonable.
Instead, devise a log retention policy based on factors such as…
- How far back the log data that you work with on a regular basis goes.
- Your log storage costs, which help determine how long you can feasibly retain logs.
- Compliance rules or security policies, which may mandate specific retention periods.
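The factors above can be combined into a simple calculation. The Python sketch below is one possible way to do that, not a prescribed formula: it takes the longer of the working window and the compliance minimum, then checks the resulting storage cost against a budget. All field names and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetentionInputs:
    working_window_days: int    # how far back the team regularly queries
    compliance_min_days: int    # mandated minimum, 0 if none applies
    daily_volume_gb: float      # average log volume written per day
    cost_per_gb_month: float    # storage price per GB per month
    monthly_budget: float       # what the team can spend on log storage


def plan_retention(inp: RetentionInputs) -> tuple[int, bool]:
    """Return (retention_days, fits_budget).

    The period is driven by actual usage and compliance, not a round number.
    At steady state, N days of retention keeps N * daily_volume_gb stored.
    """
    days = max(inp.working_window_days, inp.compliance_min_days)
    monthly_cost = days * inp.daily_volume_gb * inp.cost_per_gb_month
    return days, monthly_cost <= inp.monthly_budget


days, fits = plan_retention(
    RetentionInputs(
        working_window_days=30,
        compliance_min_days=90,
        daily_volume_gb=50.0,
        cost_per_gb_month=0.02,
        monthly_budget=200.0,
    )
)
```

If the result does not fit the budget, that is a signal to revisit log volume or storage choices rather than to shorten a mandated retention period.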
Logging mistake 6: Inefficient log storage
Speaking of log storage, the way you store logs can play a key role in the overall effectiveness of your log management strategy. If you keep logs in whichever location they live in by default, or you use a log management platform that requires you to store logs in just one place, there’s a good chance that you are paying more to store your log data than you need to.
A better approach is to prioritize flexibility and scalability regarding log storage. Aggregate and retain logs in whichever storage solutions make the most sense based on your budget, log access needs, and security requirements.
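As a sketch of what flexible storage might look like in practice, the Python snippet below routes a batch of logs to a storage tier based on how recently it was written and whether it is sensitive. The tier names, age thresholds and the `LogBatch` type are placeholders chosen for illustration, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogBatch:
    age_days: int      # time since the logs were written
    sensitive: bool    # e.g. audit or security logs


def choose_storage(batch: LogBatch) -> str:
    """Pick a storage tier from access needs and security requirements.

    Tier names and age thresholds are placeholders, not recommendations.
    """
    if batch.sensitive:
        return "encrypted-archive" if batch.age_days > 30 else "encrypted-hot"
    if batch.age_days <= 7:
        return "hot"     # queried often, fast and more expensive
    if batch.age_days <= 90:
        return "warm"
    return "cold"        # rarely read, cheapest per gigabyte
```

For example, `choose_storage(LogBatch(age_days=2, sensitive=False))` selects the hot tier, while a year-old batch of the same kind goes to cold storage.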
Logging mistake 7: Not adapting your log management strategy
Logging is not a set-it-and-forget-it affair. Most of the considerations we’ve discussed thus far are likely to require experimentation to get right. You will also have to update them as your environment – and, by extension, your logging needs – change.
That’s why you should continuously evaluate the effectiveness of your current logging strategy and update it as needed. Assess whether your current log retention policies or log levels, for instance, are working well, and adjust them if they’re not.
Conclusion: Toward a better log management strategy
As with most things in IT, the best log management strategy is rooted in continuous improvement. You should always be on the lookout for ways to make your logging strategy better, and never settle for what you’ve achieved at present.
That being said, the continuous improvement journey starts with recognizing the weakest links in your existing approach to logging. That’s why you should watch for easy-to-make logging mistakes such as incomplete log collection, poor log retention policies, and inflexible log storage.