The Problem With Logging

The problem with logging isn’t that there’s too much of it, but that our tools for searching logs are so primitive.

I’ve taking to using retroactive logging, whereby I keep small, debug-focused logs in memory, and only spit one out if it’s triggered by an error condition.

I’ve also taken to storing structured data in these retroactive log entries, in particular snapshots of variables that are likely to cause trouble.

The goal is to provide much of the functionality of a retroactive debugger. Those things, which never escaped Computer Science research labs, let you step both forward and backward in your code.

I’m still rubbing my chin, digesting everything you’ve said.

You raise excellent points. Even if we change nothing at my site, it will cause discussion and review. Good post.

Logs are there for two reasons they are separate and distinct

Debugging/Trace - log everything and use instead of the debugger because the program runs differently in the debugger …

Normal Logging : Record what happened and what was done so you have a record when either a) someone asks (sometimes for legal reasons) b) you need to know what made something brake so you can duplicate it

This last one happens more often then pure software developers think, when you interact with hardware it sometimes does not respond normally and you have to cope… you can test 'til the cows come home but never spot some of these, and the log file (if you use it correctly) will tell you what happened …

If your logging can cause deadlocks, your design/use of logging facilities is seriously flawed. Logging must not have any side-effects other than the time penalty, this even includes parameter evaluation for logging values of certain variables. As an example, suppose you want to log the current speed of something, you should absolutely make sure that your distance/time calculation doesn’t involve any time near zero.

All other points you made are surely valid in some ways, but apart from that logging can be a great tool, if used correctly. Problem is: It often turns out, that the developer usually logs the wrong things, simply because he logs things he thinks are important, automatically taking more care of the code which is about to be logged. Usually the flaws are then in the parts that don’t get logged, because nobody thought of them possibly going wrong and needing to be logged. BTDT.

I did work on a data processing system, with a fairly high volume of records going through on a nightly basis. Efficiency and accuracy were both extremely important. Naturally, (?) I logged everything that happened, and then some. It was a good lesson to learn, as I went back through the code and wheedled out log code that.

I also found great jumps in efficiency for the overall system… and guess what? Not once in six or so months have I said to myself Man… I wish I still had been running those logs…

Logging is essential in server software. Doubly so in software that runs on someone else’s servers so you can’t drop all the source code on it to run a proper debug session. Post-mortem debugging with core dumps is useful, but often you need to know what happened before the program got into the state where it crashed.

Logging should never be done to a database – WTF were they thinking?? – if only because one of the things you want to be able to log is database failures.

Logging should be done either straight to a file, or through a simple IPC call to a logging daemon, either syslogd or something like that. One IPC call, one disk write call, done. No locks, no DB server overhead.

As for the question of excessive logging, any decent multi-level logging system quietly eats messages below the current logging threshold. You normally run in the middle of the logging range, turning up the level only briefly, to debug a severe problem. 99% of the time, the default log level should be sufficient. Also, that log level should be of a low enough volume that a person can reasonably read it in real time. If it’s going by so fast that you can’t read it, it won’t be read. The exception is higher debugging levels, where you can study a snapshot of the logs off-line, but again, this should be a 1% or less case.

As for the contention that messages should go to the user interface, what if the program doesn’t have a user interface? (Server software again.) I’m not talking about a user interface that may be bolted on top of the server software. That has its own concerns, and so can have its own messaging architecture. Anything the server wants surfaced to the user level, it can send up to the UI, sure, but that’s no reason to not do logging. There’s plenty that a program wants to say in a log that users shouldn’t see.

As for logging being like excessive comments, I find that log lines are effective substitutes for comments. Anytime the programmer wants to explain what’s happening to both future programmers and people running the software, it goes in a log message, not a comment.


What is the fastest way of (not) logging?

For some logger log, writing,

log.Debug(Entry number: + i + is + entry[i]);

incurs the cost of constructing the message parameter, […]

If you are worried about speed, then write

if(log.IsDebugEnabled)
{
log.Debug(Entry number: + i + is + entry[i]);
}


Speed vs. safety:

Even this supposedly simple example may trigger something like an array out of bounds exception, making the application crash, once you enable debug logging. :smiley:

As an administrator of a small LAN (my family network), I like network daemons when they’ve high levels of logging.
My favorite daemon: The exim MTA.
Kilobytes of logging are output but I quickly find WHY my e-mail was rejected or wasn’t properly transmitted.
As I’m not necessarily familiar with the software, logging errors is not enough, I need to have some level of tracing to see what’s going on.

I agree that very high levels of logging aren’t necessary… When I want to get every single system call, I use strace (acutally, this happens quite often).
However, if logging is properly hierarchized so that DEBUG info that is only used to diagnose program’s internal errors aren’t shown at normal log levels.
Logging exceptions only isn’t enough.

Another thing: Good logging isn’t so much defined by the quantity than by the quality!
That’s obvious but.
foo-bar-app: Main configuration file /etc/foobar.conf read access forbidden for daemon user foor-bar-user.
is much better than:
Error: EACCESS
As the former contains info about:

  1. The exact application or daemon which met the error. Depending on the logging facility, this may be automatic.
  2. The file name.
  3. The type of access that was attempted (read access).
  4. The effective user id of the daemon.
    With all that info, fixing the issue should be TRIVIAL!

When strace is better than your log files, then, there’s something wrong with your logging policy.

This story would be a lot easier to understand if you mentioned the log messages were (apparently) going to the database ( http://logging.apache.org/log4net/release/config-examples.html has examples of how to log to the database). The locking in question was apparently a combination of internal database logic and internal log4net locking ( http://ddj.com/architect/202802983 ).

You may also want to consider logging to a file or the Windows Event Log instead of to the database. It’s relatively easy to use the .Net API to copy your messages out of the Event Log and into a Linq-compatible container, so your queries become very similar to dealing with log messages in a database.

If it’s worth saving to a logfile, it’s worth showing in the user interface.

But the logfile is part of the user interface: it is the user interface for system administrators and/or developers rather than end users. Each user category needs to see different information.

A common generic use case in most web applications I’ve worked on is:

  1. The application bombs with a simple error message for end users (retry in a few minutes, don’t bother retrying, contact support, etc.); hiding exception stack traces etc. is a priority.
  2. Someone is alerted; the log file is downloaded from production servers, compressed, and emailed it to the relevant product or account manager.
  3. The relevant developer reads the log and figures out what happened. More often than not the problems are corrupted and invalid data, server overload, network failures etc. rather than application bugs.

In this kind of scenario excessively verbose logs are a minor problem; wasting days of testing to reproduce a rare condition because the log is unclear is much worse than wasting a bit of performance or filling disk quotas.

shakes his head

You know what? Futurama irony (rhetorical irony) isn’t the only kind of irony. And being a coincidence doesn’t disqualify a thing from being irony.

In this case, logging is expected to aid in solving a problem. But it in fact exacerbates the problem. This is, legitimately, irony. Specifically, it’s situational irony.

http://dictionary.reference.com/browse/situational+irony

an outcome that turns out to be very different from what was expected…

You could even argue that all uses of rhetorical irony are also uses of situational irony, because the expectation is normally a fairly literal use of words.

Logging tool Chainsaw
Chainsaw is a GUI log viewer and filter for the log4j package.

I unintentionally portmanteaued Stack Overflow post as Stack Overfloast. Thankfully I didn’t portmanteau Stack Overflow logging!

i feel that this is worthwhile information

Yes, I agree that logging is a good point to start a discussion. Personally I think that there are no fixed rules when writing logging for an application as there are almost no fixed rules in general when writing code because everything depends on the context of your application. For example, you can’t simple follow the rules of the Effective Java but you must think about what are you doing, you must think about the meaning of the objects, services ecc that you are writing. If there is a rule i think it is: Think about it before writing down (logging, exception ecc)!. I agree, that this don’t prevent to have errors because I think writing absolute bug-less code is almost impossible; but if the reasons for writing that code (if there was a correct logic behind it)were correct your life will be much easier.
I think that AOP is a good instrument to get a cleaner code (I hate mixing logging with business code) but I don’t know if it is enough.
A wrong logging can produce errors that seems strange, for example when I changed the logging level from info (stage) to warn(prod) my application started to behave differently because log.info incorrectly helped to initialize some hibernate cglib proxies and because in stage everything worked well nobody was aware about it until we went in production!

If it’s worth saving to a logfile, it’s worth showing in the user interface.

I have to disagree with this one. Hallo? How I are translate string? I not find nowhere after look long time. This look like error message but what is errors being about? What is ‘pointer’?

There’s a lot of stuff in our application that falls into this category. I’m the guy who has to deal with our translators, and this kind of thing drives me up the wall. I canned all the visible dialogs and pulled it all out of the GUI. The rare useful error reports about such and such dialog popping up are not worth the headache of dealing with explaining obscure development-related concepts to translators.

Another approach is to just pop up dialogs in English, but every time I do that, some helpful person bitches and moans that I forgot to include such and such string in the translation, and fixes it for me.

I can’t win.

The best solution in this case is to say the hell with it, and hide everything from the surface interface.

You are absolutely right. We’ve an app where the log file is an absolute mess. Nothing can be found in there (at least not in the first couple of minutes) and it’s nothing more than a waste of time.

I think very detailed logging like that has its place in traditional software systems which are installed on hundreds of different servers at customers’ sites. If you have a detailed log file at the customer’s server then you can many times look at the log file and look at your code and figure out what’s happening. I’ve seen this with an application that we’re a reseller for many times.

I think that’s what Brian W. Kernigan and Rob Pike are talking about in their book The Practice of Programming:
As personal choice, we tend not to use debuggers beyond getting a stack trace or the value of a variable or two. One reason is that it is easy to get lost in details of complicated data structures and control flow; we find stepping through a program less productive than thinking harder and adding output statements and self-checking code at critical places. Clicking over statements takes longer than scanning the output of judiciously-placed displays. It takes less time to decide where to put print statements than to single-step to the critical section of code, even assuming we know where that is. More important, debugging statements stay with the program; debugging sessions are transient.

Getting all the necessary data from the customer onto a test system so you can do your debugging can be very time-consuming.

In today’s environment with hosted systems such as StackOverflow the data is much more accessible so you can quickly get it to a test environment, then debug the problem and get an answer, so having detailed logging is not as necessary as with an On-Premise system.

Sounds to me like that logging framework is not totally thread-safe, which would be a bug in that code. I wouldn’t call all detailed logging evil because of it.

Sarel

I don’t mean to single out the author here

Who do you think you’re kidding?

I agree with lots of the folks who have disagreed with Jeff. Logging can be essential in apps that sold to run on many computers, and it’s trivial to cope with multithreading and alter the level of logging through a config file. Make a team decision about how much logging is applicable and stick to it. If you only deploy on one server next to your desk, none might be fine for you.

Is it just me, or do a number of Jeff’s recent posts over-generalise his experience one particular day with one project into a general rule for all development? And in this particular case, why isn’t the deduction understand deadlock potential throughout your code or don’t use multi-threading rather than don’t log!