The Problem With Logging

A recent Stack Overflow post described one programmer's logging style. Here's what he logs:


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2008/12/the-problem-with-logging.html

Jeff,

In your example, were you logging to a database? If you were logging to a database, were you using the same connection for the logging and the transactional work?

Using the same connection for both could lead to potential problems - the least of which would be deadlocks. What happens to any log messages when the transactional work get rolled back?

I still haven’t seen a clear root cause of the deadlock error.

Please tell me how your log4net can deadlock if it is only acquiring one lock? There is no chance for deadlock in this case.

In the case where multiple locks are used, the simple solution might be just to disable the logger lock if you can. The logger lock should only be to serialize the output to a non-threadsafe resource.

The real problem here is likely poor documentation in log4net on how to setup logging to a device that implements its own thread-safe locking and/or a discussion on this exact problem which should have been realized before log4net was ever released.

I’m with you on logging exceptions. But I am definitely not a fan of logging everything. Of course, I do write smaller apps, but when it comes to debugging I rely on my debugger, not logging.

I disagree with everything that should be logged should be in the UI, but that might because I only think exceptions should be logged.

Anyway, good post. It is always interesting to see things that are VERY very different than what I normally do.

If you have to perform any post-mortem debugging on remote systems, you better be logging as much as you need to clearly recreate the sequence of events that led up to the crash.

Also, log4net is already thread safe due to its own internal locking. Even from your description, I can’t imagine a scenario where those internal locks could affect your app. Why on earth would you add your own nested log locking mechanisms on top of that? You’re asking for obscure deadlocks…

Why on earth would you add your own nested log locking mechanisms on top of that?

We didn’t. Referencing one object caused that object (hey, database object abstraction magic!) to reference another object which wrote to the logs. It’s extremely timing sensitive, would only happen about once a day on a fairly heavily loaded public website.

I agree on most things, especially that too much logging counters productivity.

In the example, I think a major flaw is that INFO seems to be in the wrong position, according to most ligging tools it should be between DEBUG and ERROR/WARNING. The things that are logged with the INFO level now should normally be logged at the TRACE level, which generally is at the position that INFO holds now.

Naturally no items that would only interest developers should be logged at the INFO, WARN or ERROR levels.

Unhanded exceptions are called bug reports in my book though, so I don’t log them but rather send them to the bug repository.

I have also my own rules:

FATAL Level
Unrecoverable Errors (the application will end after this log)

ERROR Level
Recoverable Errors (the application is not going to stop)

NORMAL Level
Here is the main trick. I like to show what is logged at this level to the end user in a way he can read what the application is doing. Is just like if the application could tell to the user what is doing.

Example:
Reading application file …
… done.
Parsing the application data …
… done.
Found 3 items to process.
Processing item 1.

You can not put here a lot of information, just the main things the application is doing, like if you write your own diary.

HIGHER LEVELS
Here I put as much info as i want. It can be excesive if you want. 99% of time the information will be filtered out because at run-time you have to work in NORMAL debug level.

This is how I see logging, completely agree to the point that you should show to the user what you’re logging.

If you’ve ever supported someone else’s code, you begin to appreciate logging. What’s probably more important is good exception handling, which I think you’re kinda saying.

I love the example you gave. Shows how bad we are at really understanding the complexity of the program.

Although I occasionally log, I try to be flexible about it. There are scenarios when I feel that debugging would be more effective than logging. There are opposite scenarios as well.

There is, I think, an even deeper issue here: being excessive about a practice may get you into trouble. It seems that it happens all over the place: static type safety, monkey patching, factories, etc. Here’s one example WRT compactness http://javadots.blogspot.com/2008/11/shorter-is-better.html.

Sometimes logging can be useful as a tool to prove to the customer that he’s doing something wrong.

Our programs interface to large PBX’s and a host of other telephone-related systems. Looking into the log, peering over the different binary exchanges between the systems, and being able to show to the customer that it is, in fact, his PBX that is doing something wrong, has saved us hours and days of unbillable support time.

I’d say that in any situation where you’re communication between different systems, log what’s happening. If not, you’ll have no clue which system that actually malfunctioned.

Yeah, I have to agree with you Jeff. That is just way too much information. Your system could crash under the weight of your logging infrastructure. I don’t see how this would scale.

Help me out here. I don’t understand how a logging system can cause a deadlock.
I assume any logging framework would perform the following steps:

  1. Check log level
  2. Format log message
  3. Acquire lock
  4. Output log message
  5. Release lock

I don’t know log4net, but from your description, I gather that it performs step 2 after step 3. Is that what happens? It sounds weird. Why would it do that?

Be sure to log the deadlock!

I agree with you Jeff.

I think it’s safe to add that logging can be viewed like a set of requirements for a software project: start simple and increase complexity if need be, rather then anticipating this info would be useful in a log! (similarly, the user would probably want these additional features!).

The problem I’ve seen is that people are off creating new logging frameworks. Things way more complex than they need to be, with lots of little moving parts, but not a lot of sense…and there’s so many of them, none of them get tested.

It’s yet another recreation of the wheel. On Unix-like systems, we’ve got syslog. Three calls: openlog, syslog, closelog. One call to open early on, one call to close nicely, and just syslog() whenever you want. The manpage, while pithy, is usually enough to figure out the levels.

LOG_EMERG: system is unusable
LOG_ALERT: action must be taken immediately.
LOG_CRIT: critical conditions.
LOG_ERROR: error conditions.
LOG_WARNING: warning conditions.
LOG_NOTICE: normal, but significant, conditions.
LOG_INFO: informational message.
LOG_DEBUG: debug-level message.

And you can suppress these messages system-wide by configuring syslogd, or if you want to do it yourself, you can do it with setlogmask().

So you say But that isn’t portable!, to which I respond, Neither is ‘log4net’

syslog is specified in POSIX.1-2001. log4net, log4java, log4cpp, .NET, and much of this week’s Windows API isn’t really specified anywhere.

You know, if you have multi-threaded execution, you eventually have to make sure that your logging is lock-free, even if you don’t think it’s possible it could deadlock.

Our application basically dumps each log message to a threadsafe but lock-free linked list, then flushes the list to file every x ms.

A scheme like that can make quite a difference in a multi-threaded app that needs to log frequently, compared to a naive scheme that synchronizes on each log call.

There is nothing worse than poorly implemented logging code! We log alot as we are doing data transporation and ETL. We treat logging as the cross-cuting concern it is an apply it transparently using AOP. Logging code does NOT belong in your core logic and only poor or lazy developers haven’t sorted themselves out

Argle! The add I got at the bottom of this post was for Lighthouse.
To my understanding, this is a competitor to FogBugz… -_-’