When Understanding means Rewriting

beza1e1 · September 21, 2006, 12:00am

I remember a quote: The time necessary to understand code is equal to the time to write it.

Not useful in any way, but not far off the truth, I think.

Ken · September 21, 2006, 12:00am

Dead on, Jeff. Like so many other things there is a big difference in seeing and understanding how something is done and actually doing it yourself. You can watch Tiger Woods hit a shot off the tee and you can basically figure out what he’s doing to make it happen. But can you really know how a golf swing works if you’ve never played golf yourself? Nopers.

Being a relatively new programmer I cringe when I look back even a year ago at some of the code that I wrote. But the act of coding various applications on my own has allowed me to learn and grow and gain invaluable experience that I never would have obtained simply by analyzing existing code, reading some coding books, etc.

I believe it’s not always necessary to start from scratch versus rewriting. But I do believe it is in the doing that you gain the knowing.

Bill196 · September 21, 2006, 12:00am

Great post! I have found that with many of my projects I begin the coding, rework and modify that code, and, if I have time, rewrite the code with a better understanding of how it should all come together.

JohnK · September 21, 2006, 12:00am

Joel is right – rewriting (as in writing new code from scratch) is usually a stupid thing to do.

Refactoring (as in: reorganizing the code, for example, to hide irrelevant detail) is also often overkill.

More than 90% of the time, simply manually reformatting the code slightly – adding white space, making sure everything is properly indented – is all I need to quickly get a feeling of fluency with unfamiliar code.

Tim_Dudra · September 21, 2006, 12:00am

I was going to avoid this thread, as I knew I wouldn’t be able to read it without taking ill, but then, for some stupid reason, I dove into it.

OmegaSupreme had me on his side here right up until the comment about hard to design. I have heard that argument from developers over and over and over again and it is frankly rubbish.

“I think we need some better UML style tools so we can get a “birds eye view” … because is hard to design it correctly up front …”.

Then Straevarus here left me cringing as he brought back memories …

“We might write our own code, then 1000 lines later, realize that we need to go back and change was we previously wrote”

… memories of 4000 line methods written by “the code guru” at my former employer; methods with a cyclomatic complexity rating well above 70 (when I stopped counting). Methods that contained switch statements nested inside switch statements - something that should NEVER be done as there is no excuse for that kind of cryptic stupidity.

The fact that one realizes 1000 lines later that they need to change something they wrote 1000 lines earlier is, 9 times out of 10 (I know, ’cause I’ve done it), because they never stopped to figure out what they were building in the first place. Chunks of code separated by that distance should have few if any dependencies that would result in that level of change being needed.

Finally, as I was imagining bungee diving without the bungee, some sanity intruded into the discussion.

“Take a thousand (+ 1) classes on communicating, influencing, and negotiating with customers and managers and your coding will be reduced by 80%, guaranteed”. Wstephen

“The point, of course, is that software engineering, after 50 years, still has little in common with real engineering (sorry about your feelings). we insist on doing it different just because we want to.” Buggy Fun Bunny

These comments cut to the bone of the problem. The majority of developers seem to refuse to partake in work related communication with others; many sit in their cubicles all day coding and this they consider productive (regardless of how many times they rewrote the same code/classes because they never understood what the code/classes were really supposed to be doing).

For some reason, the software development industry tolerates, if not encourages, the hacking out of solutions without any thought process. Software is complex; this is true. But so is a bridge from Prince Edward Island to the mainland designed to withstand St. Lawrence River ice floes. So is a Pentium processor. The difference being that with the bridge, the engineers only got one chance to get it right. With the Pentium, they got more chances, but orders of magnitude fewer than developers seem to require to produce their “complex” apps.

Rewriting a function to understand it, IMHO, is a clear indicator that either the reader has no patience or the function was abysmally written in the first place.

Have a nice day.

Wstephen · September 21, 2006, 12:00am

“If I had eight hours to chop down a tree, I’d spend six sharpening my axe.” ~ Abraham Lincoln. Task strategy doesn’t change, only people and context change.

If I rewrite code because I don’t understand the original code, I only prove my impatience. And I suggest most rewriting is done because we are unable to get management to understand just how much effort and time is required to complete any particular task or project – so we just code to fill the time.

Take a thousand (+ 1) classes on communicating, influencing, and negotiating with customers and managers and your coding will be reduced by 80%, guaranteed.

JeffS · September 21, 2006, 12:00am

Great post that directly answers why Joel has NOT jumped the shark. Writing a relatively small amount of code for a basic compiler, to preserve a non-trivial code base - especially when it simultaneously resolves other platform/porting concerns - seems like an obvious good move to me.

iamnottellingyo · September 21, 2006, 12:00am

isn’t this why lisp wins big time? doesn’t it let u abstract away the details so that the ‘green’ portion of the pie chart shrinks (in direct proportion to your code’s lispyness)?

MikeD · September 21, 2006, 12:00am

Things like just how some threads synchronise with each other - and even which code runs on which thread(s) - really don’t just fall out of the source code like that.

I’m gearing up to write code documentation for our DOS-emulating library for Windows CE/Pocket PC. DOS-emulating in the sense that you take a program’s source code written in C for a Symbol series 3000 DOS hand-held computer and recompile it with eVC 4.0 and link to the library to get a Windows CE (GUI) application which presents a text-mode UI, with as few source code changes as possible. As you know, DOS was largely an ‘execute all these things in this order’ environment - highly imperative - while Windows GUI programming is event-driven. Our library also has to emulate all the things that were interrupt-driven on the original platform.

So in my library, the user program runs on the main thread, while another thread actually creates the main window and runs the UI, and there are a couple of other background threads for monitoring battery status and other things that Windows CE only offers via blocking APIs. This gets asynchronous screen updates. For a long time I actually used nested message loops to do it, but that meant you only got screen updates whenever some library function that blocked (e.g. waiting for a keypress, the equivalent of Sleep()) was called. Things like indexing a file would have to effectively Sleep(1) from time to time to show screen updates.

Unfortunately, moving the UI to another thread also means that input arrives on the UI thread, not the application’s thread. So there’s some pretty crazy synchronisation to ensure that the keyboard buffer is not corrupted by the concurrent threads and to unblock the main thread when a key is available. None of this is particularly obvious directly from the source code, even with some pretty big comments.
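The hand-off described above (input arriving on the UI thread, the main thread blocked until a key is available) can be sketched roughly as follows. This is only an illustration in Python, not the library's actual code, which is C for Windows CE; the class name `KeyboardBuffer` and the methods `put_key` and `get_key` are hypothetical.

```python
import threading
from collections import deque


class KeyboardBuffer:
    """Hypothetical key queue shared by a UI thread (producer)
    and the application's main thread (consumer)."""

    def __init__(self):
        self._keys = deque()
        # The condition's lock guards every access to the deque.
        self._cond = threading.Condition()

    def put_key(self, key):
        # Called on the UI thread when input arrives.
        with self._cond:
            self._keys.append(key)
            self._cond.notify()  # wake a reader blocked in get_key()

    def get_key(self, timeout=None):
        # Called on the main thread; blocks until a key is available.
        # Returns None if the timeout expires first.
        with self._cond:
            if not self._cond.wait_for(lambda: len(self._keys) > 0, timeout=timeout):
                return None
            return self._keys.popleft()
```

The lock keeps the buffer from being corrupted by concurrent access, and the condition variable is what unblocks the waiting thread. Even in this small form, nothing in the code itself says which method belongs to which thread; only the comments do.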

Basically, trying to learn something from the source code itself - depending on quality of source - can be like trying to work out where you are in the country using a 1:10,000 (or worse) scale map.

As for learning a project - I’ve been maintaining and enhancing an application server written in VB6 for the last three and a half years, and I still don’t know all of it. I only correctly learned some details of the protocol this year, and that’s with a document that purports to describe the protocol. Without that document, I’d never have understood it, or been able to fix some egregious errors. When splitting a message into multiple UDP packets, the server sends a packet number and total packet count, and it formerly assumed that the payload was a fixed 1500 bytes for all packets bar the last (being therefore too big for any network without fragmentation, due to a misunderstanding by the original programmer). I was finally able to make the packet size variable even within a message, with only minor changes to the decoding code.
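To make the packet-splitting scheme concrete, here is a minimal sketch in Python (the real server is VB6). The header layout, the 1400-byte default and the function names are all assumptions made for illustration; the actual protocol's wire format is not given above. The point is that the decoder takes each payload's length from the packet itself and does not assume a fixed size.

```python
import struct

# Hypothetical header: packet number and total packet count,
# each an unsigned 16-bit integer in network byte order.
HEADER = struct.Struct("!HH")


def split_message(message, max_payload=1400):
    """Split a message into packets of at most max_payload payload bytes.

    1400 is an assumed default that leaves room for IP and UDP headers
    under a 1500-byte Ethernet MTU.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    chunks = [message[i:i + max_payload]
              for i in range(0, len(message), max_payload)] or [b""]
    total = len(chunks)
    return [HEADER.pack(number, total) + chunk
            for number, chunk in enumerate(chunks, start=1)]


def reassemble(packets):
    """Rebuild the message; payload sizes may differ from packet to packet."""
    parts = {}
    total = None
    for packet in packets:
        number, count = HEADER.unpack_from(packet)
        if total is None:
            total = count
        elif count != total:
            raise ValueError("packets disagree on the total count")
        parts[number] = packet[HEADER.size:]  # whatever length arrived
    if total is None or set(parts) != set(range(1, total + 1)):
        raise ValueError("missing packets")
    return b"".join(parts[n] for n in range(1, total + 1))
```

Because `reassemble` orders packets by number and joins payloads of any length, packets can arrive out of order and vary in size within one message.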

That said, there are some limitations of the VB environment which will eventually occasion a rewrite with some other language. Making a service can be done with the NTSVC.OCX sample from MSDN, but it’s not a good implementation of a service. We can’t multithread the main server EXE which limits scalability. Perhaps the answer is to move from having a socket interface (the Winsock control couldn’t scale any further) which plugs into a VB application, to plugging a VB COM component (using much of the existing code) into a C++ host.

JeremyD · September 21, 2006, 12:00am

Please for the love of $deity, can people please stop referring to -any- code change as “refactoring”?

The whole point of refactoring, as originally posited, is that the change improves internal quality while not changing external functionality.

Referring to “refactoring” when talking about rewriting an app is like calling static typing proof of correctness.

No! Bad programmer! Step away from the buzzword!
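A tiny illustration of that definition, in Python and with entirely made-up names and numbers: the second function is a refactoring of the first because every input still produces the same output, and only the internal structure changed.

```python
def shipping_cost_before(weight_kg, express):
    # Original version: nested conditionals.
    if express:
        if weight_kg > 10:
            return 30
        else:
            return 15
    else:
        if weight_kg > 10:
            return 20
        else:
            return 8


# Hypothetical rate table keyed by (express, heavy).
RATES = {
    (False, False): 8,
    (False, True): 20,
    (True, False): 15,
    (True, True): 30,
}


def shipping_cost_after(weight_kg, express):
    # Refactored version: same external behaviour, table-driven internals.
    return RATES[(bool(express), weight_kg > 10)]
```

Changing any of the numbers in the table, or adding a new shipping tier, would be a change in functionality and so, by this definition, not a refactoring.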

Mihai · September 21, 2006, 12:00am

Sometimes I want to rewrite my own stuff. And it has nothing to do with not understanding it.

My general theory is that software that is 10 years old (or is at version 10) NEEDS a rewrite.

Because you designed it to do something, but then you added more and more features, sometimes in directions that you have never imagined.
Then it looks like a Toyota Tercel with added spoilers, an extra fuel tank, two more wheels, an electric engine and a rocket engine.

The second reason is because you learned a lot of lessons (hopefully). And languages and compilers changed, OSes changed (“I see, this is a work-around for a Win 3.1 bug!!!”).

So, maybe for marketing and executives rewriting is always a bad idea. But not for an engineer.

prim8 · September 21, 2006, 12:00am

This is where test infection pays off. Want to learn the rules of Monopoly? At a minimum the tests should cover the basic rules, which makes it a lot easier to understand the outward-facing aspects and, for the most part, ignore the internal details. Have a case that isn’t covered in a test? Write a new one; if it doesn’t pass, then you need to dig into the internals.
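As a rough sketch of what such a rule-documenting test might look like, here is a Python example. The `Player` class and `move` function are hypothetical stand-ins for whatever the real code base exposes; the rule itself (40 spaces, $200 for passing Go, $1500 starting cash) is standard Monopoly.

```python
import unittest

BOARD_SIZE = 40   # spaces on a standard board
GO_SALARY = 200   # paid when a player passes or lands on Go


class Player:
    """Hypothetical minimal player state."""

    def __init__(self, position=0, cash=1500):
        self.position = position
        self.cash = cash


def move(player, roll):
    """Advance the player by roll spaces, paying the salary on passing Go."""
    new_position = player.position + roll
    if new_position >= BOARD_SIZE:
        player.cash += GO_SALARY
    player.position = new_position % BOARD_SIZE


class PassingGoTest(unittest.TestCase):
    # Each test name states a rule, so the suite reads like a rulebook.

    def test_passing_go_pays_salary(self):
        player = Player(position=38)
        move(player, 5)
        self.assertEqual(player.position, 3)
        self.assertEqual(player.cash, 1700)

    def test_ordinary_move_pays_nothing(self):
        player = Player(position=10)
        move(player, 7)
        self.assertEqual(player.position, 17)
        self.assertEqual(player.cash, 1500)
```

Saved in a module, this can be run with `python -m unittest`. A reader learns the rule from the test names and assertions without opening `move` at all.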

Still feel the need to rewrite something for clarity? With proper test coverage, then you can do real refactoring as needed. I only rewrite/refactor sections I can’t understand with a cursory glance. The only times I will rewrite a whole package from scratch:

If the code has significantly more bad-smells than good-smells, insufficient or non-existent test coverage, and I can’t even begin to understand what the developer was thinking.
When I realize a library has taken completely the wrong approach, could benefit from a clean slate and I want to change the external interface.

Reed · September 21, 2006, 12:00am

Nice 3D perspective on that pie chart – a classic method of influencing the viewer’s interpretation of the data according to your own bias, that is, it makes the “understanding” section look bigger than it really ought to be.

Aivar · September 21, 2006, 12:00am

Enter ye in at the strait gate: for wide is the gate, and broad is the way, that leadeth to destruction, and many there be which go in thereat:

Because strait is the gate, and narrow is the way, which leadeth unto life, and few there be that find it.

Peter · September 21, 2006, 12:00am

WOW, like Everquest, has mountains of broken code hidden inside it. If your hypothetical Martians tried reading the code, or worse, the comments, they would come away with a very wrong concept of how it works.

A recent EQ patch tried to “fix” the spell interruption that one gets from being pounded by monsters. The “skill” used to prevent the interruption was called channeling. Channeling + interrupts were broken since beta 7 years ago. Trying to retrofit the intended code into a live thriving system just about broke the system. Complaints from the player base ended up getting the “fix” backed out and the broken system restored.

Most of the “nerfs” or “bug fixes” involve players discovering what the code permits them to do, which is very different from what the developers intend. Field Marshal Helmuth von Moltke put it as “no plan survives contact with the enemy” (I would have guessed von Clausewitz).

So, for most games, actually watching players will derive a very different set of rules than the designers/coders intended. For your homework to prove this theory, you need only play Magic: The Gathering or Cosmic Encounter.

The running gag among evercrack players is “broken as intended.”

BobO2 · September 21, 2006, 12:00am

I have to disagree with the gist of this post. It is rarely better to rewrite code. Rewriting code is just the easy way out. One of my bosses/mentors once told me that the best programmers are maintenance programmers; not because it’s glamorous but because it’s hard.

As others have pointed out, rewriting an app from scratch introduces bugs and decreases functionality you weren’t aware of. How can you rewrite that which you don’t understand? That’s no different than trying to write a project with incomplete user specs.

I’ve been a maintenance programmer for the last 10 years, and there’s nothing as frustrating as having to support someone else’s poorly written code (ever notice how ANY code you didn’t write yourself is “lousy”?). But if someone is paying you to maintain an application, then that means there are probably some users somewhere relying on it to do their job. Rewriting their application because you don’t want to take the time to grok someone else’s source code may be in your (short term) best interests, but it’s not in the users’ (and they’re why we’re here in the first place, right?).

Now there are some obvious exceptions to this: if the application is as lousy as the source code (i.e. it’s not providing enough value to the user), then rewriting probably makes sense. Similarly, if you’re upgrading to a new platform, or introducing major new functionality, then rewriting is probably the way to go. But it should be the exception, not the norm.

Anything you didn’t write is going to look crappy at first, but until you understand exactly what it’s doing, and how it works, how good of a judge can you be? You need to take the time to absorb the code before you can really have an educated opinion on the code’s worth.

What I always do to get started is tackle one of the larger, seemingly important procedures/modules and (like someone suggested earlier) start cleaning up. Start formatting the code to fit your programming standards (personal or company): proper indenting, whitespace, variable notation/capitalization, etc. Doing this seems anally retentive, but it helps you understand what the original author was trying to do. As you start to understand, comment heavily. Add your notes explaining what the code is doing, and more importantly, WHY the code is doing what it’s doing. It’s also important to clean up any out-dated comments you find. After you’ve done this to a few pieces of source code, then reevaluate whether or not it should be rewritten. You’ll make a much more informed decision.

Gotta run, but two quick ways to protect yourself from confusion on your own source code: complete unit tests and extensive commenting. These two items will help make what you write today less confusing tomorrow (whether it’s you or someone else doing the reading).

Take care,
Bob O’Malley

johnm5 · September 22, 2006, 12:00am

I agree with you Jeff, 100%. It’s not just what the code does: each developer’s style, way of looking at things, and way of dissecting and then re-assembling the problem are so varied that it is sometimes impossible to modify the code without extensively re-writing portions of it.

And do you know what? I’ve been at this off and on for 20 years and I have never seen two developers’ source code that was even remotely alike when it came to solving a complex problem.

I’ve even compared left-handed and right-handed coders’ work, and they are as different from each other as they are from other developers of the same handedness.

As the old saying goes, ‘coding is an art’, and no two artists are alike…

NickH · September 22, 2006, 12:00am

I think that it is interesting that (in my experience) it is usually the “OK” code that gets re-written:

Only a fool would re-write good code.
Only a genius can understand bad code well enough to dare to re-write it. (Unless it can be re-written from the spec.)
Hence the only code that you can easily re-write is the middling code.

Tim · September 22, 2006, 12:00am

Most of us have either written a particular code base or, more likely, have inherited someone’s code that was not documented well (if at all). Spending the time to understand the code and determine what refactoring it will do is crucial. That reduces the number of subsequent bug fixes you have to make to address your original bug fix. In some cases, a bug that is reported is really a change in requirements. When your requirements change significantly enough that the current code is not doing the right thing, then at that point rewriting code is more of an option.

I have used some tools that generate a Kiviat graph of the “inherited” code at my work, and what I see is a general failure to follow ‘standards’. I typically refactor/rewrite code to follow the standards and document as I go. Not necessarily for me, but for the next poor schmoe who is going to inherit my code.

-Tim

Mihai · September 22, 2006, 12:00am

If builders built buildings the way programmers wrote programs, then the first woodpecker that came along would destroy civilization.

I keep reading this again and again, in one form or another.

Each new bridge is a rewrite of a previous one, with slight improvements!

If programmers rewrote a text editor again and again, in 4000 years they would manage to make a stable one.
Just a bit of patience, people!

(and even after thousands of years of bridge-building we can get it wrong, remember Tacoma