Size Is The Enemy

CptBongue · December 24, 2007, 12:00am

Thanks for sharing this.

Go try to rewrite it in Ruby… I look forward to your post after you’ve tried… maybe then you’ll make more sense.

MarkS · December 24, 2007, 12:00am

Sorry about that last line of “editing scrap” in my previous comment.

Hutch · December 24, 2007, 12:00am

I still remember my Introduction to Java course…

“I have to write WHAT to print ‘Hello World’ to the screen?!”

Ben_Combee · December 24, 2007, 12:00am

One area where I’ve definitely seen inexperienced programmers go wrong is not knowing a language’s standard and not-so-standard libraries. I mainly work in C on embedded Linux systems in my current job, and I’ve found plenty of places where an extra function from the GNU C library or glib made things clearer and let me replace hundreds of lines that another developer wrote for our system with code that has been tested by far more people than our own group.

she · December 24, 2007, 12:00am

500,000 lines of Java code: no amount of boilerplate can excuse that for rather simple code. It’s not as if rockets are running on these 500k lines…

And I would quickly run away screaming from that.

And to the guy who seems to be waiting for a rewrite in Ruby: one reason to USE Ruby is to NEVER end up needing 500k lines of code in the first place. But once you do set out to write it, you must realize that some things are not yet possible in Ruby because a few bindings are missing. I am sure Java has more to offer in this regard, and while you could write the bindings on your own, that would probably take quite a lot of time as well.

A better idea would be to use rubygame (SDL bindings).

gwenhwyfaer · December 24, 2007, 12:00am

C. Wissing: Prove your assertion.

A few years ago, someone conducted a study where they set an identical task to a number of groups, each using a different language. Static, dynamic; interpreted, compiled; batch, interactive - all were represented.

The only significant difference in the results lay across the interactive / batch divide.

What that tells me is that what is crucial in language development is an instant feedback loop. The longer you have to wait for feedback on your code, the longer your development will take, and the buggier it will be. Once you have a zero-length feedback loop, everything else - static typing, implementation technology, environment style, whatever - is just window dressing; in the long run, it doesn’t make any real difference.

Put another way - we are all aware of the need for instant feedback in user interfaces, whether CLI or GUI; whilst batch processing still has its place, interactivity - as much interactivity as possible - is what has made computers truly usable. Why, then, are most programming languages and environments - even Haskell, for heaven’s sake! - still mired in the tarpit of 1950s-style batch-everything? Why, when we already know it doesn’t work for end users, do so many proclaim it as a virtue for development?

kbob · December 24, 2007, 12:00am

Just a nit: APL is the wrong example. APL was known for achieving a code density 50-100x that of the Fortran it replaced, without golfing. (Golfing was common, though.) APL was actually a very high level language in an age when we hadn’t really figured out much about languages.

Cobol is the canonical over-verbose language of the '60s, though PL/1, Fortran, or RPG would serve as well.

I am totally on board with the dynamic languages, though. I’m doing my first professional Python project after 25 years of C and C++, and the amount of code I can keep in my head is phenomenally higher in Python.

runbei · December 24, 2007, 12:00am

I’m just a simple end-user (since 1982). I’m also a professional writer. And I can tell you that computers were not only more fun, but more engaging in the days of WordStar. Back then, we had tools. Now we have ponderous, patronizing mega-apps that are designed to take “difficult” decisions out of our hands. Well, people still use and love Vim. Go figure.

Dave_Kirby · December 24, 2007, 12:00am

Contrary to what some commenters believe, it is certainly possible to create and maintain large programs in Python - I am on a team working on a successful commercial project with 250,000+ lines of Python (excluding blanks and comments), and it is still manageable. The equivalent program in Java would probably be 1 million+ lines, and far less manageable.
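Dave_Kirby's figure excludes blank lines and comments. For anyone who wants to reproduce that kind of number, here is a minimal Java sketch. The class name `PyLineCounter` is hypothetical. It also assumes that only whole-line `#` comments count as comments, so docstrings and trailing comments are counted as code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PyLineCounter {
    // Counts lines that are neither blank nor whole-line '#' comments.
    // Simplification: docstrings and trailing "# ..." comments count as code.
    static long countLogicalLines(List<String> lines) {
        return lines.stream()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .filter(line -> !line.startsWith("#"))
                .count();
    }

    // Sums countLogicalLines over every regular *.py file under root.
    static long countTree(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile)
                         .filter(p -> p.toString().endsWith(".py"))
                         .collect(Collectors.toList());
        }
        long total = 0;
        for (Path p : files) {
            total += countLogicalLines(Files.readAllLines(p)); // assumes UTF-8 sources
        }
        return total;
    }
}
```

Proper line counters also handle block comments, strings and multiple languages. This sketch only shows the basic "skip blanks and comment lines" rule.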

IMHO the ideal program would have each line of code clearly expressing one concept - if it takes multiple lines to express a concept, then the reader has to read through a lot of chaff to find out what the program is doing at that point. If there are multiple concepts crammed onto a single line, then the reader has to work to untangle them - this is the difference between compactness and succinctness. Python is one of the best languages I know for getting close to this ideal. Java, on the other hand, forces you to be verbosely loquacious and repetitively repeat yourself over and over again in a repetitive manner - compare the Java and Python versions of “hello world” (http://www.ferg.org/projects/python_java_side-by-side.html).

Part of the reason Java is so verbose is that it offers the “safety” of explicit static typing. Ironically, virtually every substantial Java program in existence also contains some dynamic type checking. Every time you use a cast (which, before 1.5, meant every time you used a container class), or every time you use reflection (e.g. every time you use XML to configure your Spring framework), the type checking has to happen at runtime. This is no different from how Python works. If dynamic typing were as fragile as some static typing proponents claim, then these Java programs would also be fragile.
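Here is Dave_Kirby's point about casts in concrete form. This minimal sketch shows the pre-1.5 raw-collection style; the class and method names are illustrative only. The compiler accepts the code, and the type error only appears as a `ClassCastException` when the code runs. Generics move that check to compile time. Because of erasure, however, javac still emits a runtime cast when a `String` is read back out.

```java
import java.util.ArrayList;
import java.util.List;

class RawCastDemo {
    // Pre-1.5 style: a raw List holds plain Objects, so every read needs a
    // cast, and that cast is only checked when the code runs.
    @SuppressWarnings("rawtypes")
    static String firstAsString(List rawList) {
        return (String) rawList.get(0); // ClassCastException if it isn't a String
    }

    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean castFailsAtRuntime() {
        List raw = new ArrayList();
        raw.add(Integer.valueOf(42)); // compiles: a raw list accepts any Object
        try {
            firstAsString(raw);
            return false;
        } catch (ClassCastException e) {
            return true; // the type error surfaced only at runtime
        }
    }

    // With generics (1.5+), adding an Integer to this list is a compile error.
    // Because of erasure, javac still inserts a runtime cast on get(0).
    static String firstTyped(List<String> typed) {
        return typed.get(0);
    }
}
```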

ReinierZ · December 24, 2007, 12:00am

Wow. This is all very very wrong. You seem to have confused ‘static typing’ with ‘explicitness’.

Huh? Static typing has very little to do with either boilerplate or big code bases. Look up ‘inference’, or have a look at what a language like ‘boo’ does. Java is very EXPLICIT, amongst other things, about its type system. Its type system isn’t even particularly static; for example, all types in Java carry an implicit Maybe (null), whereas other, more expressive type systems, such as Haskell’s or Scala’s, don’t have that dynamism, and those languages are better for it. Static typing is very, very important for big codebases. It’s BETTER than dynamic, much better. Explicitness, especially pointless explicitness, is no good. Java has plenty of that.
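ReinierZ's "implicit Maybe" remark can be seen in a few lines of Java. This is a minimal sketch with illustrative names. Any reference-typed variable can hold `null`, so the compiler accepts code that fails at runtime. `java.util.Optional` arrived in Java 8, long after this thread. It appears here only as the library-level way to make absence explicit. It does not change the type system, and an `Optional` reference can itself still be `null`.

```java
import java.util.Optional;

class ImplicitNullDemo {
    // Every reference type silently includes null, so this compiles fine.
    static int lengthOf(String s) {
        return s.length(); // NullPointerException when s is null
    }

    static boolean nullSlipsThrough() {
        String s = null; // accepted: "String" really means "String or null"
        try {
            lengthOf(s);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Library-level explicit Maybe (Java 8+): absence shows up in the signature.
    static int lengthOrZero(Optional<String> maybe) {
        return maybe.map(String::length).orElse(0);
    }
}
```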

The issue is not lines; it’s concepts. If you have no better metric available, ‘lines of code’ will have to do. There’s certainly a relation. However, you can usually do better. Example: where you put your brace (same line or next line) has zip squat to do with code complexity, and yet next-line braces can inflate your raw LoC by 10% or more. A keyword such as ‘extends’ is arguably easier to read than a symbol like ‘:’. Mostly, it makes no difference, and yet one is 7 characters whereas the other is only 1. Being more explicit about the things you depend on (import statements) is generally a good thing. Global-namespace languages like PHP need far fewer import statements, but for obvious reasons a global namespace is an absolute disaster for large projects. And yet global namespaces allow the elimination of massive amounts of LoC. Don’t go there. Do not assume that every LoC eliminated is a win. That’s not how it works. I’m not talking about golfing here; I’m talking about the vast space between golfing and oversized code, the area where LoC reduction just doesn’t do very much.

An important distinction: there are libraries. If the LoC of libraries is included in the count for projects, most of us have worked with 1 million LoC+, which leaves Yegge’s measly 0.5 megaLoC in the dust. Yet no one seems to have much of a problem with this. Clearly, then, it is possible to abstract away entire bastions of code and take them out of the LoC equation. I won’t go into detail on how this is done, or on the many, many pitfalls that await if you try to pull this off with your own code. I’m merely trying to state that you’re dead wrong:

LoC is NOT THE PROBLEM. Code Complexity is.

My second problem with your article is the needless hyperbole.

I recently went toe to toe with Slava Pestov of Factor writing a JSON marshaller/unmarshaller: he in Factor and, because I like a challenge, I in Java. My line count ended up 15% higher than his. Not good, but not even close to the magic 50 to 75% reductions claimed - in this case, by you. We also finished in about the same time, incidentally.

I’m going to need to see some evidence for the claims that switching programming languages can magically reduce Yegge’s LoC by 50%. Remember that ANY disciplined rewrite, even from language X to the exact same language X, will eliminate a large amount of code simply because you now have some serious hindsight backing you up. X-to-X also makes it easier to do it in small steps and get lots of instant feedback along the way. Refactoring is fun for a reason. It’s also a lot easier to do in statically typed languages, incidentally.

The advice you’re giving here is just… wrong. If your code base becomes unmaintainable, which is absolutely possible, as no amount of IDE goodness will scale to infinite amounts of complexity, you need to work on making it more maintainable. It’s really that simple. One way to accomplish this is to revisit the -structure- and see if you can eliminate lots of duplication. This might be harder in some languages than in others, but it’s possible in most of them (yes, including Java). The blunt stick of ‘total rewrite’ should be used only if you have no other alternative; it’s far, far less efficient than just improving your existing codebase. You should also use some tools to draw out a dependency tree (something which isn’t easy at all in DMP-heavy languages like Python, Ruby or JavaScript, I might add, and trivial in something like Java - static typing is a WIN for large code bases, not a loss, as you insinuate). Draw a big circle around something that looks like an island. Complete the abstraction and offload the entire thing into a separate project. The goal is to sever all dependencies that lead out of the island, and have only a few dependencies into it. Code analysis won’t tell you the whole story (I’m oversimplifying somewhat; there’s also logical dependency to think of), but it’s the right idea.

You can spin that whole island off to a separate group and if need be, let it be maintained by a separate team. You can even split this island into even more islands.
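ReinierZ's island test can be phrased as counting edges across the boundary of a candidate set of modules. This is a minimal sketch. It assumes the dependency graph has already been extracted into a map from each module name to the modules it depends on. That extraction step and the module names in the test are hypothetical. As he notes, this only sees code-level dependencies, not logical ones.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class IslandCheck {
    // Records that module `from` depends on module `to`.
    static void addEdge(Map<String, Set<String>> deps, String from, String to) {
        deps.computeIfAbsent(from, k -> new HashSet<>()).add(to);
    }

    // Returns {outgoing, incoming}: edges leaving the candidate island
    // (the ones to sever) and edges entering it (which should be few).
    static int[] boundaryEdges(Map<String, Set<String>> deps, Set<String> island) {
        int outgoing = 0;
        int incoming = 0;
        for (Map.Entry<String, Set<String>> entry : deps.entrySet()) {
            boolean fromInside = island.contains(entry.getKey());
            for (String target : entry.getValue()) {
                boolean toInside = island.contains(target);
                if (fromInside && !toInside) outgoing++;
                if (!fromInside && toInside) incoming++;
            }
        }
        return new int[] {outgoing, incoming};
    }
}
```

A good island candidate has zero outgoing edges and only a few incoming ones. Take edges ui -> report, ui -> db and report -> pdf. The set {report, pdf} gives 0 outgoing and 1 incoming, while {report} alone still has an outgoing edge to pdf.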

And, of course, if there are no islands to be found? I suggest praying. Or the scream, shout, and run, run, run away procedure.

THAT’S the stuff you need to be doing if your code is running away from you. Switching programming languages is a massive effort for little gain. Should you -start- with Java if you are beginning from scratch? Probably not. Should you rewrite your X-megaLoC project in something else because Yegge, and particularly you, felt like pissing all over Java? No - that’s stupid advice. Add abstraction, carve out some libraries, but most of all, reduce dependencies. There’s no set formula for any of it. Nobody said programming was easy.

Daniel · December 24, 2007, 12:00am

I wholeheartedly agree that it is insanely easy to get into code bloat with Java. It may work very well in college courses to show OO techniques and implement algorithms, but in real life there is just way too much work. In .NET they seem to be acknowledging this while still favoring static typing, and thus we see the absence of checked exceptions, the inclusion of type inference and a bunch of declarative/functional stuff. That is one reason why I much prefer working on Mono to Java.

ChipO · December 24, 2007, 12:00am

I read the original long post (worth the time, IMHO) and didn’t comment there, because I thought the author was taking the topic a little too personally. Maybe I am too.

I once worked in a large (hundreds of engineers) development organization where there were two types of developers: those who worked on the eight-million-line legacy code base that actually generated, directly or indirectly, about one billion dollars a year in revenue, and those who worked on smaller projects, the majority of which sucked money out of the company in the hope that they would eventually be profitable. During the decade I was employed there, I worked on both sides of the fence.

How do you get eight million lines of code? One line at a time, baby, one line at a time. No one sets out to write eight million lines of code. There are no simple answers of how to address this issue.

That code base was a victim of its own success. It didn’t start out at eight, or even one, million lines of code. But it was a successful product. Customers wanted new features. Lots of new features.

You had to have hundreds of developers working on dozens of teams to develop different new features in the code base, because you can’t stay competitive without developing those features, and there’s simply too much work for a small team to do.

Fall behind in feature development, and customers migrate to another product.

Give up that code base, and the company goes out of business, and thousands of employees all over the world lose their jobs.

Rewrite? So you go to a product manager and say “We want to rewrite the code base. It’ll cost millions of dollars.” She says “What difference will the customer see?” And you say “Well, if we do everything absolutely perfectly, nothing.” And the manager says “So let me see if I understand you: you want to spend millions of dollars so that in the absolute best possible circumstances (that are unlikely to occur) the customer will see nothing?” Those projects never get funded.

The answer is that you rewrite/refactor to reduce maintenance and future feature development costs, thereby reducing what the finance guys call Cost Of Goods Sold, or COGS. But the effort to reduce COGS costs millions of dollars and takes many years to have any observable financial effect, while funding new feature development pays off in months. Short-term thinking says screw the COGS reduction, go for the low-hanging fruit.

I’ve worked with developers who said “I choose not to work on a project of that magnitude”. That’s fine career wise, although you’re saying you won’t take jobs that make up probably the bulk of software developer employment.

But even so, if everyone did that, there are a lot of products you wouldn’t have, because they can only be developed in a timely manner by a large organization (even if they are open source projects). I’m guessing we’d have to do without Linux and GNU, most of Apache, Windows in all its incarnations, MacOS, the entire telephone system, most of the Internet and the web, a huge part of the military/industrial complex, infrastructure stuff like Oracle and SAP that behind the scenes reduce the cost of manufacturing, etc. etc. etc.

“No silver bullet.”

rubix · December 24, 2007, 12:00am

I am surprised that nobody here has mentioned “library support”.
For any serious work, I would any day pick a crappy language (whine, but still live with it) as long as it has good library support. For instance, in Java I can find libraries to do 95% of the painful tasks that I would otherwise (however easy) have had to write myself.
Bonus points if the libraries are open source.

Though it would be fun to write some of the libraries yourself, you don’t always have time on your side. On a crazy project schedule, only reuse can save your a$$. This is the sad reality.

Does “x = (insert your favorite language here)” have a connection pool library for HTTP and DB connections (one that retries, pings, and cleans up dead connections)? Does “x” have a good cross-platform GUI library with good RAD tools? That is very important for a GUI-based project.
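rubix's checklist (reuse, ping, retry, drop dead connections) describes a fairly small core. The sketch below is hypothetical and is not the API of any real pooling library. `SimplePool`, `borrow` and `giveBack` are illustrative names, and the "ping" is whatever liveness predicate you pass in. A production pool would also cap its size, close discarded objects and handle timeouts.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical minimal pool, not any real library's API.
class SimplePool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;   // opens a new connection
    private final Predicate<T> isAlive;  // the "ping"
    private final int maxAttempts;       // creation attempts before giving up

    SimplePool(Supplier<T> factory, Predicate<T> isAlive, int maxAttempts) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        this.factory = factory;
        this.isAlive = isAlive;
        this.maxAttempts = maxAttempts;
    }

    synchronized T borrow() {
        // Reuse an idle object only if it still answers the ping.
        while (!idle.isEmpty()) {
            T candidate = idle.pollFirst();
            if (isAlive.test(candidate)) {
                return candidate;
            }
            // Dead: dropped here (a real pool would also close it).
        }
        // Nothing reusable: create a new one, retrying on failure.
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return factory.get();
            } catch (RuntimeException e) {
                last = e;
            }
        }
        throw new IllegalStateException("could not create a pooled object", last);
    }

    synchronized void giveBack(T obj) {
        idle.addFirst(obj);
    }

    synchronized int idleCount() {
        return idle.size();
    }
}
```

Validating on borrow keeps the sketch simple. The trade-off is that every borrow pays for a ping. Real pools often validate in the background instead.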

BTW, I have been programming for the last 15 years, all the way from assembly to Java and now to JavaScript!
Most of the time I’ve stuck to languages that have helped me complete my task - fun or work, it does not matter.

Very few times have I had to write something that did not require support from external libraries - in fact, in my professional career I don’t remember any! Shell scripting is powerful because of the tools that Unix gives you; remove them and, hey, life is not all that comfy anymore.

There sure is a difference between using a well-tested library and writing one from scratch and maintaining it. Hey, you can’t complain about code bloat if you didn’t use the right libraries!

tndalpaul · December 24, 2007, 12:00am

That’s why I always use Prolog - once the problem is defined I’m largely finished. It can’t be much shorter (while remaining understandable) than that.

mreiland · December 24, 2007, 12:00am

Mark, your advice to use inheritance based polymorphism is exactly why C++ has the rep that it has.

If you tried that with any codebase I was working on, I’d have to fight the urge to backhand you.

James · December 24, 2007, 12:00am

No one is mentioning that EC4 has strong static typing (if you’d like), and IIRC Steve is all about using it. The dynamic nature of the language he was talking about IMO was first class functions and the like. In fact, EC4 comes with some really cool type-based features that are worth checking out (e.g. the new explicit type system and the ‘like’ operator).

Consider:

type Point = { x:int, y:int };
function takesPoint( aPoint: like Point ):void
{
    // inside, aPoint is guaranteed to have members x and y
}

takesPoint will accept /any/ object that has members x and y of type int, and it returns nothing. The compiler can check this at compile time, and it will also be checked at runtime (causing an exception if the contract is not fulfilled). Think about it – interfaces without requiring explicit implementation of the interface.
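Java has no counterpart to `like`: a class must declare `implements` to satisfy an interface. The runtime half of the check can be roughly approximated with reflection. This sketch accepts any object with public, non-static `int` fields `x` and `y`; the names `LikeCheck`, `likePoint` and `sumCoordinates` are hypothetical. It gives none of the compile-time checking James describes, which is exactly the part Java lacks.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

class LikeCheck {
    // Runtime-only approximation of `aPoint: like Point`: any object whose
    // class has public, non-static int fields named x and y qualifies.
    static boolean likePoint(Object o) {
        return hasPublicIntField(o, "x") && hasPublicIntField(o, "y");
    }

    private static boolean hasPublicIntField(Object o, String name) {
        if (o == null) return false;
        try {
            Field f = o.getClass().getField(name); // public fields only, including inherited
            return f.getType() == int.class && !Modifier.isStatic(f.getModifiers());
        } catch (NoSuchFieldException e) {
            return false;
        }
    }

    // Mirrors takesPoint: reject anything that is not "like Point" with an exception.
    static int sumCoordinates(Object aPoint) {
        if (!likePoint(aPoint)) {
            throw new IllegalArgumentException("argument is not like Point");
        }
        try {
            Class<?> c = aPoint.getClass();
            return c.getField("x").getInt(aPoint) + c.getField("y").getInt(aPoint);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```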

Read the EC4 spec: there are some gems in there. ‘wrap’ is another cool feature in the same vein as ‘like’. I can’t wait for these features, as they will significantly reduce code, improve readability and keep safety.

Jack37 · December 24, 2007, 12:00am

Firstly:

for line in file("FileName.txt"):
    # process line

for (String line : new LineReader("FileName.txt")) {
    // process line
}

Your code is an inextricable monstrosity. I would hate to see the special considerations put in place here.
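`LineReader` in Jack37's Java line is not a JDK class. Assuming it is meant to be a small `Iterable<String>` over a file's lines, a minimal sketch could look like this. The class name comes from the comment; everything else, including UTF-8 input, is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical LineReader: an Iterable over the lines of a text file,
// so it can be used directly in an enhanced for loop.
class LineReader implements Iterable<String> {
    private final String fileName;

    LineReader(String fileName) {
        this.fileName = fileName;
    }

    @Override
    public Iterator<String> iterator() {
        final BufferedReader reader;
        try {
            reader = Files.newBufferedReader(Paths.get(fileName), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Iterator<String>() {
            private String next = readLine(); // read one line ahead

            private String readLine() {
                try {
                    String line = reader.readLine();
                    if (line == null) reader.close(); // close once exhausted
                    return line;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }

            @Override
            public boolean hasNext() {
                return next != null;
            }

            @Override
            public String next() {
                if (next == null) throw new NoSuchElementException();
                String current = next;
                next = readLine();
                return current;
            }
        };
    }
}
```

The reader is lazy and closes the file when the last line is consumed. If the loop exits early, the file stays open, so a sturdier version would need a close hook. Since Java 8, `Files.lines` offers a closeable `Stream<String>` for the same job.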

Can I:

for line in url("http://www.stupid-newbie-programmers.com"):

Can I? If not why not? It makes no sense.

Yes, Java could do with a few more core libraries to do a number of things. The simple fact that these new development tools merely have some weakly typed string manipulation features that exist in something as torrid as JavaScript is no surprise.

Boilerplate code means you are constructing the behavior of something from code, not from syntax. You can reduce and reuse code, exactly like I have shown. What is the difference?

However, you could also do:

for (Url url : new UrlReader("Bookmarks.txt")) {
    // process url
}

for (Url url : new XmlUrlReader("Bookmarks.xml")) {
    // process url
}

for (Url url : new DeliciousReader("username", "password")) {
    // process url
}

for (Image image : new FlickrReader("stupid blog")) {
    // process image
}

And then you realize that this isn’t some shortcut, hacky looking syntax crap that hides the very nature of the operations from your view.
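Jack37's `UrlReader` is likewise a hypothetical class. This is a minimal sketch of it, with `java.net.URI` standing in for his `Url` type. The file is read eagerly for simplicity. Blank lines are skipped, and a malformed line throws `IllegalArgumentException` from `URI.create`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical UrlReader: treats each non-blank line of a file as a URI.
// java.net.URI stands in for the comment's Url type.
class UrlReader implements Iterable<URI> {
    private final String fileName;

    UrlReader(String fileName) {
        this.fileName = fileName;
    }

    @Override
    public Iterator<URI> iterator() {
        List<URI> uris = new ArrayList<>();
        try {
            for (String line : Files.readAllLines(Paths.get(fileName))) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    uris.add(URI.create(trimmed)); // IllegalArgumentException on malformed input
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return uris.iterator();
    }
}
```

Usage then reads exactly like the comment: `for (URI url : new UrlReader("Bookmarks.txt")) { ... }`. Reading the whole file up front keeps the code short but holds every entry in memory, which is fine for a bookmarks file.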

Go on, handle each line of a file as a Url, do it, show me the code, go on. Show me the damn code. I dare you. Now show me the code to handle the image in that way. You can’t, because you don’t have a syntax shortcut to handle it.

Why is the CAPTCHA word always ‘orange’ for me?

Runar · December 24, 2007, 12:00am

Java’s problems are legion, but the fact that Java is statically typed is not one of them. The clunkiness which you attribute to static typing is actually because:

- Java is not typed strongly enough, and not consistently. What’s the type of the expression null, for example? Why do we have variables and also objects, which are themselves variables (i.e. they’re mutable)? Why are Strings an exception to this?
- It does not allow higher-order functions or continuations (even Pascal and C have this).
- It doesn’t allow type inference or higher-kinded types.
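On Runar's point about higher-order functions: in the Java of this thread, the usual workaround was a single-method interface plus an anonymous inner class. Here is a minimal sketch with illustrative names (`Fn`, `map`, `lengths`). It shows how much ceremony that workaround carries.

```java
import java.util.ArrayList;
import java.util.List;

class HigherOrder {
    // A single-method interface stands in for a function type.
    interface Fn<A, B> {
        B apply(A a);
    }

    // map is higher-order: it takes a function object as an argument.
    static <A, B> List<B> map(List<A> xs, Fn<A, B> f) {
        List<B> out = new ArrayList<B>();
        for (A x : xs) {
            out.add(f.apply(x));
        }
        return out;
    }

    static List<Integer> lengths(List<String> words) {
        // 2007-era spelling: an anonymous inner class as the "function literal".
        return map(words, new Fn<String, Integer>() {
            public Integer apply(String s) {
                return s.length();
            }
        });
    }
}
```

Java 8 later added lambdas, so `HigherOrder.map(words, s -> s.length())` works with the same `Fn` interface. The anonymous class above is what the thread's Java required.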

Haskell, Clean, and ML are statically typed languages with none of the clunkiness of Java. There’s even one such language (Scala) that compiles to the JVM. Another (F#) compiles to .NET.

Davide17 · December 24, 2007, 12:00am

I think that Lines of Code (LOC) doesn’t mean much. Of course it is a statistic that can be measured, but it doesn’t tell you anything about the quality of the code in question.
If the code has a good OO design and is actually reusable, you might end up with something like:
A Business Rule Engine, a Report Engine, a SmartForm UI, a Print Engine, and so on.
Every project is a justification for creating reusable objects that become part of your toolbox for the next project.
OR
You create the world’s largest ball of string in code and run away from it when the project is finished.

WesleyS · December 24, 2007, 12:00am

Well, I guess I’m in trouble. I just ran a line count and I got 498,234 lines of code. While I didn’t write every line (actually about half are coming from my code generators), I am the sole maintainer of the code.

Worse, the code is a mishmash of ASP/VBScript (ugh) and ASPX/C# (for those parts I have been able to update).

Perhaps I should just take the cyanide pill now? Or maybe I can do what I have always done; treat the database as king, keep the business logic isolated and keep migrating as necessary.

As long as the number of changes remains within my reach, I don’t really mind the huge scale of this project. In fact, the amount of functionality and diverse customer usages is a point of pride for me.