Programming Is Hard, Let's Go Shopping!

But what the hell do I know.

That is what I’ve been trying to figure out for years.

a programming community that doesn’t suck.

And a big part of the reason it doesn’t suck is that people can format their posts almost as much as I formatted this blog post, either by a) taking the time to learn Markdown or b) relying on the tried-and-true HTML that almost every developer now knows by heart.

The average post on Stack Overflow just plain looks better than posts on other forums and sites.

I agree allowing HTML was painful, but in a good way. I’ve grown to enjoy the flexibility of using either markdown or HTML interchangeably. It means when programmers first encounter Stack Overflow, their first instincts in editing a post – hmm, how about if I enter a hyperlink here – work exactly the way they expect them to.

Choices about markup code are as critical to us as they are to, say, Wikipedia. It’s all about the content, and making it easy for users to do the right thing when entering content.

Wow, so many comments about how there are libraries in .Net to sanitize HTML and no one mentions what these libraries might be. Which is Jeff’s primary issue.

I agree to a point, though I have to say it really depends on what you consider core to your business.

If, for example, your business depends on an API allowing other developers to create an ecosystem around your services, should you roll your own XML parsers? In this example, parsing XML is as core to business aims as parsing posted content is to you, and it’s why I think you’re off the mark a little in this instance.

You seem to have done the research and identified that there is a gap for the HTML sanitisation which you needed to fill with in-house code. That much, I totally agree with. I’m not sure I’m 100% with your analysis that this represents a core business function though.

In .NET? How?

This is like telling me I should use rainbows and cotton candy. Well, obviously.

The stress has obviously gotten to the man. Why didn’t you even read the whole post? Or couldn’t you yourself come up with a way to use Python from .NET?
And no response of course to these questions…

Usually the content here is quite good but this is a severe case of NIH-syndrome and denial wrapped together.
Sanitizing html is your core business now? Nice.

I have to massively disagree with you on this one, particularly in the games industry.

Middleware in our industry is quite common, hence Midway paying Epic 3.X Billion dollars for a 10 year deal for their rendering engine. In particular, the product in our industry is about providing content, not code. So it makes sense that if it’s cheaper to buy the code to help produce that content (and content production could be faster as a result), then that’s the proper path to take.

I do agree that, as a programmer dealing with those systems, you need to understand them well enough to be able to reproduce them in some form. But at the core, if you can get a contract to purchase software that does the job better/faster/easier/less expensively, then you should do it. Otherwise, you’ll end up over budget/slower, which is not productive for a video game.

~Main

If it were only a time vs. materials world: reuse, reuse, reuse.

But it’s not. Computer Science curricula would agree with that. That’s why in a data structures class, any good professor will make you write a stack/queue/linked-list before directing you to the STL. That’s why in my Web Programming class this semester, we were directed to write a web server in C. The time it takes to write these things is worth it (at least to me) if for nothing else than the personal growth I experience when I learn something.

When Jeff’s sanitizer breaks, he’ll know how to fix it a lot better than if he just had this big abstract entity of an HTML sanitizer to search and prod through.

That’s why in my Web Programming class this semester, we were directed to write a web server in C.

You’ve inversely proved my point with your own :wink:

A web server in C is not a 2.4million line codebase that 20 people have all contributed to over the past 4 years. Video games are.

If it only took a weekend to write a code sanitizer, then yeah, do it yourself. But writing a robust multithreaded physics library that works cross-platform and cross-project? I’ll leave that to another company that’s entirely dedicated to providing that software in the marketplace as their livelihood.

Good programmers program, Great programmers reuse.

~Main

Hey Jeff,

I’m a PHP developer, and there are loads of libraries, frameworks, CMSs, etc. out there. I’ve just (hopefully) won the battle with my project managers to let me write my own stuff instead of having to use 3rd party code.

Why?

So much of the ‘popular’ code (CMSs especially) is really really badly written. The time taken to evaluate, test, debug, tweak existing code is often way longer than doing it yourself.

I ALWAYS re-invent the wheel. How else would the wheel get better? And who is anybody to say I’m not good enough to add incremental value to the wheel?

That’s not to say I don’t use built in stuff and code that I know is good. Sometimes I’ll take a snippet from the web and RE-WRITE it so I understand what it does and how it works. This is not a black and white thing.

As you pointed out, use 3rd party code where it makes sense, but don’t rely on it to build your software. Telling a client their site is down because of an exploit in 3rd party code is hardly going to enhance your reputation.

If there is a bug in any of my applications, it’s my fault, and my job to fix it. The buck stops with me, every time. That’s what being a professional programmer is all about.

Rant over. :slight_smile:

so many comments about how there are libraries in .Net to sanitize HTML and no one mentions what these libraries might be. Which is Jeff’s primary issue.

It’s because there really aren’t any. There’s the HTML Agility Pack – which isn’t really designed for sanitizing without writing a bunch of (error-prone) code to make it work – and that’s about it.

As others have said, the idea that sanitizing is this super-hard impossible problem is also not really true. Certainly nowhere near as hard as the physics library example @MainRoach proposed, etc. And like @Damian said, you can write a decent sanitizer in a few days.

Testing it thoroughly is another matter…
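To give a feel for why the first draft is quick and the testing is not, here is a minimal whitelist-style sketch in Python using only the standard library's html.parser. Everything in it is an assumption made for illustration: the names ALLOWED_TAGS, ALLOWED_ATTRS, SAFE_SCHEMES, WhitelistSanitizer and sanitize are hypothetical, the whitelist is arbitrary, and this is not how Stack Overflow's sanitizer works. It drops unknown tags and attributes, keeps only http(s) links, and escapes all text, but it makes no attempt to balance tags or handle the many edge cases a real sanitizer has to be tested against.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist, chosen only for this example.
ALLOWED_TAGS = {"a", "b", "blockquote", "code", "em", "i",
                "li", "ol", "p", "pre", "strong", "ul"}
ALLOWED_ATTRS = {"a": {"href"}}          # only links may carry an attribute
SAFE_SCHEMES = ("http://", "https://")   # every kept attribute value must be such a URL


class WhitelistSanitizer(HTMLParser):
    """Re-emits only whitelisted tags and attributes; escapes all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag; any text inside it is still emitted, escaped
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, ()):
                continue  # e.g. onclick, style
            if value is None or not value.strip().lower().startswith(SAFE_SCHEMES):
                continue  # e.g. javascript: URLs
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


dirty = '<p onclick="x()">Hi <a href="javascript:alert(1)">link</a><script>alert(1)</script></p>'
clean = sanitize(dirty)
```

Note that the script tag disappears but its body survives as harmless text, and unclosed tags pass straight through. Those are exactly the kinds of decisions that take far longer to test than to write.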

Okay, interesting article (and I’ve read that one of Joel’s in the past), but I think that you are both talking about something totally valid but making the wrong point.

The reason that you had to write your own HTML sanitizer, and the reason that Microsoft’s Excel team had to write their own compiler (Joel’s article, as I recall) is that libraries or external programs to do what was needed didn’t exist.

When that Excel compiler was written, there was no open-source community to speak of, and they couldn’t very well modify a commercial compiler for their needs. Google can’t use external libraries, because nothing scales to their level…yet. And as you pointed out in comments, .NET is a backwater and doesn’t have the kind of libraries yet that older platforms do, so the thing you needed didn’t exist.

The moral of the story isn’t “We should write important stuff in-house.” The moral of the story is, “If it doesn’t exist or you need something way beyond what exists, you probably will have to write it yourself.” That’s a fact of life, not a lesson in good software design.

-Max

Jeff, while I often enjoy your insights, you’re just not being rational here.

Core business function is synonymous with competitive advantage. If HTML sanitizing were core, you’d have written your business plan around how much better you are at it than others. It would be up there with how you attract and keep smart programmers on your site. The rest is plumbing.

If .NET doesn’t have its own sanitizer, perhaps it wasn’t the right choice for a platform. Personally, I’ve always wondered why you chose .NET. I know it’s the one that’s most familiar to you. That’s a plus if you want to complete a small new project fast, but using the same language and tools all the time limits your growth potential as a developer. Considering how many readers you have who don’t use .NET (myself among them), you really don’t want to end up as one of those curmudgeonly single-language programmers.

I don’t know about the whole html sanitizer being a core business function thing (quite frankly, I could live with textile or bbcode).

But I do think it’s a good investment for a developer to reinvent something. You can’t talk about scalable comet architecture if you’ve never written a server. You can’t talk about JavaScript compilation optimization if you’ve never written a JavaScript engine.

If you want to be an expert at something, you’ve gotta experience its ins and outs, the full development cycle, the bugs, the caveats, the holes and the limitations.

And if you are a programmer (read: not a content-entry monkey) and you have bills to pay, you’ll probably want to be good at some programming-related task.

How do you determine that there are no existing libraries? Do you just google or do you have a list of sites (sourceforge, cpan, …)? Every time I hear developers say this type of thing with certainty, I suspect that I’m faking it as a dev, since I never feel sure of what’s actually out there, even after spending a lot of time researching…

Same goes for outsourcing: keep the core competencies that you rely on in-house.

Markdown sucks.

Markdown interprets all text between two underscores as italic. This would be fine if nobody needed to use underscores. In other words, this:

Popular Apache modules include mod_php and mod_rewrite

shows up like this:

Popular Apache modules include modphp and modrewrite

You can escape underscores, but this defeats Markdown’s stated purpose of appearing like natural text.

I should add: my preferred alternative is Textile, or Mediawiki formatting, or no formatting at all. Do users really need HTML to comment on a blog post?

Markdown interprets all text between two underscores as italic.

Agreed. We changed this so that intra-word underscores are no longer treated as emphasis by our server-side Markdown parser.

More here:
http://blog.stackoverflow.com/2008/06/three-markdown-gotcha/
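For anyone who wants to see the underscore problem concretely, here is a rough Python sketch. It is only an illustration: NAIVE_EM and STRICT_EM are hypothetical regular expressions written for this comment, not the rules of any real Markdown implementation and not the actual Stack Overflow parser. The naive pattern treats anything between two underscores as emphasis; the stricter one assumes an underscore only counts when its outer side is not a letter or digit.

```python
import re

# Naive rule: any text between two underscores becomes emphasis.
NAIVE_EM = re.compile(r"_(.+?)_")

# Stricter rule (an assumption for this sketch): the opening underscore must not
# follow a letter or digit, and the closing one must not be followed by one,
# so underscores inside words are left alone.
STRICT_EM = re.compile(r"(?<![A-Za-z0-9])_(?=\S)(.+?)(?<=\S)_(?![A-Za-z0-9])")


def render(text, pattern):
    """Wrap every match of `pattern` in <em> tags."""
    return pattern.sub(r"<em>\1</em>", text)


sample = "Popular Apache modules include mod_php and mod_rewrite"

naive = render(sample, NAIVE_EM)
# naive == "Popular Apache modules include mod<em>php and mod</em>rewrite"

strict = render(sample, STRICT_EM)
# strict == sample, because both underscores sit inside words

emphasis = render("this is _really_ important", STRICT_EM)
# emphasis == "this is <em>really</em> important"
```

The trade-off is that emphasis inside a word can no longer be written with underscores, which seems a small price for identifiers like mod_php surviving intact.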

well, it’s good you can still enjoy coding.

But I will be more than happy if you explain some tips to manage your time for both programming and writing blogs (with such entries) :smiley:

Trevor on October 17, 2008 12:51 PM took the words out of my mouth.

Reinvent the wheel (not the concept, but the instance) to become a better programmer.
Learn to code by writing code. You will not understand all the risks and pitfalls of an HTML sanitizer if you have never written one.

I am not saying you should never reuse code. But rolling your own can definitely be the best option.

And I think that ‘there is no suitable code available’ definitely is a good reason to do what programmers (hopefully) do best: write code.