Testing With "The Force"

codinghorror · July 7, 2009, 12:00am

Markdown was one of the humane markup languages that we evaluated and adopted for Stack Overflow. I've been pretty happy with it, overall. So much so that I wanted to implement a tiny, lightweight subset of Markdown for comments as well.

This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2009/07/testing-with-the-force.html

rollercoster · July 8, 2009, 12:00am

What the heck are googlefights supposed to show?

try
http://www.googlefight.com/index.php?lang=en_GB&word1=Jeff+Atwood&word2=Adolf+Hitler
Even a misspelled name can win:
http://googlefight.com/index.php?lang=en_GB&word1=jeff+atwood&word2=joseph+stalin

And if you try to compare him against somebody in computer science who has really done something (like developing semaphores, winning the Turing Award, and so on), guess who would win?
http://googlefight.com/index.php?lang=en_GB&word1=jeff+atwood&word2=Edsger+Dijkstra

The_Skipper · July 8, 2009, 12:00am

I usually enjoy your posts, and I am a fan, but this one was just a dumb waste of time. Seems like you were trying to fill a quota on this one.

Paul · July 8, 2009, 12:00am

As this entry demonstrates nicely, Markdown IS the dark side:

-insufficiently unique tags
-closing tag is the same as the opening tag

…thus necessitating a regex that’s ridiculous even by the standards of regexes.

“italic” is hardly onerous, as jasonmray notes above, and stripping out all HTML except , , and is a solved problem.

Paul · July 8, 2009, 12:00am

Naturally, I meant “i” or “” or whatever magic incantation it takes to make angle brackets show up here. I trust the irony is not lost on anyone.

Sandro · July 8, 2009, 12:00am

May I ask what the difference is between .* and .*?
I just used a regex today (not something too common in my work) and am now curious
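
A minimal sketch of the difference, written in Python rather than the .NET engine discussed in the post; the sample string and variable names are made up for illustration. The quantifier `.*` is greedy (it takes as much as it can), while `.*?` is lazy (it takes as little as it can):

```python
import re

# Hypothetical sample text with two asterisk-delimited spans.
text = "this is *one* and *two* here"

# Greedy: .* grabs as much as possible, so the match runs from the
# first asterisk all the way to the LAST asterisk in the string.
greedy = re.findall(r"\*(.*)\*", text)

# Lazy: .*? grabs as little as possible, so each match stops at the
# NEXT asterisk it can find.
lazy = re.findall(r"\*(.*?)\*", text)

print(greedy)
print(lazy)
```

The greedy pattern swallows both spans as one match, which is usually not what you want for inline markup; the lazy one finds each span separately.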

JuanZ · July 8, 2009, 12:00am

@Dennis Forbes:
"(somehow CodingHorror got on my iGoogle page, and I’ve been remiss to remove it. And every now and then I expand one of those nodes…)"

If I don’t like the service somewhere, I never go back, period. For someone who does not like this blog, you have been here for a long time (http://www.codinghorror.com/blog/archives/000845.html). Maybe you are a stalker, maybe you are jealous, or maybe you have a crush on him. Either way it’s kinda scary.

DennisF · July 8, 2009, 12:00am

Yes. It’s terribly scary, Juan. And from looking at the cadence of your sentences, I have to think that you like to injure small animals for fun.

As much as I enjoy ridiculous allusions and accusations, and as utterly in awe as I am of your amazing Google-fu, I find it humorous that you pointed out that I had commented on that particular blog entry, given that I did so after Jeff commented on one of my entries.

Though I’d been visiting Jeff’s blog – there have been some very enjoyable reads over the years – since a lot earlier than 2007.

Maybe, just maybe my dear friend Juan, the online development community really isn’t all that big, and we find ourselves in surprising intersections all the time. Strange how that works.

kbiel · July 8, 2009, 12:00am

This does basically the same thing, covering a few more edge cases, using the more advanced features of the .NET regex engine:

(?(?(?${Phrase}

Fabricio · July 8, 2009, 12:00am

Mr. Forbes is not wrong at all, as he’s not bashing Jeff’s testing but criticizing HOW he is implementing the feature.

As for the googlefight results, the ones he’s winning are probably because the hits are links to Forbes magazine or related material.

kbiel · July 8, 2009, 12:00am

Comment filtering failure. I’m not going to attempt to find out how to fix it without a preview option.

TooM · July 8, 2009, 12:00am

Or you could just make it optional, and then you don’t need to worry about it …

kd1 · July 8, 2009, 12:00am

Your regex implementation apparently doesn’t have \b, which matches a word boundary. Rather useful.
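
A rough Python sketch of how `\b` could help here. The pattern and the `emphasized` helper are hypothetical, not the regex from the post: the idea is that a pair of asterisks counts as emphasis only when the text between them starts and ends on a word boundary, so asterisks used as multiplication signs are left alone.

```python
import re

# Hypothetical pattern: the character right after the opening asterisk and
# the character right before the closing asterisk must be word characters,
# which is what the two \b assertions enforce.
EMPHASIS = re.compile(r"\*\b(.+?)\b\*")

def emphasized(comment):
    """Return the phrases that would be treated as emphasized."""
    return EMPHASIS.findall(comment)

print(emphasized("this is *really* important"))
print(emphasized("area = 2 * 3 * 4"))
```

Note that `\b` treats the underscore as a word character, so the same trick would need adjusting for underscore-delimited markup.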

David_W · July 8, 2009, 12:00am

I, like most Unix shell hacks, am pretty good at regular expressions, but I’m not sure whether you want to use them here.

Besides, why are you writing your own parser when you have textile that does it for you? (See http://textile.thresholdstate.com/).

As an added benefit, you’ll be using the same sort of syntax that other sites use. Italics? Use underscores. Bold? Use asterisks. Underscore? Use plus signs. Crossout? Use minus signs. Why should I have to remember that your site uses double underlines when everyone else uses single underlines?

And, if you really, really insist on using C#, you’ll be happy to hear that there’s a .NET version of Textile: http://www.codeplex.com/textilenet

JeffreyF · July 8, 2009, 12:00am

As I read the post, I expected a pithy moral about not trying to impose new semantics on old free-form data written without knowledge of those semantics. The question of how to (or whether to) write a regex for this should come after more important questions such as whether to do it in the first place (and for older comments written when a star was expected to be a star, I’d say no).

Of course, this all begs the question of why Yet Another Random Markup Language needs to be developed. Are html tags so foreign to your visitor base that you have to come up with something new?

Geo · July 8, 2009, 12:00am

Wow, you’re the king of regex!

DennisF · July 8, 2009, 12:00am

A while back there was a post somewhere out there in the tubes by a guy advocating test driven development. In his case he was building a sudoku solver, and began his development process, per TDD dictums, by building his tests.

He then pursued the development of the application in the most mindless, ridiculous manner possible, basically just believing that the best approach was to kind of randomly change things until the tests passed.

I feel the same way about this entry. Did you really have to “go to the tests” to realize that there were egregious gaps in your matching? I would say that it is close to one of the worst ways to approach the problem.

Dan · July 8, 2009, 12:00am

Yeah, gotta start reading that book you talked about; those regexps actually make Japanese look appealing.

codinghorror · July 8, 2009, 12:00am

If Dennis Forbes is angry with me, well … it must be Wednesday!

I can’t speak for you, but for me, it’s way harder to sit around and dream up all the edge conditions than it is to, y’know, throw a bunch of data at it and see.

alt · July 8, 2009, 12:00am

If backticks signify code, how would I write shell code that contains backticks?
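
Standard Markdown handles this by letting a code span be delimited by a run of several backticks, so the content can contain shorter runs. Whether the comment subset described in the post supports that is not stated. A minimal Python sketch of the rule, with a hypothetical `code_spans` helper:

```python
import re

# A code span opens with a run of one or more backticks and closes with a
# run of exactly the same length, so the content may hold shorter runs.
CODE_SPAN = re.compile(r"(`+)(.+?)(?<!`)\1(?!`)", re.S)

def code_spans(text):
    """Return the contents of each backtick-delimited code span."""
    # Strip the padding spaces that separate content from the delimiters.
    return [m.group(2).strip() for m in CODE_SPAN.finditer(text)]

print(code_spans("use `ls` here"))
print(code_spans("run ``echo `date` `` now"))
```

With double backticks as the delimiter, the shell snippet keeps its own single backticks intact.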