Testing With "The Force"

But those comments have been entered by users who don’t have any markup available. By introducing ‘code’ as possible markup don’t most of those things cease to be a problem? Surely they’ll just write ‘7*x’?

Angry? Not really, Jeff.

And I would argue that they aren’t edge conditions, and that really is the crux of the point. Such scenarios are blatantly obvious, I think, if you just stop and contemplate for a few moments.

As an aside, make your implementation in F# and you’ll have an interesting (and likely very fast) implementation to talk about. Regular expressions are one of the greatest tools in software development, but you know that trite old saying: now you have two problems. They aren’t ideal for parsing text in this manner.

What’s with all this testing crap? Remember: We Make Shitty Software… With Bugs!

http://www.codinghorror.com/blog/archives/000099.html

Unfortunately, one completely ambiguous case is things like identifiers surrounded by double underscores, such as __FILE__ and __LINE__. If it were up to me, I would only use asterisks for bold/italics, and not underscores.

Of course, the easy workaround is to do __FILE__, but to do so, you need to be aware that double underscores mean italics.

Great post! Brute-force testing makes a lot of sense, especially for small software components. For larger components brute force might be a problem, either because it takes a lot of time or because there is no way to cover much ground in a reasonable amount of time. Testing stuff at a lower level is a must.

Is it just me, or is this particular blog not generating a lot of comments?

Don’t get me wrong Jeff, if I have to search and replace more than one thing in any of my files, I do it with Codewright’s version of regex. Like most things in coding, no one cares but me.

While I see the point Dennis is making, I often go with Jeff’s approach for testing.

It is the same thing as using the compiler as a test. Officially frowned upon, but if you’re in an interpreted language or working on something that takes seconds to compile, why not use the compiler as a sanity check?

Heck, my IDE of choice, NetBeans, does that automatically now.

Ran into this yesterday unexpectedly on Server Fault. I was trying to chortle but I wound up doing it in italics which was very deceptive. I thought *'s around a word meant some sort of noise your body might make. At least that’s how it was in mIRC in the mid-90’s.

Such scenarios are blatantly obvious, I think, if you just stop and contemplate for a few moments.

So what? If you’ve got a lot of test data available, why not run your code against it to see which, if any, blatantly obvious things you missed?

Dumb approaches, smart approaches, they can all help, and they can all miss things.
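A minimal sketch of that kind of brute-force check, assuming the existing comments are available as a list of strings. The pattern, the helper name `affected_comments`, and the sample comments are all hypothetical; the point is only that every comment the pattern would change gets listed for a human to eyeball.

```python
import re

# Hypothetical pattern under test: asterisks that hug non-whitespace text.
ITALIC = re.compile(r"\*(\S(?:.*?\S)?)\*")

def affected_comments(pattern, comments):
    """Return (original, converted) pairs for every comment the pattern changes."""
    changed = []
    for text in comments:
        converted = pattern.sub(r"<em>\1</em>", text)
        if converted != text:
            changed.append((text, converted))
    return changed

# Made-up stand-ins for a corpus of real comments.
sample = [
    "Surely they'll just write 7 * x * y?",
    "That is *not* what I meant.",
    "No markup here.",
]

for original, converted in affected_comments(ITALIC, sample):
    print(original, "->", converted)
```

Reading the list of changed comments is where the blatantly obvious misses tend to show up, whichever side of the argument you are on.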

Erm, would just using a Markdown library not have made more sense?

“First, let’s make sure we have at least one non-whitespace character before and after each asterisk.”

I don’t think that’s what you mean… what you wrote would highlight “mystrings”, but not “my string is”
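The two readings of "non-whitespace before and after each asterisk" can be compared directly. This is a sketch, not the expression from the post: both patterns are hypothetical, and the sample strings assume the two examples above were originally written with asterisks around "string".

```python
import re

# Reading 1: non-whitespace on the OUTSIDE of each asterisk.
outside = re.compile(r"\S\*(.+?)\*\S")

# Reading 2: non-whitespace on the INSIDE of each asterisk.
inside = re.compile(r"\*(\S(?:.*?\S)?)\*")

def emphasized(pattern, text):
    """Return the text the pattern would italicize, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None

for text in ["my*string*s", "my *string* is", "7 * x * y"]:
    print(repr(text), emphasized(outside, text), emphasized(inside, text))
```

Under the first reading only the run-together word is italicized; under the second the spaced-out sentence works and the arithmetic is left alone.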

sigh

http://www.googlefight.com/index.php?lang=en_GB&word1=%22Jeff+Atwood%22&word2=%22Dennis+Forbes%22

@Paul-

So what? If you’ve got a lot of test data available, why not run your code against it to see which, if any, blatantly obvious things you missed?

Tests are grrrrr-eat.

Tests are not a substitute for actually thinking about the problem space, however, and this post supports a theory I’ve long had that tests sometimes make people produce worse code, because they get used as a substitute for purposeful coding.

I strongly suspect that Jeff took dramatic license in his post, and he didn’t really go into it so blindly, but it served to demonstrate the use of test data.

@Juan-

Google fight? Dude, I’m an unknown nobody. Jeff smacks me down and calls me Sally when it comes to internet popularity.

I wonder how many real italics attempts will now be lost by the updated expression?

Why is it easier and less confusing for the commenter to write *italics* rather than <i>italics</i>?
Especially when we eventually have *italics* #bold# @underline@ %hidden% etc.

Being able to “dream up” edge cases demonstrates that you understand both the problem and the algorithm, along with limitations.

My first thought on seeing your initial regex, was the fact that a comment wouldn’t be able to contain multiple italic blocks:
test this in italic
test this other thing in italic

With a greedy quantifier (the default in most regex engines), the block “other* thing in *italic” would be italic.
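The greedy behaviour is easy to reproduce. The sketch below uses Python's `re` module and a made-up sample comment; neither pattern is the one from the post. The only difference between them is `+` versus `+?`.

```python
import re

# Made-up comment containing several italic blocks.
comment = "test *this* in italic, test this *other* thing in *italic*"

greedy = re.compile(r"\*(.+)\*")   # grabs as much as it can
lazy = re.compile(r"\*(.+?)\*")    # stops at the first closing asterisk

def italic_spans(pattern, text):
    """Return the text captured between each pair of matched asterisks."""
    return [m.group(1) for m in pattern.finditer(text)]

print(italic_spans(greedy, comment))
print(italic_spans(lazy, comment))
```

The greedy pattern produces a single span running from the first asterisk to the last, while the lazy one yields each block separately. Lazy matching fixes this particular case but does nothing for nesting.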

Essentially what you’re doing is trying to use a regular expression to match a context-free grammar, which is not possible. You can fake it to a limited extent using an “enhanced” regex engine (with back-references, etc) and some creative expressions, but the level of nesting that you can support will be finite, and the deeper you go, the more complex the expression becomes.

Either put a rich text field on your form (so it acts like a mini word processor and hides the markup in the background) or learn to live with the fact that users will have to enter markup into a plaintext field in order to have their plaintext appear to be rich text on the web page.

Oh, wait, text and text is SOOO much less ambiguous and easier on the eyes of a layperson.

Could you just have commenters type stuff I want italicized?

You should have looked at the WMD code! I could have sent you some snippets where this sort of thing is handled.

If you’re looking for a regex to do X, you should start by looking at http://regexlib.com/ to see if someone has already done X. Most of these situations are not unique, so reinventing the wheel can often be avoided. And of course, if you don’t find something that suits your needs, or what’s there is full of bugs and you know how to fix them, by all means submit the new-and-improved regex.

I know you love RegEx, but isn’t this the wrong tool for the job? Shouldn’t you be writing a parser for your reduced version of Markdown?

This shouldn’t be too time consuming as:

You only want three elements of the complete language.
There are probably open source parsers for Markdown you can rip off/learn from.

And you still get to test it against heaps-o-data.

Win Win?
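For a sense of scale, here is a rough sketch of a hand-written scanner for a reduced Markdown. Which three elements are wanted is not stated above, so italics, bold, and inline code are an assumption, as are the function names and the rule that a delimiter only counts when it hugs non-whitespace text. It is not how any real Markdown library works, and nesting is handled only in the simplest cases.

```python
import html

def find_closer(text, start, delim):
    """Index of the next delim after start that follows a non-space char, else -1."""
    pos = text.find(delim, start)
    while pos != -1:
        if pos > start and not text[pos - 1].isspace():
            return pos
        pos = text.find(delim, pos + 1)
    return -1

def render(text):
    """Convert *italic*, **bold** and `code` to HTML; escape everything else."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "`":
            end = text.find("`", i + 1)
            if end != -1:
                # Code spans are literal: no markup is interpreted inside them.
                out.append("<code>" + html.escape(text[i + 1:end]) + "</code>")
                i = end + 1
                continue
        elif text[i] == "*":
            delim, tag = ("**", "strong") if text.startswith("**", i) else ("*", "em")
            body_start = i + len(delim)
            # An opener must be followed directly by a non-space character.
            if body_start < len(text) and not text[body_start].isspace():
                end = find_closer(text, body_start, delim)
                if end != -1:
                    out.append(f"<{tag}>{render(text[body_start:end])}</{tag}>")
                    i = end + len(delim)
                    continue
        # Anything that is not valid markup is emitted as plain, escaped text.
        out.append(html.escape(text[i]))
        i += 1
    return "".join(out)

print(render("this is *important*"))
print(render("7 * x * y"))
```

Like a regex, this can be run against heaps of existing comments; unlike a regex, each rule sits on its own line where it can be changed without disturbing the others.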