Regex use vs. Regex abuse

ChristianM · February 16, 2005, 12:00am

Reg-exps aren’t hard. They’re covered in undergrad courses on state machines. Not understanding the tool you are using can be dangerous. Letting a wizard handle it for you means you no longer understand what the code is doing. You end up doing cargo-cult programming.

What’s hard is using reg-exps to deal with fuzzy real-world data.

Your phone number example:

"^$*\d{3}$*( |-)*\d{3}( |-)*\d{4}$"

fails when confronted with a number with an international dialing prefix, or any non-US number.
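To make that concrete, here is a quick Python sketch. It assumes the pattern is meant to allow optional parentheses around the area code (written as `\(*` and `\)*`); the sample numbers and the `phone_ok` helper are made up for illustration.

```python
import re

# Assumed reading of the phone pattern: optional parentheses around
# the area code, then 3 + 4 digits separated by spaces or hyphens.
PHONE = re.compile(r"^\(*\d{3}\)*( |-)*\d{3}( |-)*\d{4}$")

def phone_ok(number):
    """Return True if the whole string matches the pattern."""
    return PHONE.match(number) is not None

# Hypothetical sample inputs.
samples = [
    "(555) 123-4567",    # US style with parentheses
    "555-123-4567",      # US style with hyphens
    "+1 555 123 4567",   # same number with an international prefix
    "+44 20 7946 0958",  # non-US grouping
]

for s in samples:
    print(f"{s!r}: {phone_ok(s)}")
```

The two US-style strings match. The two with a leading `+` do not, because the pattern expects a digit (or an opening parenthesis) at the very start.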

Your domain example:

"[^\\]+$"

fails when confronted with the e-mail address form (user@domain) or a forward-slash form (user/domain).
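A small Python sketch of what goes wrong, assuming the pattern is used to pull the user name out of a DOMAIN\user string. The `user_part` helper and the sample logins are hypothetical.

```python
import re

# Everything after the last backslash, anchored at the end of the string.
TAIL = re.compile(r"[^\\]+$")

def user_part(login):
    """Return the text after the last backslash, or None if nothing matches."""
    m = TAIL.search(login)
    return m.group(0) if m else None

# Hypothetical sample logins in the three forms discussed above.
for login in [r"CORP\alice", "alice@corp.example", "alice/corp"]:
    print(f"{login!r} -> {user_part(login)!r}")
```

Note that the pattern doesn't refuse the other two forms. With no backslash in the input it returns the whole string, so the caller gets "alice@corp.example" or "alice/corp" back as if it were a user name, and nothing signals that anything went wrong.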

It’s dealing with all this that makes reg-exps hard. It’s why the RFC 822 parser is so huge. E-mail addresses aren’t always foo@bar.com.