Regular Expressions: Now You Have Two Problems

Thanks Jeff… As a novice PHP coder, I’ve found myself in need of, and intimidated by, regexes time and time again. After reading this blog, I think I now have the courage to wade in at full steam and make use of this useful and misunderstood tool…

I agree they are quite handy. I would have written a state-machine-driven parser for HTML sanitizing, though. Good topic all the same.

There’s also a multi-lingual regex builder here: http://regex.larsolavtorvik.com/

This is probably a very limited portion of your actual validation methods, but I hope you’re also planning on killing JavaScript and the like.

Actually, going back to your previous posts about the horrors of BBCode or whatever that was, there are probably two good reasons for BBCode as opposed to HTML/other standard:

  • the brackets don’t require a shift modifier on the standard keyboard layout. Don’t think it makes a difference? Hey, you’re a programmer. [It’s a heck of a lot easier to use a bracket in the middle of typing than a less-than/greater-than thing.]
  • it’s also a whole lot easier to take something which may or may not be safe and transform it into something you know is safe than it is to try to clean the original so that it’s safe. [lit: transformer vs. converter]

And so, it’s more accessible to the average user, and it’s more reasonable for the average developer.

taste batter? ;)
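
To make the transform-instead-of-clean point concrete, here is a minimal Python sketch. The tag whitelist and the function name are my own assumptions for illustration, not anything from the original post: escape everything the user typed, then translate only balanced pairs of known BBCode tags into HTML.

```python
import html
import re

# Hypothetical whitelist: only these BBCode tags are ever translated.
ALLOWED_TAGS = ("b", "i", "u")

# Matches a balanced pair such as [b]text[/b]; \1 forces the closing tag to match.
PAIR = re.compile(r"\[(%s)\](.*?)\[/\1\]" % "|".join(ALLOWED_TAGS), re.DOTALL)


def bbcode_to_html(text: str) -> str:
    """Escape all HTML, then turn whitelisted BBCode pairs into tags."""
    safe = html.escape(text)      # nothing the user typed survives as markup
    previous = None
    while previous != safe:       # repeat so nested pairs are converted too
        previous = safe
        safe = PAIR.sub(r"<\1>\2</\1>", safe)
    return safe
```

The only markup in the output is markup this function generated itself, which is the whole argument for the transformer approach. Unbalanced tags such as a lone [b] are simply left as literal text.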

wow this is sad

maybe in a decade they’ll be saying regex vs BNF is like goto vs functions

“not that regular expressions are evil, per se, but that overuse of regular expressions is evil”

Is it odd that I’ve always interpreted the expression this way?

The reason “now you’ve got two problems” comes up so often is that it so easily comes to mind and forces you to consider whether regular expressions are really appropriate for the task at hand.

Why use regular expressions to extract an extension or file name from a path, when System.IO.Path does the same thing in a more readable manner?
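
A rough illustration of the same point in Python, where pathlib plays roughly the role System.IO.Path plays in .NET. The example path is made up, and the two regexes are just one way someone might write them:

```python
import re
from pathlib import PurePosixPath

path = "/var/www/uploads/report.final.pdf"  # hypothetical example path

# Regex version: works, but the reader has to decode each pattern.
regex_ext = re.search(r"\.[^./]+$", path).group(0)
regex_name = re.search(r"[^/]+$", path).group(0)

# Library version: the attribute names say what you get.
p = PurePosixPath(path)
lib_ext = p.suffix    # ".pdf"
lib_name = p.name     # "report.final.pdf"
lib_stem = p.stem     # "report.final"
```

Both versions give the same answers here. The difference is that nobody has to read a character class to find out what lib_ext means.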

Actually, you might as well replace regular expression with XML, or databases, or any number of other solutions people generally rush toward without thinking.

Great post - lots of value both here and from pointers to other links. Thank you!

@Randy Magruder

“No trial version. No purchase. Period.”

While there may not be a trial version per se, there is a three month unconditional money back guarantee (http://www.regexbuddy.com/guarantee.html). So you can in effect try it for three months.

I’ve been using RegexBuddy since version 1.0. It’s worth every penny.

“Why sanitize the HTML? I just convert all the left angle brackets into their HTML entities to ‘reveal’ what the naughty person was trying to do.”

Er, because sometimes you want to allow some HTML? You might, even, be anticipating it? Like from a richtext editor?
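
Both positions fit in a few lines of Python. This is only a sketch: the whitelist and the function names are assumptions of mine, and it handles attribute-free tags only. The first function escapes everything; the second escapes everything and then switches a handful of harmless tags back on.

```python
import html
import re

# Hypothetical whitelist of attribute-free tags a rich-text editor might emit.
WHITELIST = ("b", "i", "em", "strong", "p")

# After escaping, an allowed tag looks like &lt;b&gt; or &lt;/b&gt;.
ESCAPED_TAG = re.compile(r"&lt;(/?)(%s)&gt;" % "|".join(WHITELIST))


def escape_all(text: str) -> str:
    """The 'just reveal it' approach: no markup survives."""
    return html.escape(text)


def escape_but_allow(text: str) -> str:
    """Escape everything, then restore only the whitelisted bare tags."""
    return ESCAPED_TAG.sub(r"<\1\2>", html.escape(text))
```

A tag carrying attributes, such as a b tag with an onclick handler, does not match the pattern and stays escaped, so the regex only ever has to recognise a very small, fixed vocabulary.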

@Jeff Atwood:

I think you posted this rant before.

RegexBuddy 1.21 Demo Download: http://www.brothersoft.com/regexbuddy-29621.html

For kicks, try this:

s/regular expressions/macros/

How is this different from your writings on XML?

How many posts can you stretch out of the “X is good for some things, just don’t use it for too many things” tip?

Not that it isn’t a good tip.

funny that you bring up regular expressions today because i just saw an insane one that nearly made me fall out of my chair. i’m in c# most of the time, and i don’t think i’m alone in saying that c# developers don’t throw around regexes too often. i was messing around with a javascript calendar picker and found this gem (and yes, it was all on one line):

System.Text.RegularExpressions.Regex DateRegEx = new System.Text.RegularExpressions.Regex(@"^((0?[13578]|10|12)(-|/)(([1-9])|(0[1-9])|([12])([0-9]?)|(3[01]?))(-|/)((19)([2-9])(\d{1})|(20)([01])(\d{1})|([8901])(\d{1}))|(0?[2469]|11)(-|/)(([1-9])|(0[1-9])|([12])([0-9]?)|(3[0]?))(-|/)((19)([2-9])(\d{1})|(20)([01])(\d{1})|([8901])(\d{1})))$");

my basic reaction was to close that file and never look at it again. sure i could have deciphered it, added nice comments, etc. but i have other bugs to fix and the allocated hours for this project are dwindling…
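
For comparison, here is a sketch of the non-regex route in Python. As far as I can read the pattern above, it wants month, day and year separated by - or /, with the day capped by month. The list of formats below is my guess at that intent, not an exact equivalent of the regex (for one thing, the regex limits the year range and lets February 30 through). In C# the analogous tool would be DateTime.TryParseExact.

```python
from datetime import date, datetime
from typing import Optional

# Assumed input formats: month, day, year separated by "/" or "-".
FORMATS = ("%m/%d/%Y", "%m-%d-%Y", "%m/%d/%y", "%m-%d-%y")


def parse_us_date(text: str) -> Optional[date]:
    """Return a date if text matches one of FORMATS, else None."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # wrong format or impossible date; try the next one
    return None
```

The calendar rules (days per month, leap years) come from the date library instead of from hand-written alternations, so there is nothing to decipher six months later.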

The author of RegexBuddy has also written a great book about regular expressions, available from Lulu. Good price, nice product, great read, indispensable reference.

http://www.lulu.com/content/229786

If you want to use Regular expressions or do already, you really need this book.

Hi, smart enough comment!
The app isn’t free. Is there any open source or free app for regexps?
The question is for readers too!

Xepol, Jan is also working on a Regex book with another very talented regex pro, Steven Levithan.

http://www.regex-guru.info/2008/05/writing-offline/

I bet it’s gonna be REALLY good. Consider this preordered.

why

do

programmers

think

adding

whitespace

makes

things

easier

to

read?

It

doesn’t.

Stop

doing

it.

I always wondered how people got by without regexs. Then I started asking that Steve Yegge question in interviews (the one about replacing all phone numbers in a huge site with one email address). Now I know. And I’m sadder for the knowledge.
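
For anyone who hasn’t met that interview question, a minimal Python sketch of the regex answer follows. The phone formats and the email address are assumptions made up for the example; a real site would need the pattern tuned to whatever formats actually appear in its pages.

```python
import re

# Assumed US-style formats: (425) 555-0100, 425-555-0100, 425.555.0100
PHONE = re.compile(
    r"(?<!\d)"                       # not in the middle of a longer number
    r"(?:\(\d{3}\)\s*|\d{3}[-.\s])"  # area code, with or without parentheses
    r"\d{3}[-.\s]\d{4}"              # exchange and line number
    r"(?!\d)"
)

CONTACT = "support@example.com"  # hypothetical replacement address


def replace_phones(text: str, replacement: str = CONTACT) -> str:
    """Swap every phone number matching PHONE for the contact address."""
    return PHONE.sub(replacement, text)
```

Run over each file’s contents, that is more or less the whole job. Doing the same by hand, or with substring searches, is where the sadness comes from.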