Regular Expressions: Now You Have Two Problems

Sean,howareyousosurewhitespacedoesn’tmakethingseasiertoread?
IknowIgetsomebenefitoutoftheoccasionalspace.
Obviouslycopiousamountsofcarriagereturnswon’tdoanybodyanygood,
butyoudon’talwayshavetotakethingstoexcess.

Sean:

Becau seBadWhite Spaci
Can Tota
l y Messyou up

I strongly disagree with Sean. White space and comments are huge aids in reading code.

Regarding Regex Buddy.

No trial version. No purchase. Period.

Nice post, Jeff. I’d have to say the biggest problem I see is with programmers who don’t think it’s worthwhile or fun to learn regex. Almost all the ones I’ve worked with avoid it like the plague. But then once they start using the 15th language in a row that supports regex they start realizing that it’s probably not such a bad idea.

I can’t tell you how many stop-gap translation applications I haven’t had to build just because of regex support in Java, XML, PHP…

I never knew about the IgnorePatternWhitespace option. That makes life much better. Thanks!
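
For anyone outside .NET: the option is spelled RegexOptions.IgnorePatternWhitespace there, and most regex flavors have an equivalent. Below is a minimal sketch in Python, where the flag is re.VERBOSE. The date pattern and the name DATE are made up for illustration and are not from the post.

```python
import re

# Hypothetical example pattern: an ISO-style date such as 2008-06-27.
# With re.VERBOSE, whitespace in the pattern is ignored and everything
# after an unescaped '#' is a comment, so each piece sits on its own line.
DATE = re.compile(
    r"""
    (?P<year>\d{4})    # four-digit year
    -
    (?P<month>\d{2})   # two-digit month
    -
    (?P<day>\d{2})     # two-digit day
    """,
    re.VERBOSE,
)

m = DATE.fullmatch("2008-06-27")
if m:
    print(m.group("year"), m.group("month"), m.group("day"))
```

One caveat: in this mode a literal space or '#' in the pattern has to be escaped or put in a character class, otherwise it is silently dropped.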

ORANGE!

I’m working on a regex tool built in WPF. It’s primarily a learning thing for me, but I’m looking to make it a slimmed-down, bloat-free regex go-to tool for those I-just-need-to-test-this-for-a-sec regex moments.

http://statestreetgang.net/post/2008/05/Regex-and-WPF.aspx

I’ve already made several improvements to that code, and I think I’ll be entering it in this upcoming community coding contest.

http://pietschsoft.com/post/2008/06/Community-Coding-Contest-to-start-July-1st-Chance-to-Win-MSDN-Premium-Subscription-with-VS08-Team-Suite-plus-more.aspx

That’ll give me some motivation to finish it.

(not ^that^ Sean)

Other Sean: Adding whitespace makes things easier to read. If it doesn’t work for you, then you must be some mental-parsing genius. Great. You’re better than the rest of us. Go on with your life now.

Jeff: I agree with your RegEx assessment.

My favourite I’ve been using for years is Regex Coach - http://www.weitz.de/regex-coach/
It’s free, with donations encouraged.

Works on both Linux and Windows.

Damn you Jeff! Since I read your post I’ve been thinking a lot about eating a hamburger with lots of Tabasco. I was subliminally attacked by your free publicity… Great blog, read it every day.

Awesome, Regular expression for testing prime numbers :
http://mail.pm.org/pipermail/athens-pm/2003-January/000033.html

print "Prime" if (1 x shift) !~ /^1?$|^(11+?)\1+$/

This is by Abigail, who is something of a legend in the Perl community.
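
In case the trick isn't obvious: the number n is written in unary as a string of n ones, and the pattern matches exactly when n is not prime. The first alternative catches 0 and 1; the second tries to split the string into two or more copies of a group of at least two ones, which only works if n has a divisor. Here is a rough Python port of the same idea; the names NOT_PRIME and is_prime are mine, not Abigail's.

```python
import re

# Same pattern as the Perl one-liner. It matches a string of ones whose
# length is 0, 1, or composite:
#   ^1?$         -> empty string or a single "1"
#   ^(11+?)\1+$  -> a group of two or more ones, repeated at least once more
NOT_PRIME = re.compile(r"^1?$|^(11+?)\1+$")


def is_prime(n: int) -> bool:
    """Return True if n is prime, tested on a unary string of n ones."""
    return NOT_PRIME.match("1" * n) is None


print([n for n in range(20) if is_prime(n)])
# prints [2, 3, 5, 7, 11, 13, 17, 19]
```

It relies on backtracking over every candidate divisor, so it gets slow quickly as n grows. Fun, not practical.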

I’ll admit I have made a regex or two with poor whitespace; however, it is nice once you get to the point where you can read regex like your native language. That said, recursive regular expressions can still be kind of confusing.

Don’t get me wrong. I understood what you meant.

But I find it kinda fun that you love regexps and don’t like XML. :)

Ditto on strongly disagreeing with the first comment. White space turns ordinary obfuscated perl into something a bit more, uh, pythonic.

Why are you writing your own html sanitizer? It has already been written enough times. Are you also writing your own webserver and C library? And why are you using regular expressions to do it? Do you want your service to be vulnerable to html/js injection?

The abuse of regexes as parsers isn’t unknown to me. Actually I’ve created a function that parses a ?:-like language:

public static string ParseTemplateString(string str, Func<string, object> getVars)
{
    // Regex
    System.Text.RegularExpressions.Regex rx = new System.Text.RegularExpressions.Regex(
        string.Format(@"
            (?<mod>\?!?)?                   # Match the type of the expression
            (?<v1>\$[A-Za-z_0-9]+)          # Match the variable or the complex condition
            (?(mod)
                (
                    {0}                     # Match first opening delimiter
                    (?<inner>
                        (?>
                            {0} (?<LEVEL>)      # On opening delimiter push level
                          |
                            {1} (?<-LEVEL>)     # On closing delimiter pop level
                          |
                            (?! {0} | {1} ) .   # Match any char unless the opening
                        )+                      # or closing delimiters are in the lookahead string
                        (?(LEVEL)(?!))          # If level exists then fail
                    )
                    {1}                     # Match last closing delimiter
                ){{1,2}}                    # Match one or two subexpressions
              |
                :(?<v2>\$[A-Za-z_0-9]+)     # Match the simple condition
            )?
            ", "\\{", "\\}"),
        System.Text.RegularExpressions.RegexOptions.Compiled
        | System.Text.RegularExpressions.RegexOptions.IgnorePatternWhitespace);
    // $Var, $Var:$Condition, ?(!)$Condition{…}{…}
    System.Text.RegularExpressions.MatchCollection mc = rx.Matches(str);
    foreach (System.Text.RegularExpressions.Match m in mc)
    {
        if (m.Groups["mod"].Length > 0)
        {
            // Conditional block: ?$Cond{then}{else} or ?!$Cond{then}{else}
            bool cond = Convert.ToBoolean(getVars(m.Groups["v1"].Value.Substring(1)));
            if (m.Groups["mod"].Value == "?!")
                cond = !cond;
            if (!cond)
            {
                // Condition is false: use the second block if there is one
                if (m.Groups["inner"].Captures.Count == 2)
                    str = str.Replace(m.Value, ParseTemplateString(
                        m.Groups["inner"].Captures[1].Value, getVars));
                else
                    str = str.Replace(m.Value, "");
            }
            else
                str = str.Replace(m.Value, ParseTemplateString(
                    m.Groups["inner"].Captures[0].Value, getVars));
        }
        else if (m.Groups["v2"].Length > 0)
        {
            // Simple condition: $Var:$Condition
            bool cond = Convert.ToBoolean(getVars(m.Groups["v2"].Value.Substring(1)));
            if (!cond)
                str = str.Replace(m.Value, "");
            else
            {
                object val = getVars(m.Groups["v1"].Value.Substring(1));
                str = str.Replace(m.Value, val.ToString());
            }
        }
        else
        {
            // Plain variable: $Var
            str = str.Replace(m.Value, getVars(m.Groups["v1"].Value.Substring(1)).ToString());
        }
    }
    return str;
}

For some reason I believe there must be a sound correlation between liking regular expressions and disliking XML. I suspect people either do both or neither :)

Hey Jeff, here’s a regular expression you might enjoy:

s/who I admire/whom I admire/

:)

Dean said: “I strongly disagree with Sean. White space and comments are huge aids in reading code.”

I strongly disagree with that. At least half of it.
Comments are evil. A necessary evil, sometimes, but nonetheless they are evil. We should aim for ‘self-documenting’ code. 99% of the time when the code is not self-documenting, it’s because the developer didn’t do as good a job as (s)he should have (maybe because they were not given the opportunity, but we’re not debating causes).

That being said, I’m not an expert at regexes, and that is why the comments in Jeff’s original post would help me understand his regex. But that’s because of my shortcoming. If we take that approach, then we should have a comment on each line of code explaining what it does, just in case someone who doesn’t know the programming language we picked happens to read the source… impractical.

F.O.R.

I agree with the whitespaces ;)

http://www.duivesteyn.com.au