Excluding Matches With Regular Expressions

Here's an interesting regex problem:


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2005/10/excluding-matches-with-regular-expressions.html

I faced a problem similar to this a few months ago, where I had a template and I needed to look for anything in the string matching the template (e.g. {something}: look for something between the two braces). I searched high and low for a regex solution, to no avail. It seems a solution similar to the one above could be used? Maybe this could be done with a regex, I dunno (it goes way past my knowledge of regex); in the end I had to write a string-parsing algorithm.

I needed to look for anything in the string matching the template (e.g. {something}: look for something between the two braces).

Hmm, something between the two braces:

"{[^}]+}"

Just watch out for a) non-escaped {} characters and b) line breaks inside the braces.
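A minimal sketch of that pattern in Python, to show what it does and does not catch. The template text and variable names are made up for illustration; the braces are escaped here only to make the intent explicit.

```python
import re

# "{", then one or more characters that are not "}", then "}".
PLACEHOLDER = re.compile(r"\{[^}]+\}")

# Hypothetical template text, not taken from the thread.
template = "Dear {name}, your order {order_id} ships on {date}."
found = PLACEHOLDER.findall(template)
print(found)

# Caveat b): [^}] also matches line breaks, so a match can span lines.
multiline = "Hello {first\nlast}!"
spanning = PLACEHOLDER.findall(multiline)
print(spanning)

# Excluding "\n" from the class keeps each match on a single line.
SINGLE_LINE = re.compile(r"\{[^}\n]+\}")
single = SINGLE_LINE.findall(multiline)
print(single)
```

Caveat a) still applies: this sketch has no notion of escaped braces, so a literal "{" in the text would start a match.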

I took advantage of this just last month (http://blog.eriklane.com/archive/2005/09/28/2105.aspx) and I was pretty excited when I finally got it to work. Woohoo!

Old comments, I know. But if you just split on “fox” you’ll get an array of strings split on the occurrences of “fox”. For example, “foo fox bar” would return an array of two strings, “foo ” and “ bar”.
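In Python, for instance, the split approach looks roughly like this (a sketch using the same example string):

```python
text = "foo fox bar"

# Splitting on the literal "fox" keeps the surrounding spaces in each piece.
parts = text.split("fox")
print(parts)  # ['foo ', ' bar']
```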

ORANGE!

Your regexes appear to use lookaheads/lookbehinds in a different order from those on http://www.regular-expressions.info.

I thought lookaheads were positioned after the text, whilst lookbehinds preceded the text.

This is great. I needed something to exclude a tag. I will test it to see if it works for my application.

Very helpful when I needed to match anything but [Please Select]:
^(?!\[Please Select\]).*$
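A sketch of this check in Python, assuming the square brackets are meant literally. In that case they need escaping, because an unescaped [Please Select] is a character class and the lookahead would reject any value that merely starts with one of those letters. The function name and sample values are hypothetical.

```python
import re

# Negative lookahead with the brackets escaped: reject values that start
# with the literal text "[Please Select]", accept everything else.
NOT_PLACEHOLDER = re.compile(r"^(?!\[Please Select\]).*$")

# Unescaped version for comparison: [Please Select] is a character class.
UNESCAPED = re.compile(r"^(?![Please Select]).*$")

def is_real_choice(value: str) -> bool:
    """Return True when the value is not the "[Please Select]" placeholder."""
    return NOT_PLACEHOLDER.match(value) is not None

print(is_real_choice("[Please Select]"))        # False
print(is_real_choice("Saturn"))                 # True
print(UNESCAPED.match("Saturn") is not None)    # False: "S" is in the class
```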

echo "$STRING" | sed 's/\(...\)..\(..\)/\1\2/g'

The \( and \) delimit marks (capture groups). With characters 1-3 and 6-7 marked, \1 and \2 in the replacement output mark #1 and mark #2 respectively, so characters 4-5 are dropped.
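For comparison, roughly the same substitution in Python, where plain parentheses mark the groups. The function name is made up for this sketch.

```python
import re

def drop_chars_4_and_5(text: str) -> str:
    """Keep characters 1-3 and 6-7 of each 7-character run, dropping 4-5."""
    # (...) is group 1, the bare ".." is skipped, (..) is group 2.
    return re.sub(r"(...)..(..)", r"\1\2", text)

print(drop_chars_4_and_5("ABCDEFG"))     # ABCFG
print(drop_chars_4_and_5("ABCDEFGHIJ"))  # ABCFGHIJ (the trailing "HIJ" is too short to match)
```

As with the g flag in sed, re.sub replaces every non-overlapping match, so longer input is processed seven characters at a time.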

What about a regex replace? You could replace all of your matches with an empty string (or a “filtered” notification if you prefer) and achieve the same functionality.
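A small sketch of the replace idea in Python. The word "fox" is only a stand-in for whatever should be excluded, and the sentence is invented for the example.

```python
import re

# Stand-in pattern: the whole word "fox".
FOX = re.compile(r"\bfox\b")

sentence = "The quick brown fox jumps"

# Replace each match with an empty string...
removed = FOX.sub("", sentence)
# ...or with a visible marker.
flagged = FOX.sub("[filtered]", sentence)

print(repr(removed))  # 'The quick brown  jumps' (note the doubled space)
print(flagged)        # The quick brown [filtered] jumps
```

Removing a match outright leaves the surrounding whitespace behind, which may or may not matter for your case.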

Good!
This was a HORROR for me, but you nailed it!

Hi, what if we want to discard some of the matches?
In my case, I need to match the company identification number inside a supplier invoice. Sometimes the invoice also contains our own company identification number and, depending on the supplier, either number can appear at different positions or even multiple times. I’m able to extract the list of company identification numbers from the invoices, but I want to discard our own number from it.
I can do it afterwards with an “if” in the code, but I want to know whether there is a way to build it into the regex.

For example, my expression finds these lists of identification numbers in the invoice text:

  • A123456, A111222, A111222, A123456, A123456, A111222, A111222
  • A333222, A123456, A333222, A333222
  • A333444, A123456, A33344, A123456, A123456, A123456
  • A123456, A555444

In all of them, I want to discard my company number, A123456, and get only the other numbers each time.

Is there any way to build this into the regex?

I’ve found a way:

\b(?!A123456\b)(\d{8}[a-z]|[a-z]\d{8}|[a-z]\d{7}[a-z]|[a-z]{3}\d{6}|[a-z]-\d{2}[.]\d{3}[.]\d{3})\b

:partying_face:
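Here is a sketch of the same negative-lookahead idea in Python, run against the sample lists above. It assumes a simplified number format (one uppercase letter followed by six digits) to match the sample data; the real expression has more alternatives.

```python
import re

# \b(?!A123456\b) refuses to start a match at our own number;
# [A-Z]\d{6} is a simplified, assumed format for the sample data.
OTHER_NUMBERS = re.compile(r"\b(?!A123456\b)[A-Z]\d{6}\b")

invoices = [
    "A123456, A111222, A111222, A123456, A123456, A111222, A111222",
    "A333222, A123456, A333222, A333222",
    "A123456, A555444",
]

results = [OTHER_NUMBERS.findall(text) for text in invoices]
for numbers in results:
    print(numbers)
```

The trailing \b inside the lookahead matters: without it, a longer number that merely begins with A123456 would be rejected too.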