Excluding Matches With Regular Expressions

codinghorror · October 23, 2005, 12:00am

This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2005/10/excluding-matches-with-regular-expressions.html

matt117 · October 25, 2005, 12:00am

I faced a problem similar to this a few months ago: I had a template and needed to find anything in the string matching the template (e.g. {something}: look for whatever is between the two braces). I searched high and low for a regex solution, to no avail. It seems a solution similar to the one above could be used? Maybe this can be done with a regex, I dunno (it goes way past my knowledge of regex); in the end I had to write a string parsing algorithm.

codinghorror · October 25, 2005, 12:00am

I needed to look for anything in the string matching the template (e.g. {something}: look for something between the two braces).

Hmm, something between the two braces:

"{[^}]+}"

Just watch out for a) non-escaped {} characters and b) line breaks inside the braces.
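
A minimal Python sketch of that pattern, in case it helps. The sample template string and the names `template` and `placeholders` are made up for illustration; the braces are escaped here for clarity, and a capture group is added to pull out just the inner text.

```python
import re

# Hypothetical template text, not from the original post.
template = "Dear {name}, your order {order_id} ships on {date}."

# \{ and \} match literal braces; [^}]+ matches one or more characters
# that are not a closing brace (this includes line breaks).
placeholders = re.findall(r"\{([^}]+)\}", template)

print(placeholders)
```

Because [^}] also matches newlines, an unclosed { can swallow text up to the next }, which is one reason to watch for line breaks inside the braces.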

Erik_Lane · October 26, 2005, 12:00am

I took advantage of this just last month. http://blog.eriklane.com/archive/2005/09/28/2105.aspx and I was pretty excited when I finally got it to work. Woohoo!

WillS · February 19, 2007, 12:00am

Old comments, I know. But if you just split on “fox” you’ll get an array of strings split on the occurrences of “fox”. For example, “foo fox bar” would return an array of two strings, “foo ” and “ bar”.
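
A quick Python sketch of the split idea, using the same "foo fox bar" input. The variable names are just for illustration, and `re.split` is shown alongside plain `str.split` on the assumption that the separator might need to be a pattern rather than a fixed word.

```python
import re

text = "foo fox bar"

# Plain string split on the literal word.
parts = text.split("fox")

# Same result with a regex separator, useful if the separator is a pattern.
parts_re = re.split(r"fox", text)

print(parts)
print(parts_re)
```

Note that the surrounding spaces stay attached to the pieces, so a strip() may be needed afterwards.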

ORANGE!

James · November 7, 2007, 12:00am

Your regexes appear to use lookaheads/lookbehinds in a different order from those on http://www.regular-expressions.info.

I thought lookaheads were positioned after the text, whilst lookbehinds preceded the text.
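
For what it's worth, "ahead" and "behind" describe the direction the assertion looks from the current position, not where it has to sit in the pattern, so either kind can appear before or after the text it guards. A small Python sketch (the sample words and variable names are made up for illustration):

```python
import re

text = "foxtrot firefox fox box"

# Negative lookahead placed BEFORE the word: at the start of each word it
# peeks forward and rejects words that begin with "fox".
not_starting_with_fox = re.findall(r"\b(?!fox)\w+", text)

# Negative lookbehind placed AFTER the word: once the word is consumed it
# looks back and rejects words that end with "fox".
not_ending_with_fox = re.findall(r"\b\w+(?<!fox)\b", text)

print(not_starting_with_fox)
print(not_ending_with_fox)
```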

Lorre · August 2, 2008, 12:00am

This is great. I needed something to exclude a tag. I will test it to see if it works for my application.

David · September 4, 2008, 12:00am

Very helpful when I needed to match anything but [Please Select]:
^(?!\[Please Select\]).*$
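
One thing to be careful about with this kind of pattern: square brackets need escaping, otherwise they form a character class and the lookahead rejects any string that starts with one of those letters. A hedged Python sketch of the difference (the sample inputs "Paris" and "Boston" are invented for the demo):

```python
import re

# Brackets escaped: rejects strings that start with the literal text.
escaped = re.compile(r"^(?!\[Please Select\]).*$")

# Brackets unescaped: a character class, so this rejects any string whose
# first character is one of P, l, e, a, s, space, S, c, t.
unescaped = re.compile(r"^(?![Please Select]).*$")

for value in ["[Please Select]", "Paris", "Boston"]:
    print(value, bool(escaped.match(value)), bool(unescaped.match(value)))
```

With the unescaped version, "[Please Select]" itself still gets through, because "[" is not one of the characters in the class.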

Doc_Soliday · March 31, 2009, 12:00am

echo $STRING | sed 's/\(...\)..\(..\)/\1\2/g'

The \( and \) are marks. By marking characters 1-3 and 6-7, you output using \1 and \2 for mark #1 and mark #2 respectively.
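
The same idea as a rough Python sketch, where plain parentheses play the role of sed's marks (capture groups). The input string "abcdefgh" and the variable names are made up for illustration.

```python
import re

s = "abcdefgh"

# Group 1 captures characters 1-3, the next two characters are matched but
# not captured, and group 2 captures characters 6-7.
result = re.sub(r"(...)..(..)", r"\1\2", s)

print(result)
```

Characters 4-5 ("de") are dropped, and anything after the seventh character that does not form another full seven-character match is left as-is.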

MartyT · February 6, 2010, 12:00am

What about a RegEx replace? You could replace all of your matches with an empty string (or a “filtered” notification if you prefer) and achieve the same functionality.
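
A minimal Python sketch of the replace approach. The sample sentence, the word "fox" as the thing to filter, and the "[filtered]" marker are all assumptions for the sake of the example.

```python
import re

text = "The quick brown fox jumps over the lazy fox."

# Replace every match with an empty string...
removed = re.sub(r"fox", "", text)

# ...or with a visible marker instead.
marked = re.sub(r"fox", "[filtered]", text)

print(removed)
print(marked)
```

Removing matches outright leaves the surrounding whitespace behind, so the marker version is often easier to read.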

radityoardi · January 27, 2015, 9:10am

Good!
This was a HORROR for me, but you nailed it!

Airun_12 · April 19, 2024, 6:40am

Hi, what if we want to discard some of the matches?
In my case, I need to match the company identification number inside a supplier invoice, but sometimes the invoice also contains our own company identification number. Depending on the supplier, the numbers can appear at different positions, or even multiple times each. I’m able to extract the list of company identification numbers from the invoices, but I want to discard our own number from it.
I can do it later by using an “If” in the code, but I want to know if there is a way to integrate it in the RegEx expression.

For example, my expression finds a list of identification numbers in the invoice text:

A123456, A111222, A111222, A123456, A123456, A111222, A111222
A333222, A123456, A333222, A333222
A333444, A123456, A33344, A123456, A123456, A123456
A123456, A555444

In all of them, I want to discard my company number that is A123456 and get only the other numbers each time.

Is there any way to integrate it in the RegEx expression?

Airun_12 · April 19, 2024, 9:29am

I’ve found a way:

\b(?!A123456\b)(\d{8}[a-z]|[a-z]\d{8}|[a-z]\d{7}[a-z]|[a-z]{3}\d{6}|[a-z]-\d{2}[.]\d{3}[.]\d{3})\b
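
A small Python sketch of the same negative-lookahead trick, run against two of the example lines above. To keep it short it assumes a simplified ID format (one uppercase letter followed by six digits, as in the sample list) rather than the full set of alternatives in the expression; `OWN_ID`, `pattern`, and `invoices` are illustrative names.

```python
import re

OWN_ID = "A123456"  # our own company number, to be skipped

# \b(?!A123456\b) refuses to start a match at our own number; the rest is a
# simplified stand-in for the real ID formats (assumption: letter + 6 digits).
pattern = re.compile(rf"\b(?!{re.escape(OWN_ID)}\b)[A-Z]\d{{6}}\b")

invoices = [
    "A123456, A111222, A111222, A123456, A123456, A111222, A111222",
    "A123456, A555444",
]

others = [pattern.findall(line) for line in invoices]
print(others)
```

The trailing \b inside the lookahead matters: without it, a longer number that merely starts with A123456 would be excluded as well.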