A Visual Explanation of SQL Joins

If only someone had explained SQL to me like this when I started… great blog.

very cool visualisation, please do more of that

the problem with venn diagrams is that a rdbms table is not the same thing as a set.

Great visual explanation. I’ve added that to my list of reference bookmarks for when I (often) forget the nature of each SQL join type.

You’re right Jeff.

This is a useful explanation, as far as getting people to understand joins in terms of something they already understand (rudimentary set theory). However, it should be made clear that this is not what is meant by the statement that “the relational database model is based on set theory”. For a long time, I thought I understood that statement because I had this Venn-diagram understanding of joins.

Joins do not map directly to the basic additive operations of set theory (union, intersect, difference), which create new sets by way of simply including or excluding elements of other sets. There are ways to get those in SQL, but it involves chaining UNION and MINUS statements.

Joins, rather, are mapping operations, which create new sets by taking an element from each set, according to a set of rules, and lumping them together in a new element for the new set that is actually itself a small set containing the original two elements from the other sets.

If A is an additive set operation (representable by a Venn diagram), then the result of A on sets X and Y is a set of members of X and Y.

A(X, Y) = {x1, y1, x2, x3, y4, y5, …}

In database terms, each element of the resulting set is simply a row from either X or Y.

In real-world terms, it’s like sorting jelly beans. Throw them all into a bucket, through a sieve. The sieve eliminates all but the appropriate shaped ones, but what gets through is still what you started with: an assortment of jellybeans.

If B is a mapping set operation (representable by a line-drawing matching question), then the result of B on sets X and Y is a set of sets each containing a member from X and a member of Y.

B(X, Y) = {{x1,y1}, {x2,y2}, {x3,y3}, …}

In database terms, each element of the resulting set is a new type of row that is a lumping together of a row from X and a row from Y.

In real-world terms, an example would be taking a bucket of oranges and a bucket of apples, and pulling out pairs of one of each that are the same size, putting the pairs together in little bags, and then putting those bags into a third bucket. You don’t get a bucket of apples and oranges. You get a bucket of pairs of an apple and an orange.

Looking at it this way, it should be reasonably easy to see that there is a fundamental difference between joins and the simple union/intersect/difference set operations.

Except that venn diagrams explain set logic, and SQL Joins have very little to do with set logic.

Scary how many people are agreeing with this.

My thoughts exactly!

Jeff, this is great! How about visually explaining why you have SELECT all the columns you are GROUP-ing by? I always have a hard time explaining that one to people!

I’m afraid this concept is misleading on the reasons mentioned by Sven Groot.

The result of a join is not those items in table A and those items in table B, it is a “joined” combination of the two. The diagrams you have drawn are set diagrams and are created in SQL using the set operators UNION and INTERSECT.

Joins are a completely different concept which I believe are best explained using just the table outputs you have above without adding the incorrect venn diagrams.

JF, the explanation behind SELECT and GROUP BY can be described simply.

SELECT COUNT(*) FROM People
-- This will return a count of 100 people

SELECT EyeColour, COUNT(*) FROM People GROUP BY EyeColour
-- This will return Blue 20, Green 50, Brown 30

SELECT COUNT(*) FROM People GROUP BY EyeColour
-- This will return 20, 50, 30 whilst this contains the counts that you are looking for, they are useless as there is no relation to each count and what the represent.

Assuming Pirate and Ninja are identical records in both tables, the very first thing that came to mind as what result I would want when I join Tables A B was

Pirate
Monkey
Ninja
Spaghetti
Rutabaga
Darth Vader

This seems to be the intuitive meaning of join. Venn’s don’t work.

This is a very likable article,I personally really like the way you’ve done this…

It never ceases to amaze me how many folks become “experts” on a technical topic once somebody takes the time to write about one.

I give props to the author of the article presented here. I needed to understand this information, and it was put to me visually in a way I could parse quite easily.

As for the folks dissecting this article, nit-picking even the premise, I ask that you please write a better one, and post the link instead of complaining. I will then drop by and pick it apart out of professional discourtesy.

Thanks.

The Cartesian product of 2 sets is the set of all combinations of ordered pairs which can be produced from the elements of both sets. So the Venn representation would look like two piles of poker chips next to each other, each pile numbering the number of data points in the other bottom chip.

Thanks for this great Tutorial :slight_smile:

Nice way to explain things in joins. By seeing the example nobody cannot forget joins.

Keep it up. Hope you will publish many more article which will be useful to all.

Thanks,
Satheesh.

thanks for this, I am in a class for crystal reports and I showed many and it helped them

nice quick lesson on joins. This is exactly what i wanted.

Great explanation of the basics of inner and outer joins.
I’ll be referring it often and passing it on.
Also very brave to raise anything here as not 100% perfect equals rubbish

Martin

Very simple, I’am always confused up with sql operations, but I understand better with simple diagrams like yours.
Thks again for this nice job.

Thank god for Venn diagrams! Good job mate.

Wow! Terrific tutorial. I’ve never understood JOINS beyond the concept of them. Thank you for demystifying!

Clay