A Visual Explanation of SQL Joins

I thought Ligaya Turmelle's post on SQL joins was a great primer for novice developers. Since SQL joins appear to be set-based, the use of Venn diagrams to explain them seems, at first blush, to be a natural fit. However, like the commenters to her post, I found that the Venn diagrams didn't quite match the SQL join syntax reality in my testing.


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html

Funny, I just explained this to a co-worker in the same manner earlier in the week. I guess it never dawned on me that others may have never thought about joins in terms of these diagrams. Good post, Jeff!

Hey Jeff, thanks for the great blog - I thoroughly enjoy reading every single one of your posts.

Even though I often am familiar with the concepts you talk of, I find the simple manner in which you break down the issues is always a great refresher.

Keep up the great work.

linked from join (SQL) en wikipedia to your blog entry … your post is much better (or simplistic) than the techtalk on that wikipedia entry :wink:

The post is much simpler than the Wikipedia entry because it omits all the non-trivial cases. All of these examples assume that each entry in table A will be matched by the join predicate to at most one entry in table B, and vice-versa.

I’m not even sophisticated enough to use databases anymore (my career as a developer has devolved), but when I was, 99% of the time I used inner joins and just thought of it as the data common to both tables.
Every now and then I wanted some data that didn’t fit that kind of join and I’d do a left outer join to get the data that was “left out”.

Speaking about Oracle (+) syntax and LEFT/RIGHT [OUTER] JOIN syntax there is a difference. And it is nicely described here
http://www.oreillynet.com/pub/a/network/2002/10/01/whatsinacondition.html

Speaking about ANSI syntax where everyone has to explicitly say to make CROSS JOIN to get Cartesian product at least in Oracle it is not true, using stupid (at least to my mind) NATURAL JOIN clause one can easily get CROSS JOIN and also JOIN on such columns he never thought it really is. I’ve blogged about it here
http://gplivna.blogspot.com/2007/10/natural-joins-are-evil-motto-if-you.html
Don’t know wehether it is just Oracle specific or as per ANSI SQL standard.

Speaking about INNNER JOINS as a Cartesian product and then eliminating all unnecessary rows, at least the real mechanics in Oracle is absolutely different, there are following possibilities for doing them:
NESTED LOOPS (OUTER)
HASH JOIN (OUTER)
MERGE JOIN (OUTER)
and for Cartesian product it uses MERGE JOIN CARTESIAN, and that’s when usually the real trouble (filling temp) starts :wink:
The real choice among Nested loops, hash join or merge join at least depends on data, statistics, available memory for a session, explicitly given available memory for hash joins or merge joins (btw there is possibility to give for one but not for other), system workload, hints, initialization parameters allowing ceratain join types (at least for hash joins) and probably something other :slight_smile:

Hi Jeff,

I think your illustrations make one BIG assumption: Table A and Table B contain unique rows. If either table contained duplicate matching records, then the whole Venn diagram will not hold (the cardinality of the sets will not add up). I am afraid this visual explanation will lead some to think that you can use SQL joins to do filter operations which shouldn’t be done this way.

Although it’s nice attempt to explain SQL joins, overall I think it is misleading.

-dt

One of the worst three months of my CS training was a class ostensibly on database theory which was really a boot-camp in SQL syntax, taught by a Johnny-one-note professor who thought SQL and RDMSs were the greatest invention of mankind.

This experience turned me off databases for 15 years until one day in the shower I realized that joins were analogous to set theory, in a rough way. (Yes they are, all you naysayers! “Give me everything that’s there AND there” or “Give me everything that’s there OR there” may not handle all the possible inputs, but it’s a good jumping off point for explaining “inner” and “outer.” And who came up with that stupid, arbitrary terminology anyway?)

I still think SQL is an awful language, though, and relational databases are an ugly hack mandated by hardware limitations trumping elegant design. OLAP “cubes” are so clearly a better solution—so intuitively clear and obvious—the natural generalization to higher dimensions of the flat-file database.

To me, a Cartesian Product would be best illustrated by both circles being blank.

Granted, all the SQL gurus will probably spit feathers at my outrageous suggestion, but for me, it’s a beautifully simple method of explaining simple joins and I’ll be using it in me technical documentation!

Thanks for this article, Jeff.

Jeff, you are the master of well crafted blog posts that sometimes don’t say much, but somehow gather scads of inane comments! This one and the previous post (on exercise) are great examples.

Now, not all of your posts are in this category, thank God! But I’m watching you…

Hey Now Jeff,
These diagrams make me think of when learning of classic boolean logic.
Coding Horror Fan,
Catto

Well, the database practitioner rejoice, and the set relationists cringe.

There is only one join, which is the cross product (each row of the first table concatenated to each row of the second table). The INNER JOIN example only discards some rows by the where clause, otherwise it is (or should be) the same as the CROSS JOIN. The outer join is a hack that isn’t very firmly rooted in relational algebra.

Nice Jeff, thank you so much for this post!

I always got confused with JOINS and how they work, that’s one of these things that are hard to be expained in a book or a manual.

But these diagrams make it pretty clear, they are great!

Don’t forget EXCEPT! Which was (and additionally still is) MINUS in Oracle. For years I think Oracle was the only RDBMS that implemented it and I had, in nearly 20 years, used it all of, oh, once. Until last week, when I used it the second time.

It’s the same as your fourth example: instead of

SELECT * FROM TableA
LEFT OUTER JOIN TableB
ON TableA.name = TableB.name
WHERE TableB.id IS null

we can say

SELECT * FROM TableA
MINUS
SELECT * FROM TableB

(I used the old Oracle operator - it’s more expressive to my eyes)

Not every platform implements the operator yet: MySQL (which I just checked) doesn’t, for example. MS SQL Server 2005 does. SQLIte? Hang on … yup, that works in v3 at least.

Very strange. In my “Database Management” class at Arizona State University - a student asked about Join’s and my professor said “I never use them. Just stick with the WHERE syntax”.

This was a more theoretical course though. We spent FAR more time in relational algebra / calculus and algorithms for ensuring atomic transactions than we did in more practical stuff like database normalization.

Hmm… lots of people say the theoretical stuff is a waste of time. For me, well, I almost became a math major instead of CS, so I find the theoretical stuff more interesting.

If you don’t user any join-feature and just do “select * from TableA, TableB”, you have your cross-join.

Venn diagrams.

wow, so simple, so obvious; how come I haven’t seen this anywhere else?

Hi Jeff
that’s an nice diagram and a new way of presenting the sql join function.

Yet another case of the blind leading the blind (down the blind alley of ignorance and arrogance); the original, BTW. At least you made and attempt to fix it up. But reading all those “now I get it” comments, makes it clearer why Yahoos like GW Bush get away with bald faced lies. “People believe what they want to believe, when it makes no sense at all”/Mellencamp.

Now we’ll have another gaggle of folks running around fawning on Ambler and Fowler. Gad. The Horror, the Horror.