A Visual Explanation of SQL Joins

Thanks to this I’m now skipping merrily down the path of BOM management and stock control, rather than flicking furiously through Teach-yourself books, thanks

Excellent post.

Everyone who has followed (and passed) a computer science beginner course should be able to understand this.

It’s important to note that when joining two tables on non-indexed columns most databases perform the equivalent of a cross join and then weed out the rows that don’t match the join/where criteria. While this doesn’t really impact the query results, it has a big impact on query performance.

In MySql you can run: explain my query to determine if the query is using indexes and review performance of all your queries.

1 Like

Perfect timing. I just had a new project thrown on my desk that will require working with SQL - something I know little about and have avoided. Terms that have glazed my eyes in weeks past suddenly make sense.

dt is 100% right. You really need a disclaimer that indicates that your set diagrams only hold when there is at best a one-to-one correspondence between join criteria values in the tables.

The second you have two or three pirates in one of the tables (an extremely common real-world scenario, probably much more common than the simplified case you show here) your Venn diagrams break down.

What a great way to explain JOINs, I’m certain there are thousands of students that would benefit from this… a picture is truly worth a thousand words ( especially in cases like this ).

I always use left outer joins and never really liked the right outer join. Is there anytime they are useful? It always seems bass ackwards to use a right outer join.

Of course other developers use visual query designers which use right outer joins…

Is it just me or Oracle’s syntax is much simpler than that? Like:

select * from TableA, TableB
where TableA.Name = TableB.Name

I’m familiar with both, both I always liked that one better.

Jeff,

This is great for those SQL types that didn’t take a set theory, relational calculus or even discrete math in college.

I’d love to see you visualize a cross product :slight_smile:

Ricardo, that syntax is pre-ANSI. Most databases still support it but the advantages of the INNER JOIN, etc. syntax are: it makes the queries more portable across databases; it allows separation of the join criteria from the filter criteria, which improves readability; and allows full outer joins without syntactic hacks.

Another huge benefit is that if you want to do a CROSS (Cartesian) JOIN, and usually you don’t, you have to call it out specifically. In the old syntax, if you screwed up the WHERE clause you could get wildly huge resultsets due to inadvertent Cartesian joins.

Joe Beam,

The only use I have found for RIGHT OUTER is when i’m too lazy to reorder my query. :wink: Other then that it makes more sense to use only one type.

I use LEFT JOIN too, as do my company.

I agree with other responses here Jeff, while it’s a really nice simple way to represent joins, it doesn’t handle many to many or one to many very well and let’s not talk about malformed joins.

It’s a good primer, but should perhaps contain a warning.

The commenters pointing out that the diagrams break down in case of multiple and or duplicate results, are absolutely right. I was actually thinking of joins along the primary key, which tends to be unique by definition, although the examples are not expressed that way.

Like the cartesian or cross product, anything that results in more rows than you originally started with does absolutely breaks the whole venn diagram concept. So keep that in mind.

Except that venn diagrams explain set logic, and SQL Joins have very little to do with set logic.

Scary how many people are agreeing with this.

Actually, I’ve always been a bit shaky on joins myself. I always have to look them up any time I need to using a join that isn’t a standard filtered cross product. But seeing this visual guide has really helped me to grasp it much better. I just went back and read the Wikipedia entry for joins and it makes complete sense to me now. Before seeing this visual representation I might have just glazed over when reading the Wikipedia article.

So even if they aren’t exactly right having a visual representation has really helped me to understand what is going on.

Thanks!

A nice trick is you can write arbitrary filters in the ON clause of the join. I use this when writing truly awful joins or when I need the left join with an already filtered table.

From one Jeff to another, this is a brilliant essay on a simple topic - simple but vexing to newbies. I am a DB2 DBA, and explaining outer joins to the newbie developers I encounter constantly will be much easier with materials like this. Thanks, keep it coming.

I have never seen such a good illustration of SQL joins. Good work.

I must agree that while the diagrams are a fairly good illustrations of what happens with a join, they’re not Venn diagrams in the strict sense because that’s just not what’s happening. What the first diagram says to me is that you get all records that are both in set A and set B (aka intersection). That’s just not true. You get a set of new records, that are neither in set A nor in set B, but they contain a combination of the attributes of both sets.

Aaron G: joins are conceptually a special case of the cartesian product. How the database server actually computes them is another matter entirely.

Conceptually, in relational algebra, an inner join is a cross product followed by a selection on the join condition. That’s a terribly inefficient way to compute it, but conceptually, that’s what it is.

The morons running my database fundamentals class at my university thought that teaching the Cartesian product as a method of joining, then filtering the results, was a valid alternative to the insanely simple inner join. So many poor students are going to have a hard time in the industry as a result.

Jon Raynor:
Out of curiosity, but what databases other than Oracle support + for joins?

Anyway, I prefer the “TableA a INNER JOIN TableB b ON a.something = b.somethingelse” syntax over “TableA a, TableB b WHERE a.something = b.somethingelse”

(Most DBs also support “USING(something)” instead of “ON a.something = b.something” if the columns have the same name)

Many DBs will throw an error if I accidently omit the ON statement instead of giving me a cross join that I didn’t want. I imagine that the query parser would also parse it faster, as I specify up front which type of join I’m doing; the query parser doesn’t have to figure it out.