A Visual Explanation of SQL Joins

thanks a lot :slight_smile:

Well, as a (not-so-dumb) newbie to MySQL queries I was unclear about the JOIN statement and generally what is was all about. A quick Google brought me to this site.

Personally I think that it’s an excellent top-level explanation of the general principles behind the Join and great for newcomers such as myself. Sure there are exceptions here and alternatives there, and lots of live learning hurdles to overcome in given situations, but this IS a good intoductory explanation nonetheless.

Pay no attention to those that clearly know it all and are quick to sound off with nothing but those tiresome criticisms and negative sentiment, but do nothing to contribute support for newcomers.

Excellent work Jeff, and thank you for your time on this :o)

Hi,

Tanks for the great article. The visual explanation is the best way to learn Joins. :smiley:

Thanks for great article and this selfless help.

Great article!!

Thanks for the article…It was very useful.

I finally understand! Thanks :smiley: hehe

It took me a while to get my head around this but when I found this page everything made sense. Your layout example is great.
Thanks
Bob

Hey, I’ve been coding in MS SQL at my job for about a year. I still have trouble wrapping my head around the joins sometimes but I have this site bookmarked and I probably visit it at least once or twice a week.

Thanks for the great diagrams!

Hi,
couldn’t on express the outer join as an cartesian product of the discs A and B, which in this case would be a torus ??

The graphic for the cross join (or cartesian product) would be something like the old Games magazine logic puzzles (or similarly, the sheet in the Clue boardgame) with all options from set 1 on the horizontal, and all options from set two on the vertical. Each square in this grid is an item in the resultant table.

BuggyFunBunny, are you capable of writing even a single comment that doesn’t express your cantankerous political views? Or do you simply not realize that this is a technical blog and that nobody cares?

Andreas: I was under the impression that INNER JOINs use a hash match internally (in fact, I’m quite certain of this). CROSS JOINs obviously don’t. Therefore I don’t see how it’s reasonable to say that an IJ is just a special case of the CJ. I suppose in bread-and-butter set theory, where there’s no notion of a hash or an index, this could work, but this is the world of CS, where the clean academic solution is almost never the most practical one. The IJ you speak of is the naivest possible algorithm, only used when there are no indexes (and if you’re designing databases without any indexes, well… stop designing databases).

This post is a pretty decent tutorial for green developers having to work with SQL, and it’s something I might pass on if I felt that somebody else wasn’t “getting it”; I do agree, however, that it omits most of the non-trivial cases. If every relationship was one-to-one, you wouldn’t really need a database at all.

Nice post. It explains the results gathered from simple joins quite well. It doesn’t address how they work, but a nice visual on what you actually get.

Ricardo-

I believe Oracle also has the plus (+) operator as well for joins which I have seen on one or two projects. Th INNER JOIN, OUTER JOIN SYNTAX appeared in ANSI SQL 92 (1992). Most everyone uses that syntax, but most database sytems support the older way as well.

I’m happy that “monkey” and “spaghetti” made the list, but I really miss “more spaghetti”. Maybe that’s in another table.

Also fun are self-joins:

select * from TableA first inner join TableA second on
first.name = second.name where first.id != second.id

SQL Server’s always supported joins via the where clause (where TableA.Name = TableB.Name), but the left and right join operator syntax ("=" and "=") has been deprecated in favor of the explicit join syntax. The “=" and "=” syntax doesn’t specify join order and doesn’t clarify what’s in the where clause vs. a join operation.

The arguments about how inner joins are implemented are all correct, and all wrong at the same time.

They are at times done by a hash, sometimes a nested loop, sometimes a cartesian product and weeding out the ones that do not fit. You will need to look a the query plan to know which one your database will use for this particular query, and it can even change over time for the same query as the statistical distribution of data changes. The database query planner tries to pick the best for the current situation. This is why it is important to keep the statistics up to date.

Sometimes, it even does something stupid, that you know is wrong because you know how the data is, and you know a better way to do it. This is what query hints are for, to force it to do a particular join a particular way.

This was, though, just a basic overview. It was just to show the result sets, and help developers understand SQL joins if they have not used them before. Details on the more advanced topics, like what I mentioned above are available (just search the net) but those who this article was aimed at are not yet ready for that level of complexity.

Powerlord-

I’ve only seen the (+) in an Oracle (queries), but other RDMS system may support it (Although I cannot say for certain). I believe someone mentioned that (+) predated the ANSI syntax, so hopefully you should not encounter it. I beleive I witnessed it some stored procedures in a banking system.

At this point, any new code should use the ANSI 92 syntax as that seems to be the defacto standard at this point.

Thanks man, you saved my day

thanks, this is the best Joins explanation, the diagrams HELPS ALOT!
this solved my problem, thanks again

There are sixteen possible Venn states with two variables, you forgot eleven of them :wink: Seriously, I wonder how many people have independently made the connection between SQL join and Venn. I also worked this out a while back, it being a expansion of the UNIX sort and uniq combinations I used to.

Given two lists, and given that each list has only unique entries inside itself:

sort a b b | uniq -u (gives only what is in a)
sort a b | uniq -d (gives what is in both a and b)
sort a a b | uniq -u (gives what is only in b)

I found this useful for simple lists like hostnames or IPs