Demystifying SQL Queries
Let’s look at an average, everyday SQL query:
1 2 3 |
|
Not hard to draw parallels to other known concepts:
- FROM table: A set of elements
- WHERE conditions: A filter operation on these elements
- SELECT something: A presenter of an individual element
Tables are sets
The bare minimum way of interacting with a set, is to take a look at some or all of its elements. That is exactly what this query is doing. We’re selecting a subset of a set, based on a few conditions, and representing each element of this subset in some format. Recall that a subset of a set is also a set in itself. Which allows us to do something like:
1 2 3 4 5 6 7 |
|
We have a nested query, where instead of selecting from a table, we’re selecting from the result of selecting from a table. 1
Sets can have labels or aliases, so:
1 2 3 4 5 6 7 |
|
We seem to be doing some recursive looking stuff, and I find it beautiful.
Which also means, an assertion is in order: a select query operates upon a set, and returns a set. Naturally, a set can be a that of a single element as well.
1 2 3 |
|
Let’s substitute this set in our original query:
1 2 3 4 |
|
So a select query operates upon a set, and returns a set, and we know that sets can be unioned or intersected:
1 2 3 4 5 6 7 |
|
Notice that “something” and “the same something” are important. We can only union or intersection similar sets. Apples and oranges can’t be unioned in the relational algebra land.
Joins are Sets
Let’s talk about joins. Chances are, at some point in life you’ve written in INNER JOIN instead of an OUTER JOIN and got incorrect results. Or something along those lines. Joins can be very opaque, even to a regular practitioner.
A few things are important when considering joins:
- A join is a product of 2 sets. Always. Multi-table joins are just “first join these two”, “take the result” and “join the result with the next”.
- Joins are always performed on sets. So you can “join” any of the above mentioned sets, and you’re still good. Do note that language semantics dictate that you use aliases to disambiguate.
- NULL is always a part of each set. Implicitly so, for practicality.
This is best explained through an example:
1 2 |
|
Joining numbers
and letters
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
|
Do you get the feeling that you are getting more than what you bargained for? Me too. Depending on context, we’ll want different subsets of this mega joined set. That’s exactly what different kinds of joins are for. These joins will determine what working set we’ll use.
- Inner join: Do not consider the entries which have NULL on either side.
- Left outer join: Do not consider entries which have NULL on the LEFT side.
- Right outer join: Do not consider entries which have NULL of the RIGHT side.
- Full outer join: Consider all entries.
For practical reasons, the entry where both sides are NULL is not considered. A slightly better example:
1 2 |
|
1 2 3 |
|
Our working set:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
|
After applying conditions, and removing both sides NULL entry:
1 2 3 4 5 6 7 8 9 10 11 |
|
Rearranging a little, for better understanding:
1 2 3 4 5 6 7 8 9 10 11 |
|
Left and Right are determined from the join syntax. When A joins B, A is Left, and B is Right.
While attempting to write a join query, I encourage you to work out your join on paper first, with dummy data. Specially with multi-table joins.
Functions are Sets
That sounded nice, but it isn’t true. Functions aren’t sets, they operate on sets. Remember, a single value is also a set, so each function accepts a set as an argument, and returns a set.
1 2 3 |
|
My apologies for dropping a query on you without any domain context, but consider this slightly more complex function, which returns all the sibling branches of a given restaurant branch. Restaurant has many Restaurant Branches, and Restaurant Branch belongs to a Restaurant, to aid your understanding.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
We’re basically replacing a query with a function. And we learned that both SELECT queries and Functions operate on sets, and return sets.
Which brings me to another assertion: Sets and set operations tend to compose well. This is very important! Functional programming nerds practically live by this motto. Once you have small units that compose well, you can build complex units with relative simplicity.
Fundamentally, SQL is not so different. An important thing to keep in mind that this composing behaviour is mainly about the data and how the data is interpreted and processed. The query language itself leaves a lot to be desired when it comes to composing. A lot of things like aliases, joins can easily be taken care by a competent library. But, more or on this, and the advantages of using something like ARel in a later post.
Reading is destructuring, Writing is composing
Let’s collect all the set-like behaviour we’ve seen so far:
- We can interact with a set by looking at all or some of its elements
- A subset of a set is also a set in itself
- A set can have a label or an alias
- A set can be a that of a single element as well
- A set can be unioned or intersected with another set
Reading or writing complex queries becomes much easier, if we think of it as composing queries together, or decomposing a large query into smaller parts.
How to read complex queries
- Start with the innermost, or smallest “SELECT” clauses
- Replace them with an appropriately and descriptively named function, say
co_branches_of_given_branch
instead of justco_branches
.- If these inner queries, now functions, use a column / value from the outer queries, treat them as function arguments. (
co_branches
usedbranch_id
)
- If these inner queries, now functions, use a column / value from the outer queries, treat them as function arguments. (
- Keep applying this method until you reach the outermost query.
How to write complex queries
This really boils down to a top-down vs bottom-up approach. If you’re a top-down person: – Write the top-most query, assume all the lower level functions exist, with appropriate and descriptive names. – Recursively, apply the same strategy to each lower level function
Conversely, if you’re a bottom-up person: – Figure out the lowest level functions / queries you need, and write them – Build up your larger query by composing these functions.
-
Food for thought: How many nested select queries does your favourite relational database allow?↩