Joins and Subqueries in SQL

This tutorial is adapted from the Web-Age course Introduction to SQL.

A SQL Join statement is used to combine data or rows from two or more tables based on a common field between them. A subquery is a query that is nested inside a SELECTINSERTUPDATE, or DELETE statement, or inside another subquery. Joins and subqueries are both used to combine data from different tables into a single result.

1.1 Primary Keys and Foreign Keys

A primary key is a column in a table that’s used to uniquely identify a row in that table.

A foreign key is used to form a relationship between two tables. For example, let’s say you have a one-to-many relationship between customers in a CUSTOMER table and orders in an ORDER table. To relate the two tables, you would define an ORDER_ID column in the CUSTOMER table that corresponds with the ORDER_ID column in the ORDER table.

Note: The name of the foreign key column does not need to match the name of the primary key column.

The values defined in a foreign key column MUST match the values defined in the primary key column. This is referred to as referential integrity and is enforced by the RDBMS. If the primary key for a table is a composite key, the foreign key must also be a composite key.

1.2 Inner Joins

Inner joins are used to combine related information from multiple tables. An inner join retrieves matching rows between two tables. Matches are normally made based on the primary key/foreign key relationships that exist. If there’s a match, a row is included in the results. Otherwise, it’s not.

Syntax:

SELECT <columns>

FROM <table1>

[INNER] JOIN <table2>

ON <table1>.<columnA> = <table2>.<columnB>

Technically a foreign key does not need to exist in the related table in order to do an inner join. However, to guarantee referential integrity and for performance reasons, it’s recommended to have one.

If the primary key and foreign key are composite keys, then you would AND together any additional columns when specifying the join. For example, if you had a two-column composite key, then the syntax would be:

SELECT <columns>

FROM <table1>

[INNER] JOIN <table2>

ON <table1>.<columnA> = <table2>.<columnB>

AND <table1>.<columnC> = <table2>.<columnD>

A database schema represents the organization and structure of a database. It contains the tables and other database objects (e.g,. views, indexes, stored procedures). A database can have more than one schema, which is the case here. The AdventureWorks database actually has 6 schemas (dbo, HumanResources, Person, Production, Purchasing, and Sales). In SQL Server, dbo is the default schema. On the other hand, Northwind only has the dbo schema.

In the example above, the HumanResources.Employee identifier signifies that HumanResources is the schema name and Employee is the table name. The Person. Person identifier signifies that Person is the schema name and Person is the table name.

1.3 Inner Join Examples

Joining together 2 tables:

SELECT 
	Person.FirstName, Person.LastName, 
	Employee.BirthDate, Employee.HireDate
FROM HumanResources.Employee
JOIN Person.Person
ON Employee.BusinessEntityID = Person.BusinessEntityID

Joining together N tables:

SELECT 
	Person.FirstName, Person.LastName, 	Employee.BirthDate, Employee.HireDate, 	EmailAddress.EmailAddress
FROM HumanResources.Employee
JOIN Person.Person
ON Employee.BusinessEntityID = Person.BusinessEntityID
JOIN Person.EmailAddress
ON Person.BusinessEntityID = EmailAddress.BusinessEntityID

1.4 Prefixing Columns with Table Names

It’s normally good practice to prefix the column names with the table names so it’s clear which table the columns are coming from. However, if there’s no ambiguity, meaning the same columns don’t exist in both tables, you can omit them.

Example:

SELECT 
	FirstName, LastName, 
	BirthDate, HireDate
FROM HumanResources.Employee
JOIN Person.Person
ON Employee.BusinessEntityID = Person.BusinessEntityID

1.5 Using Table Aliases

To make column names shorter and more readable when joining together two tables, we can use table aliases. Table aliases are similar to column aliases. Instead of abbreviating a column name, we abbreviate a table name. Often times table aliases are one or two characters in length. Table aliases can be used even if you’re not joining together tables.

Syntax:

SELECT <columns>

FROM <table1> [AS] <alias1>,

<table2> [AS] <alias2>

JOIN <table2>

ON <alias1>.<columnA> = <alias2>.<columnB>

1.6 Using Table Aliases

Example:

SELECT 
	p.FirstName, p.LastName, 
	e.BirthDate, e.HireDate
FROM HumanResources.Employee e
JOIN Person.Person p 
ON e.BusinessEntityID = p.BusinessEntityID 

1.7 Alternate Inner Join Syntax

An alternate syntax exists for doing an inner join, which uses the WHERE clause for specifying the join criteria. You may see this style of join used in legacy SQL code.

Syntax:

SELECT <columns>

FROM <table1>, <table2>

WHERE <table1>.<columnA> = <table2>.<columnB>

Example:

SELECT 
	Person.FirstName, Person.LastName, 
	Employee.BirthDate, Employee.HireDate
FROM 
	HumanResources.Employee, 
	Person.Person
WHERE Employee.BusinessEntityID = Person.BusinessEntityID

To specify outer joins, you need to use a vendor-specific syntax. For example, in Oracle, you need to use the (+) operator and in SQL Server you need to use the *= and =* operators.

1.8 Outer Joins

What happens if you’re doing an inner join and one of the records in a table doesn’t have a matching record in the other table? Then no row appears in the result set. However, you may still wish to see the row in your report. For example, you may have a customer, but that customer hasn’t placed any orders yet. You still want to list the customer. That’s where outer joins come to the rescue. There are several types of outer joins that we’ll discuss next.

1.9 Left Outer Joins

A left outer join returns all records from the table on the left and the records that match from the table on the right. If there is no matching record, then NULL is returned for any columns selected from the table on the right. It is the equivalent of an inner join plus the unmatched rows from the table on the left.

Syntax:

SELECT <columns>

FROM <table1>

LEFT [OUTER] JOIN <table2>

ON <table1>.<columnA> = <table2>.<columnB>

1.10 Left Outer Join Examples

Retrieve back all employee ids (i.e., business entity ids) and any matching purchase order ids.

SELECT 
	e.BusinessEntityID, 
	p.PurchaseOrderID 
FROM HumanResources.Employee e
LEFT JOIN Purchasing.PurchaseOrderHeader p
ON e.BusinessEntityID = p.EmployeeID 

Retrieve back all employee ids (i.e., business entity ids) and any matching job candidate ids and resumes

SELECT e.BusinessEntityID, j.JobCandidateID, j.Resume 
FROM HumanResources.Employee e
LEFT JOIN HumanResources.JobCandidate j
ON e.BusinessEntityID = j.BusinessEntityID 
ORDER BY j.JobCandidateID DESC

1.11 Right Outer Joins

A right outer join is the opposite of a left outer join. A right outer join returns all records from the table on the right and the records that match from the table on the left. If there is no matching record, then NULL is returned for any columns selected from the table on the left. It is the equivalent of an inner join plus the unmatched rows from the table on the right.

Syntax:

SELECT <columns>

FROM <table1>

RIGHT [OUTER] JOIN <table2>

ON <table1>.<columnA> = <table2>.<columnB>

Example:

Retrieve back all matching job candidate ids and resumes and any matching employee ids (i.e., business entity ids)

SELECT e.BusinessEntityID, j.JobCandidateID, j.Resume 
FROM HumanResources.Employee e
RIGHT JOIN HumanResources.JobCandidate j
ON e.BusinessEntityID = j.BusinessEntityID 
ORDER BY e.BusinessEntityID DESC

1.12 Full Outer Joins

A full outer join returns all matching records between the table on the left and the table on the right, as well as all non-matching records on both sides. It is the equivalent of an inner join plus the unmatched rows from the table on the left and the unmatched rows from the table on the right.

Syntax:

SELECT <columns>

FROM <table1>

FULL [OUTER] JOIN <table2>

ON <table1>.<columnA> = <table2>.<columnB>

Example:

Retrieve back all employee ids (i.e., business entity ids) and all job candidate ids and resumes. Match together any records possible.

SELECT e.BusinessEntityID, j.JobCandidateID, j.Resume 
FROM HumanResources.Employee e
FULL JOIN HumanResources.JobCandidate j
ON e.BusinessEntityID = j.BusinessEntityID 
ORDER BY e.BusinessEntityID DESC

1.13 Self Joins

A self-join is a join where the table is joined with itself. It is typically used where there is a hierarchical relationship between the entities (e.g., employee-manager), or you wish to compare rows within the same table. It uses either the inner join or left outer join syntax. Table aliases are used to assign different names to the same table within the query. 

Syntax:

SELECT <columns>

FROM <table> AS <alias1>

[INNER | LEFT] JOIN <table> AS <alias2>

ON <alias1>.<columnA> = <alias2>.<columnB>

Example:

To match addresses from the same city, use the following query:

SELECT a1.AddressID, a2.AddressID, a1.City 
FROM Person.Address a1
JOIN Person.Address a2 
ON a1.AddressID > a2.AddressID 
WHERE a1.City = a2.City 

1.14 What is a Subquery?

A subquery is a nested query (inner query) that’s used to filter the results of the outer query. Subqueries can be used as an alternative to joins. A subquery is typically nested inside the WHERE clause.

Syntax:

SELECT <columns>

FROM <table>

WHERE <column> <operator>

(SELECT <columns>

FROM <table>

[WHERE <condition>])

Subqueries must always be enclosed within parentheses. 

The operator can correspond with one of the following values:

IN, =, <>, <, >, >=, <=

If the subquery returns more than one result, then you can only use the IN operator

The table that’s specified in the subquery is typically different than the one in the outer query, but it can be the same.

1.15 Subquery Examples

SELECT Name, ProductNumber
FROM Production.Product 
WHERE ProductID IN 
	(SELECT ProductID 
	FROM Purchasing.PurchaseOrderDetail
	WHERE OrderQty > 5)

SELECT FirstName, LastName
FROM Person.Person p 
WHERE BusinessEntityID =
	(SELECT BusinessEntityID 
	FROM HumanResources.Employee
	WHERE NationalIDNumber = 295847284)

1.16 Regular Subquery vs. Correlated Subquery

The type of subquery we have studied so far, which we’ll refer to as a regular subquery, is independent of the outer query and gets executed once. Its results are used by the outer query. On the other hand, a correlated subquery depends on the outer query and gets executed once for each row returned by the outer query. It’s also referred to as a repeating subqueryBecause correlated subqueries get executed multiple times, they can be slow.

1.17 Correlated Subquery Example

Example:

To retrieve the salespeople whose commission percentage is 1%, we can use the following correlated subquery. The outer query retrieves each employee. The inner query evaluates each row to see if their commission percentage is 1%.

SELECT e.BusinessEntityID, p.FirstName, p.LastName
FROM Person.Person p 
JOIN HumanResources.Employee e
ON e.BusinessEntityID = p.BusinessEntityID 
WHERE 0.01 IN
    (SELECT s.CommissionPct 
    FROM Sales.SalesPerson s
    WHERE e.BusinessEntityID = s.BusinessEntityID)

1.18 Summary

In this tutorial, we explored how to:

  • Use inner, outer, and full joins to query more than one table
  • Use self joins to join a table to itself
  • Use subqueries to filter the results of an outer query

Leave a Reply

Your email address will not be published. Required fields are marked *