Introduction
In relational databases, selecting rows based on certain criteria is a fundamental operation. One common requirement is to retrieve only those rows where a specific column has its maximum value within each group of related records. This tutorial will guide you through different methods to achieve this using SQL, focusing on MySQL as the database system.
Understanding the Problem
Consider a table documents
with columns: id
, rev
, and content
. Each id
can have multiple revisions (rev
). The task is to select one row per id
that has the maximum rev
.
For example, given the following data:
| id | rev | content |
|—-|—–|———|
| 1 | 1 | … |
| 2 | 1 | … |
| 1 | 2 | … |
| 1 | 3 | … |
The desired output is:
| id | rev | content |
|—-|—–|———|
| 1 | 3 | … |
| 2 | 1 | … |
Methods to Solve the Problem
Method 1: Using a Subquery with JOIN
This method involves creating a subquery to find the maximum rev
for each id
, and then joining this result back to the original table.
SELECT a.id, a.rev, a.content
FROM documents a
INNER JOIN (
SELECT id, MAX(rev) AS max_rev
FROM documents
GROUP BY id
) b ON a.id = b.id AND a.rev = b.max_rev;
Explanation:
- The subquery
(SELECT id, MAX(rev) AS max_rev FROM documents GROUP BY id)
calculates the maximumrev
for eachid
. - The main query joins this result back to the original table on both
id
andmax_rev
.
Method 2: Using a Self LEFT JOIN
This approach uses a self-join with a LEFT JOIN
to filter out non-maximum rows.
SELECT a.*
FROM documents a
LEFT OUTER JOIN documents b
ON a.id = b.id AND a.rev < b.rev
WHERE b.id IS NULL;
Explanation:
- The table is joined with itself where the left side has a smaller
rev
than the right. - Rows that have no greater
rev
in the join result (i.e.,b.id IS NULL
) are those with the maximumrev
.
Method 3: Using Window Functions
Window functions provide a powerful way to perform calculations across sets of rows related to the current row.
SELECT id, rev, content
FROM (
SELECT id, rev, content,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY rev DESC) AS rn
FROM documents
) subquery
WHERE rn = 1;
Explanation:
ROW_NUMBER()
assigns a unique number to each row within the partition ofid
, ordered byrev
descending.- The outer query filters rows where
rn
equals 1, selecting only the maximumrev
.
Method 4: Using Correlated Subquery
A correlated subquery can be used to find the maximum rev
for each id
directly.
SELECT yt.id, yt.rev, yt.content
FROM documents yt
WHERE rev = (
SELECT MAX(st.rev)
FROM documents st
WHERE yt.id = st.id
);
Explanation:
- For each row in the outer query, the subquery finds the maximum
rev
for that specificid
. - The condition
yt.rev = (subquery)
ensures only rows with the maximumrev
are selected.
Conclusion
Each method has its advantages and can be chosen based on readability, performance, or database compatibility. Understanding these techniques allows you to handle similar problems effectively in SQL, making your queries more efficient and maintainable.