Selecting Rows with Maximum Values in a Column Using SQL

Introduction

In relational databases, selecting rows based on certain criteria is a fundamental operation. One common requirement is to retrieve only those rows where a specific column has its maximum value within each group of related records. This tutorial will guide you through different methods to achieve this using SQL, focusing on MySQL as the database system.

Understanding the Problem

Consider a table documents with columns: id, rev, and content. Each id can have multiple revisions (rev). The task is to select one row per id that has the maximum rev.

For example, given the following data:

| id | rev | content |
|—-|—–|———|
| 1 | 1 | … |
| 2 | 1 | … |
| 1 | 2 | … |
| 1 | 3 | … |

The desired output is:

| id | rev | content |
|—-|—–|———|
| 1 | 3 | … |
| 2 | 1 | … |

Methods to Solve the Problem

Method 1: Using a Subquery with JOIN

This method involves creating a subquery to find the maximum rev for each id, and then joining this result back to the original table.

SELECT a.id, a.rev, a.content
FROM documents a
INNER JOIN (
    SELECT id, MAX(rev) AS max_rev
    FROM documents
    GROUP BY id
) b ON a.id = b.id AND a.rev = b.max_rev;

Explanation:

  • The subquery (SELECT id, MAX(rev) AS max_rev FROM documents GROUP BY id) calculates the maximum rev for each id.
  • The main query joins this result back to the original table on both id and max_rev.

Method 2: Using a Self LEFT JOIN

This approach uses a self-join with a LEFT JOIN to filter out non-maximum rows.

SELECT a.*
FROM documents a
LEFT OUTER JOIN documents b
    ON a.id = b.id AND a.rev < b.rev
WHERE b.id IS NULL;

Explanation:

  • The table is joined with itself where the left side has a smaller rev than the right.
  • Rows that have no greater rev in the join result (i.e., b.id IS NULL) are those with the maximum rev.

Method 3: Using Window Functions

Window functions provide a powerful way to perform calculations across sets of rows related to the current row.

SELECT id, rev, content
FROM (
    SELECT id, rev, content,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY rev DESC) AS rn
    FROM documents
) subquery
WHERE rn = 1;

Explanation:

  • ROW_NUMBER() assigns a unique number to each row within the partition of id, ordered by rev descending.
  • The outer query filters rows where rn equals 1, selecting only the maximum rev.

Method 4: Using Correlated Subquery

A correlated subquery can be used to find the maximum rev for each id directly.

SELECT yt.id, yt.rev, yt.content
FROM documents yt
WHERE rev = (
    SELECT MAX(st.rev)
    FROM documents st
    WHERE yt.id = st.id
);

Explanation:

  • For each row in the outer query, the subquery finds the maximum rev for that specific id.
  • The condition yt.rev = (subquery) ensures only rows with the maximum rev are selected.

Conclusion

Each method has its advantages and can be chosen based on readability, performance, or database compatibility. Understanding these techniques allows you to handle similar problems effectively in SQL, making your queries more efficient and maintainable.

Leave a Reply

Your email address will not be published. Required fields are marked *