Combining Results with UNION and UNION ALL

In SQL, you often need to combine the results of two or more SELECT statements into a single result set. The UNION and UNION ALL operators are designed for this purpose. While they achieve a similar goal, there’s a crucial difference in how they handle duplicate rows, and therefore, in their performance.

Understanding the Basics

Both UNION and UNION ALL take two or more SELECT statements and concatenate their respective result sets. However, before combining the results, certain conditions must be met:

Column Count: Each SELECT statement must have the same number of columns.
Data Types: The corresponding columns in each SELECT statement must have compatible data types. While they don’t need to be identical, the database system must be able to implicitly convert them if necessary. For example, selecting an integer from one statement and a string from another might not be compatible unless implicit conversion is allowed.
Column Order: The order of columns in each SELECT statement should be the same, as the corresponding columns are combined based on their position.

UNION: Removing Duplicates

The UNION operator combines the results of two or more SELECT statements and automatically removes any duplicate rows from the final result set. This is achieved by performing a distinct operation on the combined results.

Here’s an example:

SELECT column1, column2 FROM table1
UNION
SELECT column1, column2 FROM table2;

If both table1 and table2 contain a row with the same values in column1 and column2, that row will appear only once in the combined result set.

UNION ALL: Keeping All Rows

The UNION ALL operator, unlike UNION, does not remove duplicate rows. It simply concatenates the results of the SELECT statements without performing any distinct operation. This makes it faster than UNION, especially when dealing with large datasets where duplicate removal would be expensive.

Here’s the same example using UNION ALL:

SELECT column1, column2 FROM table1
UNION ALL
SELECT column1, column2 FROM table2;

If both table1 and table2 contain the same row, that row will appear multiple times in the combined result set, once for each occurrence in the original tables.

Performance Considerations

UNION: The duplicate removal process in UNION involves sorting and comparing rows, which can be computationally expensive, especially for large datasets.
UNION ALL: UNION ALL is generally faster because it avoids the duplicate removal step. If you are certain that the SELECT statements will not produce duplicate rows, or if duplicates are acceptable in your application, UNION ALL is the preferred choice.

Example Scenario

Imagine you have two tables: Employees and Contractors. Both tables have columns for FirstName, LastName, and Email. You want to create a list of all people (employees and contractors) and their emails.

-- Employees Table (simplified)
CREATE TABLE Employees (
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    Email VARCHAR(100)
);

-- Contractors Table (simplified)
CREATE TABLE Contractors (
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    Email VARCHAR(100)
);

-- Combining the lists (removing duplicates)
SELECT FirstName, LastName, Email FROM Employees
UNION
SELECT FirstName, LastName, Email FROM Contractors;

-- Combining the lists (keeping duplicates)
SELECT FirstName, LastName, Email FROM Employees
UNION ALL
SELECT FirstName, LastName, Email FROM Contractors;

If a person exists in both tables, UNION will return them only once, while UNION ALL will return them twice.

Data Type Compatibility

While the examples use straightforward data types, be aware that complex data types like BLOBs (Binary Large Objects) or CLOBs (Character Large Objects) might have limitations with UNION in certain database systems. UNION ALL often provides broader compatibility in these cases.