Upsert Operations in SQL Server: Inserting or Updating Records Efficiently
In database applications, a common requirement is to either update an existing record if it exists, or insert a new record if it doesn’t. This operation is often referred to as an "upsert" (a combination of "update" and "insert"). This tutorial will cover several strategies for performing upsert operations in SQL Server, highlighting their performance implications and considerations for concurrency.
The Problem
Imagine a table MyTable
with a primary key KEY
and other data fields. We need to ensure that if a record with a specific KEY
already exists, we update its fields; otherwise, we insert a new record with the provided data. A naive approach might involve first checking for the existence of the record using a SELECT
statement, and then executing either an UPDATE
or INSERT
statement based on the result. However, this approach can be inefficient, especially under high concurrency.
Basic Implementation: IF EXISTS…ELSE
A straightforward way to implement an upsert is to use an IF EXISTS
clause in conjunction with an UPDATE
or INSERT
statement:
IF EXISTS (SELECT * FROM MyTable WHERE KEY = @key)
BEGIN
UPDATE MyTable
SET datafield1 = @datafield1, datafield2 = @datafield2
WHERE KEY = @key
END
ELSE
BEGIN
INSERT INTO MyTable (KEY, datafield1, datafield2)
VALUES (@key, @datafield1, @datafield2)
END
This method is easy to understand, but it involves potentially two separate operations (checking for existence and then either updating or inserting). Under heavy load, this can lead to contention and performance bottlenecks.
Optimizing with a Single Statement: Using @@ROWCOUNT
A more efficient approach is to perform the UPDATE
first and then check @@ROWCOUNT
, a system function that returns the number of rows affected by the last statement. If @@ROWCOUNT
is 0, it means the UPDATE
did not affect any rows, indicating that the record did not exist, and we can then proceed with an INSERT
.
UPDATE MyTable
SET datafield1 = @datafield1, datafield2 = @datafield2
WHERE KEY = @key;
IF @@ROWCOUNT = 0
BEGIN
INSERT INTO MyTable (KEY, datafield1, datafield2)
VALUES (@key, @datafield1, @datafield2);
END
This method reduces the number of operations to two in the best case (update only) and still avoids a separate SELECT
statement.
Concurrency Considerations: Locking and Transactions
When multiple threads or processes are attempting to perform upserts concurrently, it’s crucial to consider locking and transactions to prevent data corruption and ensure data integrity. The SERIALIZABLE
transaction isolation level, combined with appropriate locking hints like UPDLOCK
, can help to serialize access to the table and prevent race conditions.
BEGIN TRANSACTION;
UPDATE MyTable WITH (UPDLOCK, SERIALIZABLE)
SET datafield1 = @datafield1, datafield2 = @datafield2
WHERE KEY = @key;
IF @@ROWCOUNT = 0
BEGIN
INSERT INTO MyTable (KEY, datafield1, datafield2)
VALUES (@key, @datafield1, @datafield2);
END
COMMIT TRANSACTION;
The UPDLOCK
hint acquires an update lock on the row, preventing other transactions from modifying it until the current transaction completes. The SERIALIZABLE
isolation level ensures that transactions are executed in a serial order, preventing phantom reads and other concurrency issues.
Using the MERGE
Statement (SQL Server 2008 and later)
SQL Server 2008 introduced the MERGE
statement, which provides a more declarative way to perform upserts. It allows you to specify the conditions for matching rows and the actions to take when a match is found or not found.
MERGE MyTable AS target
USING (VALUES (@key, @datafield1, @datafield2)) AS source (KEY, datafield1, datafield2)
ON target.KEY = source.KEY
WHEN MATCHED THEN
UPDATE SET
datafield1 = source.datafield1,
datafield2 = source.datafield2
WHEN NOT MATCHED THEN
INSERT (KEY, datafield1, datafield2)
VALUES (source.KEY, source.datafield1, source.datafield2);
The MERGE
statement can be more efficient than the other methods because it can potentially perform the entire operation with a single I/O operation. However, it can also be more complex to understand and requires careful consideration of concurrency and locking. Be aware of potential issues and consider its behavior under different isolation levels.
Avoidance of Error-Based Upserts
While it’s possible to use TRY...CATCH
blocks to handle potential constraint violations during an INSERT
and then perform an UPDATE
in the CATCH
block, this approach is generally discouraged. Relying on exception handling for control flow can be significantly less efficient than using a proactive approach like the methods described above.