Upsert Operations in SQL Server: Inserting or Updating Records Efficiently
In database applications, a common requirement is to either update an existing record if it exists, or insert a new record if it doesn’t. This operation is often referred to as an "upsert" (a combination of "update" and "insert"). This tutorial will cover several strategies for performing upsert operations in SQL Server, highlighting their performance implications and considerations for concurrency.
The Problem
Imagine a table MyTable with a primary key KEY and other data fields. We need to ensure that if a record with a specific KEY already exists, we update its fields; otherwise, we insert a new record with the provided data. A naive approach might involve first checking for the existence of the record using a SELECT statement, and then executing either an UPDATE or INSERT statement based on the result. However, this approach can be inefficient, especially under high concurrency.
Basic Implementation: IF EXISTS…ELSE
A straightforward way to implement an upsert is to use an IF EXISTS clause in conjunction with an UPDATE or INSERT statement:
IF EXISTS (SELECT * FROM MyTable WHERE KEY = @key)
BEGIN
UPDATE MyTable
SET datafield1 = @datafield1, datafield2 = @datafield2
WHERE KEY = @key
END
ELSE
BEGIN
INSERT INTO MyTable (KEY, datafield1, datafield2)
VALUES (@key, @datafield1, @datafield2)
END
This method is easy to understand, but it involves potentially two separate operations (checking for existence and then either updating or inserting). Under heavy load, this can lead to contention and performance bottlenecks.
Optimizing with a Single Statement: Using @@ROWCOUNT
A more efficient approach is to perform the UPDATE first and then check @@ROWCOUNT, a system function that returns the number of rows affected by the last statement. If @@ROWCOUNT is 0, it means the UPDATE did not affect any rows, indicating that the record did not exist, and we can then proceed with an INSERT.
UPDATE MyTable
SET datafield1 = @datafield1, datafield2 = @datafield2
WHERE KEY = @key;
IF @@ROWCOUNT = 0
BEGIN
INSERT INTO MyTable (KEY, datafield1, datafield2)
VALUES (@key, @datafield1, @datafield2);
END
This method reduces the number of operations to two in the best case (update only) and still avoids a separate SELECT statement.
Concurrency Considerations: Locking and Transactions
When multiple threads or processes are attempting to perform upserts concurrently, it’s crucial to consider locking and transactions to prevent data corruption and ensure data integrity. The SERIALIZABLE transaction isolation level, combined with appropriate locking hints like UPDLOCK, can help to serialize access to the table and prevent race conditions.
BEGIN TRANSACTION;
UPDATE MyTable WITH (UPDLOCK, SERIALIZABLE)
SET datafield1 = @datafield1, datafield2 = @datafield2
WHERE KEY = @key;
IF @@ROWCOUNT = 0
BEGIN
INSERT INTO MyTable (KEY, datafield1, datafield2)
VALUES (@key, @datafield1, @datafield2);
END
COMMIT TRANSACTION;
The UPDLOCK hint acquires an update lock on the row, preventing other transactions from modifying it until the current transaction completes. The SERIALIZABLE isolation level ensures that transactions are executed in a serial order, preventing phantom reads and other concurrency issues.
Using the MERGE Statement (SQL Server 2008 and later)
SQL Server 2008 introduced the MERGE statement, which provides a more declarative way to perform upserts. It allows you to specify the conditions for matching rows and the actions to take when a match is found or not found.
MERGE MyTable AS target
USING (VALUES (@key, @datafield1, @datafield2)) AS source (KEY, datafield1, datafield2)
ON target.KEY = source.KEY
WHEN MATCHED THEN
UPDATE SET
datafield1 = source.datafield1,
datafield2 = source.datafield2
WHEN NOT MATCHED THEN
INSERT (KEY, datafield1, datafield2)
VALUES (source.KEY, source.datafield1, source.datafield2);
The MERGE statement can be more efficient than the other methods because it can potentially perform the entire operation with a single I/O operation. However, it can also be more complex to understand and requires careful consideration of concurrency and locking. Be aware of potential issues and consider its behavior under different isolation levels.
Avoidance of Error-Based Upserts
While it’s possible to use TRY...CATCH blocks to handle potential constraint violations during an INSERT and then perform an UPDATE in the CATCH block, this approach is generally discouraged. Relying on exception handling for control flow can be significantly less efficient than using a proactive approach like the methods described above.