Introduction
In relational databases, storing data in a normalized form is often preferred for efficiency and clarity. However, sometimes due to application requirements or legacy design, data may be stored as comma-separated values (CSV) within a single column. This tutorial will guide you on how to transform these CSV strings into individual rows using SQL Server. We’ll explore several methods suitable for different versions of SQL Server.
Understanding the Problem
Imagine a database table where one of its columns contains CSV strings:
| SomeID | OtherID | Data |
|———-|———|———-|
| abcdef-..| cdef123-… | 18,20,22 |
| abcdef-..| 4554a24-… | 17,19 |
| 987654-..| 12324a2-… | 13,19,20 |
The goal is to split the Data
column into separate rows for each value. This transformation can be crucial when applying SQL operations like filtering or paging directly on the database side.
Method 1: Recursive Common Table Expressions (CTEs) in SQL Server
For versions prior to SQL Server 2016, you can use recursive CTEs to split CSV strings into individual rows:
WITH SplitData(SomeID, OtherID, DataItem, String) AS (
SELECT
SomeID,
OtherID,
LEFT(String, CHARINDEX(',', String + ',') - 1),
STUFF(String, 1, CHARINDEX(',', String + ','), '')
FROM Testdata
UNION ALL
SELECT
SomeID,
OtherID,
LEFT(String, CHARINDEX(',', String + ',') - 1),
STUFF(String, 1, CHARINDEX(',', String + ','), '')
FROM SplitData
WHERE String > ''
)
SELECT
SomeID,
OtherID,
DataItem
FROM SplitData
ORDER BY SomeID;
Explanation:
- The CTE
SplitData
uses recursion to process each substring. LEFT
andSTUFF
functions are used to extract the first value and remove it from the string, respectively.- Recursion continues until no more values remain in the string.
Method 2: Using STRING_SPLIT Function (SQL Server 2016+)
For SQL Server 2016 or later, you can leverage the built-in STRING_SPLIT
function for a simpler approach:
SELECT
OtherID,
value AS SplitData
FROM yourtable
CROSS APPLY STRING_SPLIT(Data, ',');
Benefits:
- No need for complex logic or custom functions.
- Efficient and straightforward to implement.
Method 3: Using XML for Older Versions
Another approach suitable for older versions of SQL Server involves converting the CSV into an XML format and then parsing it:
SELECT A.OtherID,
Split.a.value('.', 'VARCHAR(100)') AS Data
FROM (
SELECT OtherID,
CAST('<M>' + REPLACE(Data, ',', '</M><M>') + '</M>' AS XML) AS Data
FROM Table1
) AS A
CROSS APPLY Data.nodes('/M') AS Split(a);
Explanation:
- The CSV string is transformed into an XML format.
nodes
method and XQuery are used to extract each element.
Method 4: Tally Table Approach
For those using even older versions, a tally table can be employed for splitting strings:
CREATE FUNCTION [dbo].[DelimitedSplit8K]
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN (
WITH E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (
SELECT TOP (ISNULL(DATALENGTH(@pString),0))
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM E4
),
cteStart(N1) AS (
SELECT 1 UNION ALL
SELECT t.N+1
FROM cteTally t
WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
),
cteLen(N1,L1) AS (
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
FROM cteStart s
)
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
);
Usage:
- This function uses a tally table to determine start positions and lengths of each substring.
Conclusion
Splitting CSV strings into individual rows can be achieved through various methods depending on your SQL Server version. Whether you use recursive CTEs, the STRING_SPLIT
function in newer versions, XML parsing, or a tally table approach, these techniques allow efficient handling of data stored as comma-separated values in a relational database.
Best Practices
- Normalization: Consider normalizing your database schema to avoid storing CSV strings when possible.
- Performance: Evaluate performance implications for each method based on the size and frequency of queries.
- Compatibility: Choose methods compatible with your SQL Server version and application requirements.
By understanding these techniques, you can effectively manage and query data stored in unconventional formats within SQL Server databases.