Splitting Comma-Separated Values into Rows Using SQL Server

Introduction

In relational databases, storing data in a normalized form is often preferred for efficiency and clarity. However, sometimes due to application requirements or legacy design, data may be stored as comma-separated values (CSV) within a single column. This tutorial will guide you on how to transform these CSV strings into individual rows using SQL Server. We’ll explore several methods suitable for different versions of SQL Server.

Understanding the Problem

Imagine a database table where one of its columns contains CSV strings:

| SomeID | OtherID | Data |
|———-|———|———-|
| abcdef-..| cdef123-… | 18,20,22 |
| abcdef-..| 4554a24-… | 17,19 |
| 987654-..| 12324a2-… | 13,19,20 |

The goal is to split the Data column into separate rows for each value. This transformation can be crucial when applying SQL operations like filtering or paging directly on the database side.

Method 1: Recursive Common Table Expressions (CTEs) in SQL Server

For versions prior to SQL Server 2016, you can use recursive CTEs to split CSV strings into individual rows:

WITH SplitData(SomeID, OtherID, DataItem, String) AS (
    SELECT 
        SomeID,
        OtherID,
        LEFT(String, CHARINDEX(',', String + ',') - 1),
        STUFF(String, 1, CHARINDEX(',', String + ','), '')
    FROM Testdata
    UNION ALL
    SELECT 
        SomeID,
        OtherID,
        LEFT(String, CHARINDEX(',', String + ',') - 1),
        STUFF(String, 1, CHARINDEX(',', String + ','), '')
    FROM SplitData
    WHERE String > ''
)
SELECT 
    SomeID,
    OtherID,
    DataItem
FROM SplitData
ORDER BY SomeID;

Explanation:

  • The CTE SplitData uses recursion to process each substring.
  • LEFT and STUFF functions are used to extract the first value and remove it from the string, respectively.
  • Recursion continues until no more values remain in the string.

Method 2: Using STRING_SPLIT Function (SQL Server 2016+)

For SQL Server 2016 or later, you can leverage the built-in STRING_SPLIT function for a simpler approach:

SELECT 
    OtherID, 
    value AS SplitData
FROM yourtable
CROSS APPLY STRING_SPLIT(Data, ',');

Benefits:

  • No need for complex logic or custom functions.
  • Efficient and straightforward to implement.

Method 3: Using XML for Older Versions

Another approach suitable for older versions of SQL Server involves converting the CSV into an XML format and then parsing it:

SELECT A.OtherID,  
       Split.a.value('.', 'VARCHAR(100)') AS Data  
FROM (
    SELECT OtherID,  
           CAST('<M>' + REPLACE(Data, ',', '</M><M>') + '</M>' AS XML) AS Data  
    FROM Table1
) AS A 
CROSS APPLY Data.nodes('/M') AS Split(a);

Explanation:

  • The CSV string is transformed into an XML format.
  • nodes method and XQuery are used to extract each element.

Method 4: Tally Table Approach

For those using even older versions, a tally table can be employed for splitting strings:

CREATE FUNCTION [dbo].[DelimitedSplit8K]
        (@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN (
    WITH E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
                   SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
                   SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),
         E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
         E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
    cteTally(N) AS (
        SELECT TOP (ISNULL(DATALENGTH(@pString),0)) 
               ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) 
        FROM E4
    ),
    cteStart(N1) AS (
        SELECT 1 UNION ALL
        SELECT t.N+1 
        FROM cteTally t 
        WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
    ),
    cteLen(N1,L1) AS (
        SELECT s.N1,
               ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
          FROM cteStart s
    )
    SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
           Item       = SUBSTRING(@pString, l.N1, l.L1)
      FROM cteLen l
);

Usage:

  • This function uses a tally table to determine start positions and lengths of each substring.

Conclusion

Splitting CSV strings into individual rows can be achieved through various methods depending on your SQL Server version. Whether you use recursive CTEs, the STRING_SPLIT function in newer versions, XML parsing, or a tally table approach, these techniques allow efficient handling of data stored as comma-separated values in a relational database.

Best Practices

  • Normalization: Consider normalizing your database schema to avoid storing CSV strings when possible.
  • Performance: Evaluate performance implications for each method based on the size and frequency of queries.
  • Compatibility: Choose methods compatible with your SQL Server version and application requirements.

By understanding these techniques, you can effectively manage and query data stored in unconventional formats within SQL Server databases.

Leave a Reply

Your email address will not be published. Required fields are marked *