Using LINQ to Select Distinct Elements by Property Values

Introduction

When working with collections of objects in C#, you may encounter scenarios where you need to filter out duplicates based on specific properties. This is particularly useful when dealing with entities like Person, each having an Id and a Name. LINQ (Language Integrated Query) offers powerful tools for querying collections, including methods to select distinct elements. In this tutorial, we will explore how to use LINQ to obtain distinct objects from a list based on one or more properties.

Basic Distinct Selection

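Before selecting by specific properties, it helps to understand plain Distinct(). This method removes duplicate items from a sequence using the default equality comparer for the element type. That gives the expected result for types with value equality, such as integers and strings. For a class that does not override Equals() and GetHashCode(), two objects with identical property values are still treated as different, because they are compared by reference. For example: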

List<int> numbers = new List<int> { 1, 2, 2, 3, 4 };
IEnumerable<int> distinctNumbers = numbers.Distinct();

In this case, distinctNumbers will contain { 1, 2, 3, 4 }.
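The difference shows up as soon as the elements are class instances. The following is a minimal sketch, not a definitive pattern: the Person class mirrors the entity mentioned in the introduction, and PersonIdComparer and DistinctDemo are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the Person entity mentioned in the introduction.
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical comparer: two Person objects are equal when their Ids match.
public sealed class PersonIdComparer : IEqualityComparer<Person>
{
    public bool Equals(Person x, Person y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Id == y.Id;
    }

    public int GetHashCode(Person obj) => obj.Id.GetHashCode();
}

public static class DistinctDemo
{
    private static List<Person> SamplePeople() => new List<Person>
    {
        new Person { Id = 1, Name = "Test1" },
        new Person { Id = 1, Name = "Test1" } // same values, separate object
    };

    // Default comparer: Person does not override Equals, so references are compared.
    public static int CountWithDefaultComparer() =>
        SamplePeople().Distinct().Count();

    // Explicit comparer: elements with the same Id collapse into one.
    public static int CountWithIdComparer() =>
        SamplePeople().Distinct(new PersonIdComparer()).Count();

    public static void Main()
    {
        Console.WriteLine(CountWithDefaultComparer()); // 2
        Console.WriteLine(CountWithIdComparer());      // 1
    }
}
```

Writing a comparer class works, but it is verbose when the only goal is "one element per Id", which is what the key-based techniques in the rest of this tutorial address.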

Distinct by a Single Property

To achieve distinctness based on a property of objects in a list, we can use the GroupBy() method combined with Select(). This approach works with custom classes as they are: the class does not need to override Equals() or GetHashCode(), and no separate comparer is required.

Example: Distinct Persons by ID

Suppose you have a list of Person objects and want to get distinct persons based on their Id. Here’s how you can do it:

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

List<Person> people = new List<Person>
{
    new Person { Id = 1, Name = "Test1" },
    new Person { Id = 1, Name = "Test2" },
    new Person { Id = 2, Name = "Test3" }
};

var distinctPeopleById = people
    .GroupBy(p => p.Id)
    .Select(g => g.First())
    .ToList();

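This code groups the Person objects by their Id and then selects the first object from each group, so only one person per Id is included in the result. With the sample list, the result contains Test1 (Id 1) and Test3 (Id 2); Test2 is dropped because it shares Id 1 with an earlier element.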
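Because First() keeps whichever element appears first in the source for each Id, the surviving Name depends on input order. When a specific representative is wanted, each group can be ordered before picking. The sketch below assumes an arbitrary rule chosen only for illustration (keep the ordinally greatest Name per Id); RepresentativeDemo and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class RepresentativeDemo
{
    private static List<Person> SamplePeople() => new List<Person>
    {
        new Person { Id = 1, Name = "Test1" },
        new Person { Id = 1, Name = "Test2" },
        new Person { Id = 2, Name = "Test3" }
    };

    // Keeps the first element seen for each Id (source order).
    public static List<string> FirstSeenNames() =>
        SamplePeople()
            .GroupBy(p => p.Id)
            .Select(g => g.First().Name)
            .ToList();

    // Keeps the element with the ordinally greatest Name for each Id.
    public static List<string> GreatestNamePerId() =>
        SamplePeople()
            .GroupBy(p => p.Id)
            .Select(g => g.OrderByDescending(p => p.Name, StringComparer.Ordinal)
                          .First().Name)
            .ToList();

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", FirstSeenNames()));    // Test1, Test3
        Console.WriteLine(string.Join(", ", GreatestNamePerId())); // Test2, Test3
    }
}
```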

Distinct by Multiple Properties

To get distinct elements based on multiple properties, you can use anonymous types within the GroupBy() method:

var distinctPeopleByIdAndName = people
    .GroupBy(p => new { p.Id, p.Name })
    .Select(g => g.First())
    .ToList();

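This approach groups by both Id and Name. Anonymous types implement Equals() and GetHashCode() in terms of their property values, so two elements land in the same group only when both Id and Name match.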
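To see how the choice of key changes the result, the following sketch runs both groupings over a list that contains one exact duplicate. The extra duplicate entry and the CompositeKeyDemo class are assumptions added for illustration; they are not part of the sample list above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class CompositeKeyDemo
{
    private static List<Person> SamplePeople() => new List<Person>
    {
        new Person { Id = 1, Name = "Test1" },
        new Person { Id = 1, Name = "Test1" }, // exact duplicate of the first entry
        new Person { Id = 1, Name = "Test2" },
        new Person { Id = 2, Name = "Test3" }
    };

    // One element per Id.
    public static int CountById() =>
        SamplePeople()
            .GroupBy(p => p.Id)
            .Select(g => g.First())
            .Count();

    // One element per (Id, Name) pair; the anonymous type compares by value.
    public static int CountByIdAndName() =>
        SamplePeople()
            .GroupBy(p => new { p.Id, p.Name })
            .Select(g => g.First())
            .Count();

    public static void Main()
    {
        Console.WriteLine(CountById());        // 2
        Console.WriteLine(CountByIdAndName()); // 3
    }
}
```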

Using DistinctBy in .NET 6 and Later

Starting with .NET 6, the LINQ library includes a new method called DistinctBy(), which simplifies the process of selecting distinct elements based on a key selector:

var distinctPersonsById = people.DistinctBy(p => p.Id).ToList();

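This extension method takes a single key selector. To make several properties decide uniqueness, return a composite key from the selector, such as an anonymous type or a tuple. Either way, the call is shorter and easier to read than the GroupBy() version.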
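As a sketch of two common variations, assuming a project that targets .NET 6 or later: a tuple key for uniqueness across several properties, and the overload that accepts an IEqualityComparer for the key. The sample names and the DistinctByDemo class are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class DistinctByDemo
{
    private static List<Person> SamplePeople() => new List<Person>
    {
        new Person { Id = 1, Name = "Alice" },
        new Person { Id = 2, Name = "alice" },
        new Person { Id = 1, Name = "Alice" },
        new Person { Id = 3, Name = "Bob" }
    };

    // Composite key: a value tuple compares its components by value.
    public static int CountByIdAndName() =>
        SamplePeople().DistinctBy(p => (p.Id, p.Name)).Count();

    // Key comparer overload: names that differ only by case count as the same key.
    public static List<int> IdsKeptByNameIgnoringCase() =>
        SamplePeople()
            .DistinctBy(p => p.Name, StringComparer.OrdinalIgnoreCase)
            .Select(p => p.Id)
            .ToList();

    public static void Main()
    {
        Console.WriteLine(CountByIdAndName());                             // 3
        Console.WriteLine(string.Join(", ", IdsKeptByNameIgnoringCase())); // 1, 3
    }
}
```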

Implementing DistinctBy with Custom Logic

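If you are targeting a framework older than .NET 6, or need custom behavior, you can implement your own DistinctBy() extension method. On .NET 6 or later, consider giving it a different name, because an extension method named DistinctBy() can make calls ambiguous with the built-in one: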

using System;
using System.Collections.Generic;

public static class LinqExtensions
{
    public static IEnumerable<TSource> DistinctBy<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector)
    {
        HashSet<TKey> seenKeys = new HashSet<TKey>();
        foreach (TSource element in source)
        {
            // HashSet.Add returns false when the key has already been seen.
            if (seenKeys.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }
}

You can then use this extension method as follows:

var distinctPeopleById = people.DistinctBy(p => p.Id).ToList();
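If custom behavior is the goal, one possible extension is sketched below. It is an illustration under stated assumptions rather than a reference implementation: the method is named DistinctByKey (a hypothetical name, chosen to avoid clashing with the built-in DistinctBy() on .NET 6 and later), it accepts an optional key comparer, and it validates its arguments at call time rather than on first enumeration by splitting the iterator into a private helper.

```csharp
using System;
using System.Collections.Generic;

public static class KeyedDistinctExtensions
{
    // Hypothetical name: avoids ambiguity with Enumerable.DistinctBy on .NET 6+.
    public static IEnumerable<TSource> DistinctByKey<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> comparer = null)
    {
        // Checked here, not inside the iterator, so bad arguments fail immediately.
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));
        return DistinctByKeyIterator(source, keySelector, comparer);
    }

    private static IEnumerable<TSource> DistinctByKeyIterator<TSource, TKey>(
        IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> comparer)
    {
        // A null comparer makes HashSet fall back to the default comparer.
        var seenKeys = new HashSet<TKey>(comparer);
        foreach (TSource element in source)
        {
            if (seenKeys.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }
}

public static class KeyedDistinctDemo
{
    // Case-insensitive distinct over plain strings, keeping the first spelling seen.
    public static List<string> DistinctWords()
    {
        var words = new List<string> { "apple", "Apple", "banana" };
        return new List<string>(
            words.DistinctByKey(w => w, StringComparer.OrdinalIgnoreCase));
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", DistinctWords())); // apple, banana
    }
}
```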

Conclusion

Selecting distinct elements from a list based on specific properties is a common requirement when working with collections in C#. GroupBy() followed by First() works on every version of .NET that supports LINQ, DistinctBy() is the shortest option on .NET 6 and later, and a small custom extension method covers older frameworks or custom behavior.
