Introduction
When working with collections of objects in C#, you may encounter scenarios where you need to filter out duplicates based on specific properties. This is particularly useful when dealing with entities like Person, each having an Id and a Name. LINQ (Language Integrated Query) offers powerful tools for querying collections, including methods to select distinct elements. In this tutorial, we will explore how to use LINQ to obtain distinct objects from a list based on one or more properties.
Basic Distinct Selection
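Before selecting by specific properties, it helps to understand the basic usage of Distinct(). This method removes duplicate items from a sequence using the default equality comparer for the element type. That works as expected for simple types like integers or strings, but for a class that does not override Equals() and GetHashCode(), two objects count as equal only when they are the same instance, so objects with identical property values are not treated as duplicates. For example, with integers: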
List<int> numbers = new List<int> { 1, 2, 2, 3, 4 };
IEnumerable<int> distinctNumbers = numbers.Distinct();
In this case, distinctNumbers will contain { 1, 2, 3, 4 }.
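To see why Distinct() alone is not enough for custom classes, here is a minimal sketch. It assumes a Person class like the one used in this tutorial, with no Equals() or GetHashCode() override, and uses top-level statements (C# 9 or later). The sample data is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var people = new List<Person>
{
    new Person { Id = 1, Name = "Test1" },
    new Person { Id = 1, Name = "Test1" }, // same values, but a different instance
    new Person { Id = 2, Name = "Test3" }
};

// Person does not override Equals/GetHashCode, so Distinct() falls back
// to reference equality and treats every instance as unique.
int countAfterDistinct = people.Distinct().Count();
Console.WriteLine(countAfterDistinct); // 3: nothing was removed

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}
```

The two entries with Id 1 carry identical values, yet both survive, which is why the property-based techniques in the following sections are needed.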
Distinct by a Single Property
To achieve distinctness based on a property of objects in a list, we can use the GroupBy() method combined with Select(). This approach is particularly effective when dealing with complex types like custom classes.
Example: Distinct Persons by ID
Suppose you have a list of Person objects and want to get distinct persons based on their Id. Here’s how you can do it:
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

List<Person> people = new List<Person>
{
    new Person { Id = 1, Name = "Test1" },
    new Person { Id = 1, Name = "Test2" },
    new Person { Id = 2, Name = "Test3" }
};

var distinctPeopleById = people
    .GroupBy(p => p.Id)
    .Select(g => g.First())
    .ToList();
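This code groups the Person objects by their Id and then selects the first object from each group, so the result contains exactly one person per Id. Because GroupBy() keeps elements in source order, the person kept for each Id is the first one encountered in the list: here, "Test1" for Id 1 and "Test3" for Id 2.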
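Taking the first element of each group is only one possible choice. As a hedged sketch, the variation below keeps the person whose Name sorts last within each Id instead. The ordering rule is an arbitrary assumption made for illustration, not something the GroupBy() approach requires.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var people = new List<Person>
{
    new Person { Id = 1, Name = "Test1" },
    new Person { Id = 1, Name = "Test2" },
    new Person { Id = 2, Name = "Test3" }
};

// Within each Id group, sort by Name (ordinal, descending) and keep the top entry.
var lastNamePerId = people
    .GroupBy(p => p.Id)
    .Select(g => g.OrderByDescending(p => p.Name, StringComparer.Ordinal).First())
    .ToList();

foreach (var p in lastNamePerId)
{
    Console.WriteLine($"{p.Id}: {p.Name}");
}
// prints "1: Test2" then "2: Test3"

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}
```

Any rule that picks one element from a group works in the Select() step, for example the most recently updated record if the class had a timestamp.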
Distinct by Multiple Properties
To get distinct elements based on multiple properties, you can use anonymous types within the GroupBy() method:
var distinctPeopleByIdAndName = people
    .GroupBy(p => new { p.Id, p.Name })
    .Select(g => g.First())
    .ToList();
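This approach groups by both Id and Name, ensuring uniqueness across these combined properties. It works because anonymous types compare by the values of their properties rather than by reference.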
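A value tuple can serve as the composite key in the same way, since tuples also compare by value. The sketch below is one possible variation, with sample data that contains an exact duplicate so the effect is visible. The data is invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var people = new List<Person>
{
    new Person { Id = 1, Name = "Test1" },
    new Person { Id = 1, Name = "Test1" }, // exact duplicate of the pair (1, "Test1")
    new Person { Id = 1, Name = "Test2" },
    new Person { Id = 2, Name = "Test3" }
};

// The tuple (Id, Name) is the grouping key; equal pairs land in the same group.
var distinctByTuple = people
    .GroupBy(p => (p.Id, p.Name))
    .Select(g => g.First())
    .ToList();

Console.WriteLine(distinctByTuple.Count); // 3

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}
```

Only the repeated (1, "Test1") pair is collapsed. The entry (1, "Test2") stays because its Name differs.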
Using DistinctBy in .NET 6 and Later
Starting with .NET 6, the LINQ library includes a new method called DistinctBy(), which simplifies the process of selecting distinct elements based on a key selector:
var distinctPersonsById = people.DistinctBy(p => p.Id).ToList();
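This extension method takes a single key selector and keeps the first element seen for each key. To determine uniqueness by more than one property, return a composite key such as an anonymous type or a tuple from the selector. The result is cleaner and more intuitive than the GroupBy() version.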
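As a brief sketch of both ideas, the example below uses a tuple as a composite key and then the DistinctBy() overload that accepts an IEqualityComparer for the key, here StringComparer.OrdinalIgnoreCase. It requires .NET 6 or later, and the sample data is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var people = new List<Person>
{
    new Person { Id = 1, Name = "Test1" },
    new Person { Id = 1, Name = "Test2" },
    new Person { Id = 2, Name = "Test3" },
    new Person { Id = 3, Name = "test3" } // differs from "Test3" only by case
};

// Composite key: unique by the (Id, Name) pair.
var byIdAndName = people.DistinctBy(p => (p.Id, p.Name)).ToList();
Console.WriteLine(byIdAndName.Count); // 4

// Key comparer: unique by Name, ignoring case.
var byNameIgnoringCase = people
    .DistinctBy(p => p.Name, StringComparer.OrdinalIgnoreCase)
    .ToList();
Console.WriteLine(byNameIgnoringCase.Count); // 3

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}
```

In the second query, "test3" is dropped because "Test3" was seen first and the comparer treats the two names as equal.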
Implementing DistinctBy with Custom Logic
If you’re working in an environment prior to .NET 6 or need custom behavior, you can implement a DistinctBy() extension method:
public static class LinqExtensions
{
    public static IEnumerable<TSource> DistinctBy<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector)
    {
        HashSet<TKey> seenKeys = new HashSet<TKey>();
        foreach (TSource element in source)
        {
            if (seenKeys.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }
}
You can then use this extension method as follows:
var distinctPeopleById = people.DistinctBy(p => p.Id).ToList();
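If the custom behavior you need is a different notion of key equality, one option is to accept a comparer and pass it to the HashSet. The following is a sketch under stated assumptions, not a definitive implementation: the method name DistinctByKey and the class name DistinctByKeyExtensions are hypothetical, chosen so the method cannot be confused with the built-in DistinctBy() on newer runtimes. It uses a classic Main method so that it also compiles with older C# compilers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctByKeyExtensions
{
    // Hypothetical helper: uses the default comparer for the key type.
    public static IEnumerable<TSource> DistinctByKey<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector)
    {
        return source.DistinctByKey(keySelector, EqualityComparer<TKey>.Default);
    }

    // Hypothetical helper: the caller supplies the key comparer.
    public static IEnumerable<TSource> DistinctByKey<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> comparer)
    {
        // Validate eagerly; an iterator body would defer these checks
        // until the sequence is first enumerated.
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));
        return Iterate(source, keySelector, comparer);
    }

    private static IEnumerable<TSource> Iterate<TSource, TKey>(
        IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> comparer)
    {
        var seenKeys = new HashSet<TKey>(comparer);
        foreach (TSource element in source)
        {
            // Add returns false when an equal key is already present.
            if (seenKeys.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var names = new[] { "apple", "Apple", "banana" };
        var unique = names
            .DistinctByKey(s => s, StringComparer.OrdinalIgnoreCase)
            .ToList();
        Console.WriteLine(string.Join(", ", unique)); // apple, banana
    }
}
```

Splitting the public method from the private iterator is a design choice: it makes a null argument fail at the call site rather than later, when the result is enumerated.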
Conclusion
Selecting distinct elements from a list based on specific properties is a common requirement when working with collections in C#. GroupBy() followed by First() works on any version of LINQ, while DistinctBy() in .NET 6 and later expresses the same intent in a single call. On older frameworks, a small HashSet-based extension method fills the gap.