Counting Unique Values in Pandas DataFrames

Understanding Value Counts in Pandas

Pandas is a powerful Python library for data manipulation and analysis. A common task when working with data is to count the number of unique values within a column (or series) of a DataFrame. This tutorial will guide you through various methods to achieve this, along with explanations and examples.

Counting Unique Values with `nunique()`

The simplest and most direct way to count unique values in a Pandas Series (a column in a DataFrame) is using the nunique() method. This method returns the number of distinct elements in the Series.

import pandas as pd

# Sample DataFrame
data = {'hID': [101, 102, 103, 101, 102, 104, 105, 101],
        'dID': [10, 11, 12, 10, 11, 10, 12, 10],
        'uID': ['James', 'Henry', 'Abe', 'James', 'Henry', 'Brian', 'Claude', 'James'],
        'mID': ['A', 'B', 'A', 'B', 'A', 'A', 'A', 'C']}
df = pd.DataFrame(data)

# Count unique values in the 'hID' column
unique_count = df['hID'].nunique()
print(unique_count)  # Output: 5

Counting Total and Non-Null Values with `count()` and `size`

While nunique() specifically counts distinct values, you might also need to count total or non-null values.

count(): This method counts the number of non-null values in a Series.
size: This attribute returns the total number of elements in the Series, including null values.

# Count non-null values in 'hID'
non_null_count = df['hID'].count()
print(non_null_count)  # Output: 8

# Get the total number of elements in 'hID' (same as size)
total_count = df['hID'].size
print(total_count) # Output: 8

Counting Unique Values Across Multiple Columns

To count the number of unique values for all columns in a DataFrame, you can use the nunique() method on the entire DataFrame.

# Count unique values in each column
unique_counts = df.nunique()
print(unique_counts)
# Output:
# dID    3
# hID    5
# mID    3
# uID    5
# dtype: int64

Applying Aggregation with `agg()`

The agg() (aggregate) method provides a flexible way to apply multiple functions to your data. You can use it to calculate the count, size, and number of unique values for one or more columns simultaneously.

# Aggregate count, size, and nunique for all columns
aggregated_counts = df.agg(['count', 'size', 'nunique'])
print(aggregated_counts)

# Output:
#       dID  hID  mID  uID
# count    8    8    8    8
# size     8    8    8    8
# nunique  3    5    3    5

Conditional Counting with Boolean Indexing and `agg()`

You can also count unique values based on specific conditions. This involves using boolean indexing to filter the DataFrame and then applying the nunique() method.

# Count unique 'hID' values where 'mID' is equal to 'A'
conditional_unique = df.loc[df['mID'] == 'A', 'hID'].nunique()
print(conditional_unique) # Output: 5

#Alternative with query
conditional_unique_query = df.query('mID == "A"')['hID'].nunique()
print(conditional_unique_query) # Output: 5

Understanding Value Frequencies with `value_counts()`

While nunique() tells you the number of unique values, value_counts() provides a more detailed breakdown of the frequency of each unique value.

# Get the frequency of each unique value in the 'hID' column
hID_counts = df['hID'].value_counts()
print(hID_counts)
# Output:
# 101    3
# 102    2
# 104    1
# 105    1
# 103    1
# Name: hID, dtype: int64

This provides a series where the index represents the unique values and the values represent the number of occurrences.

Understanding Value Counts in Pandas

Counting Unique Values with nunique()

Counting Total and Non-Null Values with count() and size

Counting Unique Values Across Multiple Columns

Applying Aggregation with agg()

Conditional Counting with Boolean Indexing and agg()

Understanding Value Frequencies with value_counts()

Leave a Reply Cancel reply

Counting Unique Values with `nunique()`

Counting Total and Non-Null Values with `count()` and `size`

Applying Aggregation with `agg()`

Conditional Counting with Boolean Indexing and `agg()`

Understanding Value Frequencies with `value_counts()`