Understanding Value Counts in Pandas
Pandas is a powerful Python library for data manipulation and analysis. A common task when working with data is to count the number of unique values within a column (or series) of a DataFrame. This tutorial will guide you through various methods to achieve this, along with explanations and examples.
Counting Unique Values with nunique()
The simplest and most direct way to count unique values in a Pandas Series (a column in a DataFrame) is using the nunique()
method. This method returns the number of distinct elements in the Series.
import pandas as pd
# Sample DataFrame
data = {'hID': [101, 102, 103, 101, 102, 104, 105, 101],
'dID': [10, 11, 12, 10, 11, 10, 12, 10],
'uID': ['James', 'Henry', 'Abe', 'James', 'Henry', 'Brian', 'Claude', 'James'],
'mID': ['A', 'B', 'A', 'B', 'A', 'A', 'A', 'C']}
df = pd.DataFrame(data)
# Count unique values in the 'hID' column
unique_count = df['hID'].nunique()
print(unique_count) # Output: 5
Counting Total and Non-Null Values with count()
and size
While nunique()
specifically counts distinct values, you might also need to count total or non-null values.
count()
: This method counts the number of non-null values in a Series.size
: This attribute returns the total number of elements in the Series, including null values.
# Count non-null values in 'hID'
non_null_count = df['hID'].count()
print(non_null_count) # Output: 8
# Get the total number of elements in 'hID' (same as size)
total_count = df['hID'].size
print(total_count) # Output: 8
Counting Unique Values Across Multiple Columns
To count the number of unique values for all columns in a DataFrame, you can use the nunique()
method on the entire DataFrame.
# Count unique values in each column
unique_counts = df.nunique()
print(unique_counts)
# Output:
# dID 3
# hID 5
# mID 3
# uID 5
# dtype: int64
Applying Aggregation with agg()
The agg()
(aggregate) method provides a flexible way to apply multiple functions to your data. You can use it to calculate the count, size, and number of unique values for one or more columns simultaneously.
# Aggregate count, size, and nunique for all columns
aggregated_counts = df.agg(['count', 'size', 'nunique'])
print(aggregated_counts)
# Output:
# dID hID mID uID
# count 8 8 8 8
# size 8 8 8 8
# nunique 3 5 3 5
Conditional Counting with Boolean Indexing and agg()
You can also count unique values based on specific conditions. This involves using boolean indexing to filter the DataFrame and then applying the nunique()
method.
# Count unique 'hID' values where 'mID' is equal to 'A'
conditional_unique = df.loc[df['mID'] == 'A', 'hID'].nunique()
print(conditional_unique) # Output: 5
#Alternative with query
conditional_unique_query = df.query('mID == "A"')['hID'].nunique()
print(conditional_unique_query) # Output: 5
Understanding Value Frequencies with value_counts()
While nunique()
tells you the number of unique values, value_counts()
provides a more detailed breakdown of the frequency of each unique value.
# Get the frequency of each unique value in the 'hID' column
hID_counts = df['hID'].value_counts()
print(hID_counts)
# Output:
# 101 3
# 102 2
# 104 1
# 105 1
# 103 1
# Name: hID, dtype: int64
This provides a series where the index represents the unique values and the values represent the number of occurrences.