Visualizing Multiple Time Series with ggplot2 in R

Introduction

When analyzing time series data, it’s often useful to visualize multiple variables on a single graph. This can help identify trends and correlations between different datasets over time. In this tutorial, we will explore how to plot two variables as lines using ggplot2, a powerful plotting library in R. We’ll cover both manual plotting and methods that involve reshaping data.

Prerequisites

Before starting, ensure you have the following installed:

  • R: The programming language for statistical computing.
  • ggplot2: A visualization package for creating complex plots from data in a data frame.
  • tidyr or reshape2: Packages for reshaping data from wide to long format. tidyr is the actively developed option; reshape2 is retired and receives only maintenance updates.

You can install these packages using the following commands:

install.packages("ggplot2")
install.packages("tidyr") # or "reshape2"

Creating Sample Data

Let’s create a sample dataset with two time series variables (var0 and var1) and a date column.

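# Each series has 50 points (a starting value plus 49 random steps),
# so the date column must also have 50 entries. With length.out = 100,
# data.frame() would silently recycle both series to fill 100 rows.
test_data <- data.frame(
  var0 = 100 + c(0, cumsum(runif(49, -20, 20))),
  var1 = 150 + c(0, cumsum(runif(49, -10, 10))),
  date = seq(as.Date("2002-01-01"), by = "1 month", length.out = 50)
)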
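
Because runif() draws random numbers, the series change on every run. The following is a minimal sketch of a reproducible version with a basic sanity check; the seed value 42 and the helper variable n are assumptions made for illustration and are not part of the original example.

```r
set.seed(42)  # arbitrary seed, chosen only for illustration
n <- 50       # number of monthly observations (assumed)

test_data <- data.frame(
  var0 = 100 + c(0, cumsum(runif(n - 1, -20, 20))),
  var1 = 150 + c(0, cumsum(runif(n - 1, -10, 10))),
  date = seq(as.Date("2002-01-01"), by = "1 month", length.out = n)
)

# Confirm the shape before plotting: n rows, three columns, no missing values
str(test_data)
stopifnot(nrow(test_data) == n, !anyNA(test_data))
```

Checking the row count up front catches length mismatches that data.frame() might otherwise hide by recycling shorter vectors.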

Manual Plotting with ggplot2

For a straightforward approach, you can plot each variable manually using ggplot2. This method allows you to specify aesthetics like color for differentiation.

library(ggplot2)

ggplot(test_data, aes(x = date)) + 
  geom_line(aes(y = var0, colour = "var0")) + 
  geom_line(aes(y = var1, colour = "var1")) +
  scale_color_manual(values = c("var0" = "blue", "var1" = "red")) +
  labs(title = "Time Series of var0 and var1",
       x = "Date",
       y = "Value",
       color = "Variable")

This code plots var0 and var1 on the same graph, each in its own color. Because the strings "var0" and "var1" are given inside aes(), ggplot2 treats them as legend labels, and scale_color_manual() maps each label to a specific color.
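
To make the role of aes() concrete, here is a small sketch contrasting a color given inside aes() with one given outside it. The demo data frame and its values are hypothetical stand-ins, used only so the example runs on its own.

```r
library(ggplot2)

# Small stand-in data set (hypothetical values, for illustration only)
demo <- data.frame(
  date = seq(as.Date("2002-01-01"), by = "1 month", length.out = 4),
  var0 = c(100, 104, 98, 110),
  var1 = c(150, 149, 153, 151)
)

# Colour inside aes(): the string is treated as a data value, so ggplot2
# builds a legend and scale_color_manual() maps each label to a colour.
p_mapped <- ggplot(demo, aes(x = date)) +
  geom_line(aes(y = var0, colour = "var0")) +
  geom_line(aes(y = var1, colour = "var1")) +
  scale_color_manual(values = c("var0" = "blue", "var1" = "red"))

# Colour outside aes(): a fixed colour for the layer, with no legend entry.
p_fixed <- ggplot(demo, aes(x = date)) +
  geom_line(aes(y = var0), colour = "blue") +
  geom_line(aes(y = var1), colour = "red")
```

Printing p_mapped and p_fixed should show the same two lines; only the first carries a legend, which is usually what you want when the reader needs to tell the series apart.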

Reshaping Data

For datasets with many variables, or when you prefer a tidy data approach, reshaping your data into long format simplifies plotting. This means converting wide-format data (one column per variable) into long-format data (one column holding the values and another holding the variable names), so that a single geom_line() call can draw every series.
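
As a rough illustration of what the two layouts look like, the sketch below builds a tiny wide data frame and its long equivalent by hand in base R. The values are made up for illustration, and in practice a reshaping function does this step for you.

```r
# Wide format: one row per date, one column per variable
wide <- data.frame(
  date = as.Date(c("2002-01-01", "2002-02-01")),
  var0 = c(100, 105),
  var1 = c(150, 148)
)

# Long format: one row per (date, variable) pair
long <- data.frame(
  date     = rep(wide$date, times = 2),
  variable = rep(c("var0", "var1"), each = nrow(wide)),
  value    = c(wide$var0, wide$var1)
)

print(wide)
print(long)
```

The long version has twice as many rows as the wide one, because each of the two variables contributes one row per date.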

Using tidyr::pivot_longer

library(tidyr)

test_data_long <- pivot_longer(test_data, cols = starts_with("var"), 
                               names_to = "variable", values_to = "value")

ggplot(data = test_data_long,
       aes(x = date, y = value, colour = variable)) +
  geom_line() +
  labs(title = "Time Series of var0 and var1",
       x = "Date",
       y = "Value",
       color = "Variable")

Using reshape2::melt

Alternatively, you can use the reshape2 package:

library(reshape2)

test_data_long <- melt(test_data, id.vars = "date", 
                       variable.name = "variable", value.name = "value")

ggplot(data = test_data_long,
       aes(x = date, y = value, colour = variable)) +
  geom_line() +
  labs(title = "Time Series of var0 and var1",
       x = "Date",
       y = "Value",
       color = "Variable")

Both methods produce an equivalent plot. The variable column drives the color mapping, so the legend identifies each line without a manual color scale.
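
The two reshaped data frames are not identical objects, though. The sketch below, which uses a small hypothetical demo data frame, shows one difference that can matter later: pivot_longer() returns the variable names as a character column, while melt() returns them as a factor.

```r
library(tidyr)
library(reshape2)

# Small stand-in data set (hypothetical values, for illustration only)
demo <- data.frame(
  date = seq(as.Date("2002-01-01"), by = "1 month", length.out = 3),
  var0 = c(100, 104, 98),
  var1 = c(150, 149, 153)
)

long_tidyr <- pivot_longer(demo, cols = starts_with("var"),
                           names_to = "variable", values_to = "value")

long_melt <- melt(demo, id.vars = "date",
                  variable.name = "variable", value.name = "value")

class(long_tidyr$variable)  # "character"
class(long_melt$variable)   # "factor"

# Both results hold one row per (date, variable) pair
nrow(long_tidyr) == nrow(long_melt)
```

For a discrete color scale, ggplot2 handles both column types, so the difference mostly matters if you later want to control the legend order through factor levels.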

Conclusion

In this tutorial, we explored two main approaches to plotting multiple time series using ggplot2: manual plotting and data reshaping. The choice between these methods depends on your specific needs and preferences for working with data in R. Whether you prefer a straightforward approach or a tidy data philosophy, ggplot2 offers the flexibility to create informative and aesthetically pleasing visualizations.
