Introduction
When analyzing time series data, it’s often useful to visualize multiple variables on a single graph. This can help identify trends and correlations between different datasets over time. In this tutorial, we will explore how to plot two variables as lines using ggplot2, a powerful plotting library in R. We’ll cover both manual plotting and methods that involve reshaping data.
Prerequisites
Before starting, ensure you have the following installed:
- R: The programming language for statistical computing.
- ggplot2: A visualization package for creating complex plots from data in a data frame.
- tidyr or reshape2: Packages for data manipulation, especially reshaping data from wide to long format.
You can install these packages using the following commands:
install.packages("ggplot2")
install.packages("tidyr") # or "reshape2"
Creating Sample Data
Let’s create a sample dataset with two time series variables (var0 and var1) and a date column.
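set.seed(42)  # make the random walks reproducible
n <- 100      # number of monthly observations
# Every column needs n values; otherwise data.frame() silently
# recycles the shorter columns and the series repeats itself.
test_data <- data.frame(
  var0 = 100 + c(0, cumsum(runif(n - 1, -20, 20))),
  var1 = 150 + c(0, cumsum(runif(n - 1, -10, 10))),
  date = seq(as.Date("2002-01-01"), by = "1 month", length.out = n)
)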
Manual Plotting with ggplot2
For a straightforward approach, you can plot each variable manually by adding one geom_line() layer per variable. This method lets you set aesthetics such as color separately for each line.
library(ggplot2)
ggplot(test_data, aes(x = date)) +
  geom_line(aes(y = var0, colour = "var0")) +
  geom_line(aes(y = var1, colour = "var1")) +
  scale_color_manual(values = c("var0" = "blue", "var1" = "red")) +
  labs(title = "Time Series of var0 and var1",
       x = "Date",
       y = "Value",
       color = "Variable")
This code snippet plots both var0 and var1 on the same graph, with a different color for each line. The strings passed to colour inside aes() are not color names; they become the legend labels, and scale_color_manual() assigns a specific color to each label.
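To make the role of aes() concrete, here is a minimal sketch contrasting a color mapped inside aes() with a color set outside it. The demo data frame, its values, and the object names p_mapped and p_fixed are made up for illustration and are not part of the tutorial’s test_data.

```r
library(ggplot2)

# Small illustrative data frame (hypothetical values)
demo <- data.frame(
  date = seq(as.Date("2002-01-01"), by = "1 month", length.out = 6),
  var0 = c(100, 104, 99, 110, 108, 115),
  var1 = c(150, 148, 153, 151, 157, 155)
)

# Colour mapped inside aes(): the strings act as legend keys,
# and scale_colour_manual() assigns a colour to each key.
p_mapped <- ggplot(demo, aes(x = date)) +
  geom_line(aes(y = var0, colour = "var0")) +
  geom_line(aes(y = var1, colour = "var1")) +
  scale_colour_manual(values = c("var0" = "blue", "var1" = "red"))

# Colour set outside aes(): each line is coloured directly
# and no legend is drawn.
p_fixed <- ggplot(demo, aes(x = date)) +
  geom_line(aes(y = var0), colour = "blue") +
  geom_line(aes(y = var1), colour = "red")
```

Printing p_mapped should show a legend with the two keys, while p_fixed should show the same lines without one. If you need a legend, map inside aes(); if you only need colored lines, setting the color directly is simpler.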
Reshaping Data
For more complex datasets, or when you prefer a tidy data approach, reshaping your data into long format can simplify plotting multiple variables. This involves converting wide-format data (one column per variable) into long-format data (one column holding the values and another holding the variable names). With long data, a single geom_line() call draws every series.
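As a rough illustration of what the wide-to-long conversion does, the sketch below builds the long format by hand in base R on a tiny two-row example. The data frame names wide and long and their values are hypothetical; in practice you would use one of the reshaping functions shown in the following sections.

```r
# Tiny wide-format data frame (hypothetical values for illustration)
wide <- data.frame(
  date = as.Date(c("2002-01-01", "2002-02-01")),
  var0 = c(100, 105),
  var1 = c(150, 148)
)

# Build the long format by hand: stack the value columns and
# record which column each value came from.
long <- data.frame(
  date     = rep(wide$date, times = 2),
  variable = rep(c("var0", "var1"), each = nrow(wide)),
  value    = c(wide$var0, wide$var1)
)

long
```

The two value columns of wide become four rows in long, each tagged with its date and the name of the column it came from.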
Using tidyr::pivot_longer
library(tidyr)
test_data_long <- pivot_longer(test_data, cols = starts_with("var"),
                               names_to = "variable", values_to = "value")
ggplot(data = test_data_long,
       aes(x = date, y = value, colour = variable)) +
  geom_line() +
  labs(title = "Time Series of var0 and var1",
       x = "Date",
       y = "Value",
       color = "Variable")
Using reshape2::melt
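Alternatively, you can use the older reshape2 package. It has been retired in favor of tidyr, but it still works and appears in a lot of existing code: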
library(reshape2)
test_data_long <- melt(test_data, id.vars = "date",
                       variable.name = "variable", value.name = "value")
ggplot(data = test_data_long,
       aes(x = date, y = value, colour = variable)) +
  geom_line() +
  labs(title = "Time Series of var0 and var1",
       x = "Date",
       y = "Value",
       color = "Variable")
Both methods produce the same plot, with the variable column supplying the legend entries that identify each line.
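Although the plots match, the two long data frames are not identical objects. The sketch below, which uses a small hypothetical data frame named wide rather than test_data, compares the two results: the row order and the type of the variable column differ, but the values agree once both are sorted.

```r
library(tidyr)
library(reshape2)

# Tiny wide-format data frame (hypothetical values for illustration)
wide <- data.frame(
  date = as.Date(c("2002-01-01", "2002-02-01")),
  var0 = c(100, 105),
  var1 = c(150, 148)
)

long_tidyr <- pivot_longer(wide, cols = starts_with("var"),
                           names_to = "variable", values_to = "value")
long_melt <- melt(wide, id.vars = "date",
                  variable.name = "variable", value.name = "value")

# pivot_longer() returns a tibble with a character 'variable' column;
# melt() returns a data.frame with a factor 'variable' column.
class(long_tidyr$variable)
class(long_melt$variable)

# Sort both by variable and date, then compare the values
a <- long_tidyr[order(long_tidyr$variable, long_tidyr$date), ]
b <- long_melt[order(long_melt$variable, long_melt$date), ]
all.equal(a$value, b$value)
```

For plotting with geom_line() these differences do not matter, since ggplot2 groups the lines by the variable column in either case.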
Conclusion
In this tutorial, we explored two main approaches to plotting multiple time series using ggplot2: manual plotting and data reshaping. Manual plotting is quick when you have only a few variables, while reshaping to long format scales better as the number of series grows, because one geom_line() call handles them all. Whichever you choose, ggplot2 offers the flexibility to create informative and aesthetically pleasing visualizations.