ECON 366 — Energy Economics
  • Syllabus
  • Schedule
  • Course Deliverables
  • Data Exercises

Data Exercise, Week 2

This week, our Data Exercise focuses on another key souce of energy data, the US Energy Information Administration and, as ever, we’ll introduce a couple of new commands in R for you to learn.

I’d love to have you use the API from the EIA, but it’s a mess, so we’re going to use spreadsheet data again. In particular, we’re going to use the data on oil and refined product prices here, although this week I am going to let you work out how to download the linked excel spreadsheet and read the data into R.

By this point, you should be working to prepare an RMarkdown file that compiles to an html document. You’ll need that skill for the first data assignment, coming next week, so please make sure you’ve got it running. You can use week2.Rmd as your starting point. Knit it before you start playing with it, just so you see that it works, then make a clean version to use for this week’s exercise.

Don’t forget to include your basic R packages. We’ll add two new ones this week: scales for nicer graphs and kableExtra for nice tables. The order can matter in cases where two packages have commands with the same names. The last one you load will be the one that defines what that particular command means when you run it.

library(tidyverse)
library(readxl)
library(janitor)
library(scales)
library(kableExtra)

When downloading the data, you might want a couple of hints:

  • use sheet="Data 1" to get the WTI and Brent prices;
  • use skip=2 in your read_excel command to skip the first two lines of the spreadsheet.

So, if you’ve got that right, you should be able to generate a table that looks like this, showing that you’ve got two price series (WTI and Brent) and dates (I named by data price_data but you might have called it something else):

  price_data<-price_data %>% 
  rename(wti=2,brent=3)
price_data%>%#start with my price data
  tail(10)%>% #grab the last 10 observations and pass it to kbl
  #everything below here just makes a nice table
   kbl(escape = FALSE,table.attr = "style='width:40%;'") %>%
    kable_styling(fixed_thead = T,bootstrap_options = c("hover", "condensed","responsive"),full_width = T)%>%
  scroll_box(width = "1000px", height = "400px")%>%
      I() 
date wti brent
2024-12-30 71.7 74.2
2024-12-31 72.4 74.6
2025-01-02 73.8 76.1
2025-01-03 74.6 76.7
2025-01-06 74.3 77.3
2025-01-07 75.0 77.8
2025-01-08 74.0 77.4
2025-01-09 NA 78.4
2025-01-10 77.3 79.8
2025-01-13 79.6 82.7

You’ll notice that I also used a couple of new bits of code in there:

  • I used price_data<-price_data %>% rename(wti=2,brent=3) to rename columns by number;
  • I used tail(10) to send the last 10 rows of data to the kbl table.

Like last week, you’ll want to transform your data from wide, with the variables one in each column, to long with data stacked vertically and an indicator variable (date) in this case. Try it out.

If you did it right, and you used names_to = "crude_stream", values_to = "price" in your pivot_longer command, you should get data that look something like this:

price_data %>% tail(10)%>%
  kbl(escape = FALSE,table.attr = "style='width:40%;'") %>%
    kable_styling(fixed_thead = T,bootstrap_options = c("hover", "condensed","responsive"),full_width = T)%>%
  scroll_box(width = "1000px", height = "400px")%>%
      I() 
date crude_stream price
2025-01-07 wti 75.0
2025-01-07 brent 77.8
2025-01-08 wti 74.0
2025-01-08 brent 77.4
2025-01-09 wti NA
2025-01-09 brent 78.4
2025-01-10 wti 77.3
2025-01-10 brent 79.8
2025-01-13 wti 79.6
2025-01-13 brent 82.7

The first new skill we want to learn this week is making a graph with multiple data series. Here’s a basic example, with comments in the code:

  price_data %>%
    ggplot()+ #make a graph
  geom_line(aes(date,price,group=crude_stream,color=crude_stream),linewidth=1.25)

#the grouping tells ggplot that each different crude stream is a different line
# the colour tells ggplot to assign a different colour to each of the crude streams

Now we’re going to get to one of the more powerful aspects of R graphs - dealing with dates and times. First, check your data: use the command from janitor to describe_class of your date variable:

describe_class(price_data$date)
[1] "POSIXct, POSIXt"

The output tells you that you’re dealing with a POSIXct or POSIXt date-time class variable - this class allows you to store a lot of information including, as in your case, just a YMD date. So, since we’re dealing with date-time classes, we would need to modify our graph using the scale_x_datetime command. Or, we can convert it to just a date, which I think it easier since we won’t use time for this.

#notice here I am over-writing my price_data with the altered data set
price_data<-price_data %>% mutate(date=as_date(date))

describe_class(price_data$date)
[1] "Date"
#notice here I am just sending price_data through to the graph command, not altering price_data at all
  price_data %>%
    ggplot()+ #make a graph
  geom_line(aes(date,price,group=crude_stream,color=crude_stream),linewidth=1.25)+
  #the grouping tells ggplot that each different crude stream is a different line
  # the colour tells ggplot to assign a different colour to each of the crude streams
  scale_x_date(date_labels = "%Y",date_breaks = "4 years")

In this command, date_labels tells you how to label your dates and/or times while date_breaks tells you the frequency of your labels.

For example, if I wanted to graph crude prices over the last year with axis labels by month, I could do something like this:

  price_data %>% filter(date>=max(date)-years(1))%>% #keep the last year of data
    ggplot()+ #make a graph
  geom_line(aes(date,price,group=crude_stream,color=crude_stream),linewidth=1.25)+
#the grouping tells ggplot that each different crude stream is a different line
# the colour tells ggplot to assign a different colour to each of the crude streams
    scale_x_date(date_labels = "%b\n%Y",date_breaks = "2 months")+
# use labels every 2 months, with labels being short-month and 4-digit year on two lines
  NULL #just a placeholder

The graph is still ugly, but you get the point.

The different codes you might use to format your dates (the %Y and %b) are all listed here and the \n is a new line command.

Okay, now I’d like to see you try to fix a graph up using some of the code that we’ve used before, and create a graph showing the last 10 years of WTI and Brent crude oil prices.

Here’s my graph for this week:

I’ll give you a couple of hints for how to make this happen:

  • I used a colour-blind friendly palette from the viridis package (you’d need to install it to replicate this exactly) using the code scale_color_viridis("",discrete = T,option="mako",direction = -1,begin=.7,end = 0)
  • I stretched the y limits using expand_limits(y=c(0,150))
  • I expanded the x-axis right limit with expand_limits(x=ymd("2025-03-31"))). In this one, I take my text date, 2024-03-31 and convert it to a date object to match the dates in our data using ymd from the lubridate package.
  • I used theme_minimal()
  • I filtered out any NA data points so that I don’t get a choppy line.

Good luck!

Content 2025 by Andrew Leach
All content licensed under a Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0)

 

Made with and Quarto
View the source at GitHub