ECON 366 — Energy Economics
  • Syllabus
  • Schedule
  • Course Deliverables
  • Data Exercises

Data Assignment 1

For what you see below, I’ve used the following packages and settings. Use this as a guide to set up your document:

library(kableExtra)
library(readxl)
library(janitor)
library(tidyverse)
library(lubridate)
library(scales)
library(viridis)
library(ggthemes)

options(scipen = 999)

For this assignment, you’re going to use four data sets to make four graphs and a table using tools you’ve already used in the data exercises.

First, you’ll use the BP data that we used in the first data exercise to create a graph of oil production by region.

Next, you’ll use Government of Alberta oil price data which I have scraped for you and placed in a file to create a table and a graph of monthly average oil prices over time.

Next, you’ll use the Alberta Energy Regulator annual outlook report known as the ST98 (they aren’t great with catchy names) to graph three oil price scenarios.

The last deliverable on the assignment is more of a current events challenge. I want you to make a graph of Canadian oil exports by destination using these data from the Canadian International Merchandise Trade Web Application. But, to make your life a little easier, I have processed the data for you into a nicer file available for you here.

This assignment tests your skills in filtering, grouping, and presenting data in table and graph form. It should not require any R skills that you haven’t already used or for which hints aren’t given to you below.

You do not have to replicate the graphs exactly. Make comparable graphs with the same information that work for you. I will penalize you for typos, raw variable names in legends, etc.

There are five (5) graded deliverables in this assignment.

Deliverable 1

For the first deliverable, I want you to use the BP data that you used in the Week 1 data exercise (a different worksheet though) to create this graph.

#check to see if the file exists. The first line after the if will only run if it doesn't
bp_oil_data<-
read_excel("../bp_2025.xlsx",sheet = "Oil Production - barrels",range="A3:BI81")%>%clean_names()%>%
  rename("country"=1)%>%
  filter(!is.na(country))%>%
  pivot_longer(-country,names_to="year",values_to = "production")%>%
  mutate(year=as.numeric(gsub("x","",year)))%>%
  filter(country %in% c("Canada","US","OPEC", "Total World"))%>%
  mutate(country=gsub("Total","",country))

  #ggplot(bp_oil_data)+
  #geom_line(aes(year,production/1000,colour=country))
  bp_oil_data%>%
  ggplot()+ #make a graph
  geom_line(aes(year,production/1000,colour=country),linewidth=1.15)+
  scale_x_continuous(expand=c(0,0),breaks=pretty_breaks(10))+
  expand_limits(x=2026)+
  expand_limits(y=c(0,105))+
  scale_y_continuous(breaks=pretty_breaks(), expand=c(0,0))+
  scale_color_manual("",values=c("darkgreen","red","black","blue"))+
  theme_irpp(base_size=16)+
  theme(legend.position = "bottom",
        plot.margin = margin(t=5,r=15,b=5,l=5))+
  guides(linetype = guide_legend(keywidth = unit(1.6,"cm"),nrow = 1))+
    labs(y="Oil Production (Million Barrels Per Day)",x="",
         title="Crude Oil Production",
         subtitle="World, Canada, US, and OPEC\n",
         caption = "Data via Statistical Review of World Energy, graph by Andrew Leach"
         )

Deliverable 2

The next deliverable uses the Government of Alberta oil price data to create a simple table based on filtered data. I would like you to produce table of monthly WTI and WCS prices for 2025. Note that, as my compiling this assignment document, December, 2025 data were not available.

  price_table<-
  read_csv("oil_prices.csv")%>%
  filter(year(date)==2025)%>%
  select(date,wti,wcs)%>%
     mutate(
        diff=format(wti-wcs,nsmall=2),
         wti=format(wti, nsmall = 2),
         wcs=format(wcs, nsmall = 2),
         )%>%
  rename(
         "WTI (USD/bbl)"=wti,
         "WCS (USD/bbl)"=wcs,
         "WTI-WCS (USD/bbl)"=diff,
         Date=date)%>%
  kbl(escape = FALSE,table.attr = "style='width:90%;'",digits=2,align=rep('c', 4)) %>%
  kable_styling(fixed_thead = T,bootstrap_options = c("hover", "condensed","responsive"),full_width = T)%>%
  add_header_above(header = c("West Texas Intermediate (WTI) and Western Canadian Select (WCS) prices and differential in US Dollars"=4))%>%
  I() 
West Texas Intermediate (WTI) and Western Canadian Select (WCS) prices and differential in US Dollars
Date WTI (USD/bbl) WCS (USD/bbl) WTI-WCS (USD/bbl)
2025-01-01 75.74 62.86 12.88
2025-02-01 71.53 59.07 12.46
2025-03-01 68.24 54.38 13.86
2025-04-01 63.54 50.83 12.71
2025-05-01 62.17 51.57 10.60
2025-06-01 68.17 58.22 9.95
2025-07-01 68.39 58.31 10.08
2025-08-01 64.86 53.70 11.16
2025-09-01 63.96 51.63 12.33
2025-10-01 60.89 48.62 12.27
2025-11-01 60.06 48.77 11.29
2025-12-01 57.97 46.45 11.52

I’ll give you a couple of hints for how to make this happen:

  1. Make sure you don’t give the file a different extension when you download it (e.g. don’t set your destfile="foo.xlsx" if you’re downloading a CSV file. Use destfile="foo.csv" and vice-versa);

  2. If you want to force a number to have two decimal places, you can use format(value, nsmall = 2), but remember that it will create a character string not a numerical value once you do that. Test it out: type format(20.2565, nsmall = 2) and you’ll get back “20.26”, a character string;

  3. I don’t want you to get caught up on styling your tables, so here are the three lines of code that I used to create the table itself above:

kbl(escape = FALSE,table.attr = "style='width:90%;'",digits=2,align=rep('c', 4)) %>%
  kable_styling(fixed_thead = T,bootstrap_options = c("hover", "condensed","responsive"),full_width = T)%>%
  add_header_above(header = c("West Texas Intermediate (WTI) and Western Canadian Select (WCS) prices in US Dollars"=4))

Deliverable 3

The third deliverable for this assignment should be easy if you got the first and second one. I want you to graph WTI and WCS prices from the full monthly historical data from the Government of Alberta:

  read_csv("oil_prices.csv")%>%
  select(date,wti,wcs)%>%
  rename("WTI"=wti,"WCS"=wcs)%>%
  pivot_longer(-date,names_to = "crude_stream",values_to = "price")%>%
  ggplot()+ #make a graph
  geom_line(aes(date,price,colour=crude_stream),linewidth=1.15)+
  scale_x_date(date_labels = "%b\n%Y",date_breaks = "5 years",expand=c(0,0))+
  scale_y_continuous(breaks=pretty_breaks(), expand=c(0,0))+
  expand_limits(y=c(0,150))+
  expand_limits(x=ymd("2025-01-31"))+
  scale_color_viridis("",discrete = T,option="mako",direction = -1,begin=.7,end = .4)+
  theme_irpp(base_size=16)+
  theme(legend.position = "bottom",
        plot.margin = margin(t=5,r=15,b=5,l=5))+
  guides(linetype = guide_legend(keywidth = unit(1.6,"cm"),nrow = 1))+
    labs(y="WCS or WTI Prices ($US/bbl)",x="",
         title="Monthly Average Oil Prices (2005-2025)",
         subtitle="Western Canada Select (WCS) and West Texas Intermediate (WTI)\n",
         caption = "Data via Government of Alberta, graph by Andrew Leach"
         )

Deliverable 4

For the next part of this assignment, you’ll use the the XLSX file from the Prices and Capital Expenditures page of the ST98, and you’re going to replicate their Figure S1.3 on WCS oil pricing scenarios.

#we'll download the data - remember to use the mode="wb" so that you get the windows binary file
if(!file.exists("https://www.aer.ca/documents/sts/st98/2025/st98-2025-prices-and-capital-expenditure-data.xlsx"))
  download.file("https://www.aer.ca/documents/sts/st98/2025/st98-2025-prices-and-capital-expenditure-data.xlsx",destfile = "st98_prices.xlsx",mode="wb")

wcs_data<- read_excel("st98_prices.xlsx",sheet="Figures",range = "L30:T61",skip = 1)%>%clean_names()%>%slice(-1)%>%
  mutate(across(-1, as.numeric))%>%
  select(1:5)%>%
  pivot_longer(-year,names_to = "case",values_to="price" )%>%
  mutate(case=gsub("_case_\\d+$","",case),
         case=gsub("u_s_","",case),
         case=gsub("_base","",case),
         case=gsub("base_","",case),
         case=str_to_upper(case))
wcs_data%>%
  ggplot()+ #make a graph
  geom_line(aes(year,price,color=case,lty=case),linewidth=.85)+
  geom_line(data=wcs_data%>%filter(case=="BASE"),aes(year,price,color=case,lty=case),linewidth=.85)+
  scale_x_continuous(breaks=pretty_breaks(), expand=c(0,0))+
  scale_y_continuous(breaks=pretty_breaks(), expand=c(0,0))+
  theme_economist_white()+
  expand_limits(y=c(0,120))+
  expand_limits(x=2036)+
  scale_color_manual("",values=c("black","blue","firebrick","orange"))+
  scale_linetype_manual("",values=c("solid","11","11","22"))+
  theme_irpp(base_size = 16)+
  guides(linetype = guide_legend(keywidth = unit(1.6,"cm"),nrow = 1))+
    labs(y="Price (US$/bbl)",x="",
         title="WCS Crude Oil Price Scenarios, 2025",
         caption = "Data via Alberta Energy Regulator, ST98, graph by Andrew Leach"
         )+
  NULL

I’ll give you a couple of hints for how to make this happen:

  1. I used gsub repeatedly to clean up the case names;

  2. I used expand_limits(y=c(0,120)) and expand_limits(x=2035) to make my axes look nice, and a scale_color_manual("",values=c("black","blue","firebrick","orange")) for the color palette and scale_linetype_manual(",values=c("solid","11","11","22")) for the lines. I’ve also used theme(legend.position = "bottom") for the position of the legend;

  3. You may need to change the margins around your plot so that things aren’t getting cut off. Use theme(plot.margin = unit(c(1,1,1,1), "cm")) and the four values are in order top, right, bottom, left (think trouble: t r b l);

  4. I used a second geom_line in my graph to make sure the Base line was a solid black line. I added geom_line(data=wcs_data%>%filter(case=="BASE"),aes(year,price,color=case,lty=case),linewidth=.85) to my plot.

You do not need to match things exactly. It’s entirely your choice as long as 1) I can see all four cases; 2) you have line patterns and colours that are different; and 3) your graph is legible and clean.

Deliverable 5

Canada has historically exported most of its oil to the US mid-continent. Create the graph below, and for full marks, tell me what has happened recently (i.e. late 2024 and through 2025) that has led discounts to be much lower than recent historical levels and where you see that change in the graph below.

You should be able to just answer this in text in your RMD.

read_csv("oil_trade_cimt.csv")%>%clean_names()%>%
ggplot()+
  geom_line(aes(period,value,colour=trade_route),size=1.3)+
  guides(colour=guide_legend(nrow = 1))+
  scale_x_date(expand = c(0,0),date_breaks = "2 years",date_labels = "%Y")+
  scale_y_continuous(expand = c(0,0),breaks=pretty_breaks(8))+
  expand_limits(x=ymd("2026-01-01"))+
  theme_irpp(base_size=16)+
  theme(legend.position = "bottom")+
  scale_colour_manual("",values=c(colors_tableau10()[2],colors_tableau10()[1]))+
  theme(axis.title.y=element_text(margin=margin(c(0,0.25,0,0),unit="cm")),legend.text = element_text(size=10),
        title=element_text(margin=margin(c(0,0,0.5,0),unit="cm")))+
  labs(y="Crude oil export value (billions of dollars per month)",x="",
       title="Canadian crude oil export value by trade route",
       caption="Data via Canadian International Merchandise Trade Database, graph by Andrew Leach")

RMD File and HTML/PDF Preparation

I have made you a basic RMD file for your use in completing this (and future) assignments. Remember that I expect you to share your prompts if you’re using AI models to assist you here.

Before you start making any changes to the markdown, test your ability to knit to html with this file, and make sure you can make an html file before you proceed.

For example, with the document open as in the image below, use the Knit button to create your html.

Content 2026 by Andrew Leach
All content licensed under a Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0)

 

Made with and Quarto
View the source at GitHub