library(kableExtra)
library(readxl)
library(janitor)
library(tidyverse)
library(lubridate)
library(scales)
library(viridis)
library(ggthemes)
options(scipen = 999)Data Assignment 1
For what you see below, I’ve used the following packages and settings. Use this as a guide to set up your document:
The Alberta Energy Regulator produces an annual outlook report known as their ST98 (they aren’t great with catchy names) which provides a forecast of Alberta energy production.
For the first part of your first data assignment, I’m going to ask you to produce a few graphs and tables based on publicly-available data from the ST98.
The last deliverable on the assignment is more of a current events challenge. I want you to make a graph of Canadian oil exports by destination using these data from the Canadian International Merchandise Trade Web Application. But, to make your life a little easier, I have processed the data for you into a nicer file available for you here.
This assignment tests your skills in filtering, grouping, and presenting data in table and graph form. It should not require any R skills that you haven’t already used or for which hints aren’t given to you below.
There are five (5) graded deliverables in this assignment.
I am going to ask you for four (4) specific R outputs in this assignment, each of which you can complete using coding techniques that you’ve already seen, with some minor modifications. Each of these is worth 2 points with a grading key as follows : 0 for not attempted, 1 for attempted with reasonable effort but not completed, 1.5 for satisfactory completion, 2 for excellent work.
Next, in addition to these specific visualization deliverables, which you can complete using code that we’ve already used during the term, I’m going to ask you to answer a question on the data. Here, again, I’ll grade you on the scale of 0 for not attempted, 1 for attempted with reasonable effort but not completed, 1.5 for a solid explanation with no errors, and 2 for compelling explanation of the data.
For the first parts of the assignment, you’ll use the the XLSX file from the Prices and Capital Expenditures page of the ST98, and you’re going to replicate their Figures S1.2 and S1.3 on oil pricing scenarios and historic monthly differentials.
#we'll download the data - remember to use the mode="wb" so that you get the windows binary file
if(!file.exists("https://www.aer.ca/documents/sts/st98/2025/st98-2025-prices-and-capital-expenditure-data.xlsx"))
download.file("https://www.aer.ca/documents/sts/st98/2025/st98-2025-prices-and-capital-expenditure-data.xlsx",destfile = "st98_prices.xlsx",mode="wb")
wcs_data<- read_excel("st98_prices.xlsx",sheet="Figures",range = "L30:T61",skip = 1)%>%clean_names()%>%slice(-1)%>%
mutate(across(-1, as.numeric))
diffs_data<- read_excel("st98_prices.xlsx",sheet="Figures",range = "W30:Z127",skip = 1)%>%clean_names()%>%slice(-1)%>%
mutate(wti=as.numeric(wti),wcs=as.numeric(wcs),wti_wcs_differential=as.numeric(wti_wcs_differential))Deliverable 1
The first deliverable for this assignment is a simple table, but based on filtered data. I would like you to do a table of monthly WTI and WCS prices for 2024. You might also want to know that you can very easily convert all columns but the first one in your data to numeric using the code mutate(across(-1, as.numeric)).
diffs_data %>% rename("date"=year)%>%
select(-4)%>%
mutate(year=year(date))%>%
filter(year>=2024)%>%
select(date,wti,wcs)%>%
mutate(wti=format(wti, nsmall = 2),
wcs=format(wcs, nsmall = 2))%>%
rename(
"WTI ($US/bbl)"=wti,
"WCS ($US/bbl)"=wcs,
Date=date)%>%
kbl(escape = FALSE,table.attr = "style='width:90%;'",digits=2,align=rep('c', 3)) %>%
kable_styling(fixed_thead = T,bootstrap_options = c("hover", "condensed","responsive"),full_width = T)%>%
add_header_above(header = c("West Texas Intermediate (WTI) and Western Canadian Select (WCS) prices in $US"=3))%>%
I() | Date | WTI ($US/bbl) | WCS ($US/bbl) |
|---|---|---|
| 2024-01-01 | 73.86 | 53.77 |
| 2024-02-01 | 76.61 | 57.83 |
| 2024-03-01 | 80.41 | 61.28 |
| 2024-04-01 | 84.39 | 68.65 |
| 2024-05-01 | 78.62 | 65.59 |
| 2024-06-01 | 78.70 | 66.83 |
| 2024-07-01 | 80.48 | 67.49 |
| 2024-08-01 | 75.43 | 61.37 |
| 2024-09-01 | 69.37 | 55.90 |
| 2024-10-01 | 71.56 | 57.86 |
| 2024-11-01 | 69.54 | 57.56 |
| 2024-12-01 | 69.70 | 57.76 |
I’ll give you a couple of hints for how to make this happen:
Make sure you don’t give the file a different extension when you download it (e.g. don’t set your
destfile="foo.xlsx"if you’re downloading a CSV file. Usedestfile="foo.csv"and vice-versa);If you want to force a number to have two decimal places, you can use
format(value, nsmall = 2), but remember that it will create a character string not a numerical value once you do that. Test it out: typeformat(20.2565, nsmall = 2)and you’ll get back “20.26”, a character string;I don’t want you to get caught up on styling your tables, so here are the three lines of code that I used to create the table itself above:
kbl(escape = FALSE,table.attr = "style='width:90%;'",digits=2,align=rep('c', 3)) %>%
kable_styling(fixed_thead = T,bootstrap_options = c("hover", "condensed","responsive"),full_width = T)%>%
add_header_above(header = c("West Texas Intermediate (WTI) and Western Canadian Select (WCS) prices in $US"=3))Deliverable 2
The second deliverable is to graph the series you just included in the table, but keeping all the monthly detail from 2017 onward.
diffs_data %>% select(1:3)%>%
rename(date=year)%>%
pivot_longer(-date,names_to="crude_stream",values_to = "price")%>%
mutate(crude_stream=gsub("wti","West Texas Intermediate (WTI)",crude_stream),date=as_date(date),
crude_stream=gsub("wcs","Western Canada Select (WCS)",crude_stream))%>%
filter(date>=max(date)-years(10))%>% #keep the last 5 years of data
#keep the last year of data
ggplot()+ #make a graph
geom_line(aes(date,price,group=crude_stream,color=crude_stream),linewidth=.65)+
scale_x_date(date_labels = "%b\n%Y",date_breaks = "6 months",expand=c(0,0))+
scale_y_continuous(breaks=pretty_breaks(), expand=c(0,0))+
expand_limits(y=c(0,120))+
expand_limits(x=ymd("2025-01-31"))+
scale_color_viridis("",discrete = T,option="mako",direction = -1,begin=.7,end = 0)+
theme_minimal()+
theme(legend.position = "bottom",
plot.margin = margin(t=5,r=15,b=5,l=5))+
guides(linetype = guide_legend(keywidth = unit(1.6,"cm"),nrow = 1))+
labs(y="Price ($US/bbl)",x="",
title="Monthly Average Crude Oil Prices",
subtitle = "West Texas Intermediate (WTI) and Western Canada Select (WCS) prices (2017-2025)",
caption = "Data via AER ST98, graph by Andrew Leach"
)Deliverable 3
The third deliverable for this assignment is a graph of the AER scenarios for WCS oil prices.
wcs_data %>%
select(1:5)%>%
pivot_longer(-year,names_to = "case",values_to="price" )%>%
mutate(case=gsub("_case_\\d+$","",case),
case=gsub("u_s_","",case),
case=gsub("_base","",case),
case=gsub("base_","",case),
case=str_to_upper(case))%>%
ggplot()+ #make a graph
geom_line(aes(year,price,color=case,lty=case),linewidth=.85)+
scale_x_continuous(breaks=pretty_breaks(), expand=c(0,0))+
scale_y_continuous(breaks=pretty_breaks(), expand=c(0,0))+
theme_economist_white()+
expand_limits(y=c(0,120))+
expand_limits(x=2035)+
scale_color_manual("",values=c("black","blue","firebrick","orange"))+
scale_linetype_manual("",values=c("solid","11","11","22"))+
theme(legend.position = "bottom")+
#t, r, b, l (To remember order, think trouble)
theme(plot.margin = unit(c(1,1,1,1), "cm"))+
guides(linetype = guide_legend(keywidth = unit(1.6,"cm"),nrow = 1))+
labs(y="Price (US$/bbl)",x="",
title="WCS Crude Oil Price Scenarios, 2025",
caption = "Data via Alberta Energy Regulator, ST98, graph by Andrew Leach"
)+
NULLI’ll give you a couple of hints for how to make this happen:
I used
gsubrepeatedly to clean up the case names;I used
expand_limits(y=c(0,120))andexpand_limits(x=2035)to make my axes look nice, and ascale_color_manual("",values=c("black","blue","firebrick","orange"))for the color palette andscale_linetype_manual(",values=c("solid","11","11","22"))for the lines. I’ve also usedtheme(legend.position = "bottom")for the position of the legend;You may need to change the margins around your plot so that things aren’t getting cut off. Use
theme(plot.margin = unit(c(1,1,1,1), "cm"))and the four values are in order top, right, bottom, left (think trouble: t r b l);I used
theme_economist_white()for my plot, but you can use whatever you like.
You do not need to match things exactly. It’s entirely your choice as long as 1) I can see all four cases; 2) you have line patterns and colours that are different; and 3) your graph is legible and clean.
Deliverable 4
The fourth deliverable for this assignment should be easy if you got the second one. I want you to graph WCS discounts from WTI for the monthly historical data:
diffs_data %>%
rename(date=year)%>%
#pivot_longer(-date,names_to="crude_stream",values_to = "price")%>%
mutate(date=as_date(date))%>%
#crude_stream=gsub("wcs","Western Canada Select (WCS)",crude_stream))%>%
#filter(date>=max(date)-years(10))%>% #keep the last 5 years of data
#keep the last year of data
ggplot()+ #make a graph
geom_line(aes(date,wti_wcs_differential,color="WCS Discount to WTI ($US/bbl)"),linewidth=.65)+
scale_x_date(date_labels = "%b\n%Y",date_breaks = "6 months",expand=c(0,0))+
scale_y_continuous(breaks=pretty_breaks(), expand=c(0,0))+
expand_limits(y=c(0,50))+
expand_limits(x=ymd("2025-01-31"))+
scale_color_viridis("",discrete = T,option="mako",direction = -1,begin=.7,end = 0)+
theme_minimal()+
theme(legend.position = "bottom",
plot.margin = margin(t=5,r=15,b=5,l=5))+
guides(linetype = guide_legend(keywidth = unit(1.6,"cm"),nrow = 1))+
labs(y="WCS Discount to WTI ($US/bbl)",x="",
title="Monthly Average Canadian Heavy Sour Crude Oil Discount",
subtitle = "Western Canada Select (WCS) discount to West Texas Intermediate (WTI) (2017-2025)",
caption = "Data via AER ST98, graph by Andrew Leach"
)You should only need one hint for this one: in this case, I used color="WCS Discount to WTI ($US/bbl)" in my geom_line to get a nice, clean legend.
Deliverable 5
Canada has historically exported most of its oil to the US mid-continent. Create the graph below, and for full marks 1) Tell me what happened when discounts were really high in 2018-2019 and what actions the Government of Alberta took to mitigate issue. 2) Tell me what has happened recently (i.e. late 2024 and through 2025, not all on the graph) that has led discounts to be much lower than recent historical levels; and 3) Tell me where you see that change in the graph below.
Pipeline capacity constraints and refinery outages led to large differentials which led the government to curtail oil production in Alberta
Biggest change is in the destination of exports, where we see more exports to non-US markets via the West Coast.
read_csv("oil_trade_cimt.csv")%>%clean_names()%>%
ggplot()+
geom_area(aes(period,value,group=trade_route,fill=trade_route),position="stack")+
theme_economist_white()+#for fun, let's use the economist theme
guides(fill=guide_legend(nrow = 1))+
scale_x_date(expand = c(0,0),date_breaks = "2 years",date_labels = "%Y")+
theme(legend.position = "bottom")+
scale_fill_manual("",values=c(colors_tableau10()[2],colors_tableau10()[1]))+
theme(axis.title.y=element_text(margin=margin(c(0,0.25,0,0),unit="cm")),legend.text = element_text(size=10),
title=element_text(margin=margin(c(0,0,0.5,0),unit="cm")))+
labs(y="Crude oil export value (billions of dollars per month)",x="",
title="Canadian crude oil export value by trade route",
caption="Data via Canadian International Merchandise Trade Database, graph by Andrew Leach")Hint: You’re going to want something like geom_area(YOUR AES COMMAND,position="stack") to get the graph to work.
RMD File and HTML/PDF Preparation
I have made you a basic RMD file for your use in completing this (and future) assignments. Remember that I expect you to share your prompts if you’re using AI models to assist you here.
Before you start making any changes to the markdown, test your ability to knit to html with this file, and make sure you can make an html file before you proceed.
For example, with the document open as in the image below, use the Knit button to create your html.