library(kableExtra)
library(readxl)
library(janitor)
library(tidyverse)
library(lubridate)
library(scales)
library(viridis)
Solution: Data Assignment #2 (due November 3, 2023 by 11:59pm)
Canada is an energy exporting country.
The Canadian Energy Regulator produces pipeline profiles for the largest oil and gas pipelines in Canada, and also tracks other export-related data. The pipeline throughput and capacity data are available here. Crude oil export data sets are available here, while gas trade data are available here.
The US EIA also tracks crude oil imports from Canada and elsewhere, and often provides much more detail than the Canadian export data. Unfortunately, their new Application Programming Interface (API) is very cumbersome and so I’ve skipped teaching you that step and downloaded some data from their API for you to use for this assignment.
There are five deliverables for this assignment, and they build on skills we have already learned:
- a graph of throughput by grade and capacity on the Enbridge mainline ex-Gretna;
- a graph of throughput by grade and capacity on the Keystone pipeline at the US border;
- a graph of throughput by grade and destination on the TransMountain pipeline;
- a graph of Canadian crude oil exports by rail;
- a graph of US imports by origin (Canada vs Rest of the World) and refining region (PADDs 1 through 5).
To execute my versions of the deliverables that you see below, I’ve used the following packages. Use this as a guide to set up your document:
You may also find it useful to have a look at the Functions Demo before you start this assignment.
Deliverable 1: the Enbridge Mainline
The Enbridge Mainline moves more crude oil than all other systems combined. For this graph, I expect you to download data from the pipeline profiles and produce a graph in R that shows exports ex-Gretna by grade as well as capacity at that point. You’ll need to filter the data to make it useful for your graph, and you’ll need to combine an area plot and a line plot.
There are a couple of ways to make this graph. I took you in a slightly roundabout way that showed you you could graph two separate data sets on the same graph.
<-get_pipe_data()%>%
enb_datafilter(key_point=="ex-Gretna",direction_of_flow=="east") %>% mutate(
trade_type=str_to_title(trade_type) ,
product=gsub("domestic light / ngl", "Light / NGL", product),
product=gsub("domestic heavy", "Domestic Heavy", product),
product=gsub("foreign light", "Foreign Light", product),
pair=paste(product," (",trade_type,")",sep=""))
<-enb_data%>%group_by(date)%>%summarize(capacity=mean(available_capacity_1000_m3_d,na.rm=T))
enb_capacity
ggplot(enb_data) +
geom_area(aes(date,throughput_1000_m3_d,group = pair,fill=pair),position = "stack",color="black",linewidth=.25) +
geom_line(data=enb_capacity,aes(date,capacity,color="Available Capacity"),linewidth=.85, lty="21") +
scale_color_manual("",values=c("black"))+
scale_y_continuous(expand = c(0, 0),
sec.axis = sec_axis(trans=~.*1/.16, name="Shipments (Monthly, Thousands of Barrels per Day)")) +
scale_fill_manual("",values=viridis(n=4,alpha=1,begin=.7,end=0,option = "E",direction=-1))+
scale_x_date(name=NULL,date_breaks = "2 years", date_labels = "%b\n%Y",expand=c(0,0)) +
guides(#alpha = guide_legend(override.aes = list(fill=viridis(n=3,alpha=1,begin=.8,end=0,option = "E",direction=-1)[1]),order = 10) ,
fill = guide_legend(order = 1) )+
theme_classic() +
theme(panel.border = element_blank(),
panel.grid = element_blank(),
panel.grid.major.y = element_line(color = "gray",linetype="dotted"),
axis.line.x = element_line(color = "gray"),
axis.line.y = element_line(color = "gray"),
axis.text = element_text(size = 12),
axis.text.x = element_text(margin = margin(t = 10)),
axis.title = element_text(size = 12),
#axis.label.x = element_text(size=20,vjust=+5),
plot.subtitle = element_text(size = 12,hjust=0.5),
plot.caption = element_text(face="italic",size = 12,hjust=0),
legend.key.width=unit(2,"line"),
legend.position = "bottom",
legend.box = "horizontal",
#legend.direction = "horizontal",
#legend.box = "horizontal",
legend.text = element_text(size = 12),
plot.title = element_text(hjust=0.5,size = 14))+
labs(y="Shipments (Monthly, Thousands of Cubic Metres per Day)",x="Date",
title=paste("Enbridge Canadian Mainline Shipments by Product",sep=""),
caption="Source: CER Data for Enbridge Mainline (ex-Gretna), graph by Andrew Leach.")
But, the data format actually also lends itself to this, as I realized while helping some of you out. Sometimes even I do things in an inefficient way without really thinking it through. Either way is fine.
ggplot(enb_data) +
geom_area(aes(date,throughput_1000_m3_d,group = pair,fill=pair),position = "stack",color="black",linewidth=.25) +
geom_line(aes(date,available_capacity_1000_m3_d,color="Available Capacity"),linewidth=.85, lty="21") +
scale_fill_manual("",values=viridis(n=4,alpha=1,begin=.7,end=0,option = "E",direction=-1))+
scale_x_date(name=NULL,date_breaks = "2 years", date_labels = "%b\n%Y",expand=c(0,0)) +
scale_color_manual("",values=c("black"))+
scale_y_continuous(expand = c(0, 0),
sec.axis = sec_axis( trans=~.*1/.16, name="Shipments (Monthly, Thousands of Barrels per Day)")) +
guides(alpha = guide_legend(override.aes = list(fill=viridis(n=3,alpha=1,begin=.8,end=0,option = "E",direction=-1)[1]),order = 10) ,
fill = guide_legend(order = 1) )+
theme_classic() +
theme(panel.border = element_blank(),
panel.grid = element_blank(),
panel.grid.major.y = element_line(color = "gray",linetype="dotted"),
axis.line.x = element_line(color = "gray"),
axis.line.y = element_line(color = "gray"),
axis.text = element_text(size = 12),
axis.text.x = element_text(margin = margin(t = 10)),
axis.title = element_text(size = 12),
#axis.label.x = element_text(size=20,vjust=+5),
plot.subtitle = element_text(size = 12,hjust=0.5),
plot.caption = element_text(face="italic",size = 12,hjust=0),
legend.key.width=unit(2,"line"),
legend.position = "bottom",
legend.box = "horizontal",
#legend.direction = "horizontal",
#legend.box = "horizontal",
legend.text = element_text(size = 12),
plot.title = element_text(hjust=0.5,size = 14))+
labs(y="Shipments (Monthly, Thousands of Cubic Metres per Day)",x="Date",
title=paste("Enbridge Canadian Mainline Shipments by Product",sep=""),
caption="Source: CER Data for Enbridge Mainline (ex-Gretna), graph by Andrew Leach.")
Make sure to take note that, for a time before the opening of the Dakota Access Pipeline, there was some US-produced crude moving east on the Mainline.
Deliverable 2: the Keystone Pipeline
The Keystone system was supposed to grow to rival the Enbridge Mainline, but it stalled after the construction of a single cross-border pipeline. There was, at one point, talk of two pairs of lines (a twinned base Keystone, and a twinned line on the Keystone XL right of way). Today, though, the single line still moves a lot of heavy crude oil to refining locations in US PADDs 2 and 3.
This graph should have been an easier version of the first one, and you didn’t need to do capacity:
<-get_pipe_data(pipe_name="Keystone")%>%filter(direction_of_flow=="south",product!="total") %>% mutate(
keystone_datatrade_type=str_to_title(trade_type) ,
product=str_to_title(product),
date=mdy(date)
)
ggplot(keystone_data,aes(date,throughput_1000_m3_d,group = product,fill=product)) +
geom_area(position = "stack",color="black",linewidth=.25) +
scale_fill_manual("",values=viridis(n=3,alpha=1,begin=.7,end=0,option = "E",direction=-1))+
scale_x_date(name=NULL,date_breaks = "2 years", date_labels = "%b\n%Y",expand=c(0,0)) +
scale_y_continuous(expand = c(0, 0),
sec.axis = sec_axis( trans=~.*1/.16, name="Shipments (Monthly, Thousands of Barrels per Day)")) +
guides(alpha = guide_legend(override.aes = list(fill=viridis(n=3,alpha=1,begin=.8,end=0,option = "E",direction=-1)[1]),order = 10) ,
fill = guide_legend(order = 1) )+
theme_classic() +
theme(panel.border = element_blank(),
panel.grid = element_blank(),
panel.grid.major.y = element_line(color = "gray",linetype="dotted"),
axis.line.x = element_line(color = "gray"),
axis.line.y = element_line(color = "gray"),
axis.text = element_text(size = 12),
axis.text.x = element_text(margin = margin(t = 10)),
axis.title = element_text(size = 12),
#axis.label.x = element_text(size=20,vjust=+5),
plot.subtitle = element_text(size = 12,hjust=0.5),
plot.caption = element_text(face="italic",size = 12,hjust=0),
legend.key.width=unit(2,"line"),
legend.position = "bottom",
legend.box = "vertical",
#legend.direction = "horizontal",
#legend.box = "horizontal",
legend.text = element_text(size = 12),
plot.title = element_text(hjust=0.5,size = 14))+
labs(y="Shipments (Monthly, Thousands of Cubic Metres per Day)",x="Date",
title=paste("Keystone Pipeline Shipments by Product",sep=""),
caption="Source: CER Data for Keystone (Canada-US border), graph by Andrew Leach.")
The CER didn’t help me out with this one, since they changed their data format AFTER I made the assignment, but before it was due. So, even my master file threw two errors: they changed the date so it reads in as a character vector, and they added a total throughput which had never been in those data before last month!
Deliverable 3: the Trans-Mountain Pipeline
The Trans-Mountain pipeline has been around since the 1950s and is the only major crude oil pipeline to cross the continental divide. Trans-Mountain is also unique in that it ships both crude oil and refined products in batches (other pipelines ship in batches too, just not with refined products in the mix). The following description is taken from the Trans-Mountain website:
Transmountain also has three delivery points: exports to the US via Sumas, deliveries to domestic refining in Burnaby, and a port delivery point at Westridge. The batches, the multiple delivery points, and the shipments of refined products are all part of what I want you to remember from this.
To see what gets delivered where on TransMountain, I had you produce this for TransMountain:
<-get_pipe_data(pipe_name="Trans-Mountain")%>% filter(key_point!="system")%>%mutate(
tm_datatrade_type=str_to_title(trade_type) ,
product=str_to_title(product),
pair=paste(key_point," (",trade_type,")",sep=""))
ggplot(tm_data,aes(date,throughput_1000_m3_d,group = product,fill=product)) +
geom_area(position = "stack",,color="black",linewidth=.25) +
scale_fill_manual("",values=viridis(n=4,alpha=1,begin=.7,end=0,option = "E",direction=-1))+
scale_x_date(name=NULL,date_breaks = "2 years", date_labels = "%b\n%Y",expand=c(0,0)) +
scale_y_continuous(expand = c(0, 0),
sec.axis = sec_axis( trans=~.*1/.16, name="Shipments (Monthly, Thousands of Barrels per Day)")) +
guides(alpha = guide_legend(override.aes = list(fill=viridis(n=3,alpha=1,begin=.8,end=0,option = "E",direction=-1)[1]),order = 10) ,
fill = guide_legend(order = 1) )+
facet_wrap(~pair,ncol = 1,scales="free_y")+
theme_classic() +
theme(panel.border = element_blank(),
panel.grid = element_blank(),
panel.grid.major.y = element_line(color = "gray",linetype="dotted"),
axis.line.x = element_line(color = "gray"),
axis.line.y = element_line(color = "gray"),
axis.text = element_text(size = 12),
axis.text.x = element_text(margin = margin(t = 10)),
axis.title = element_text(size = 12),
#axis.label.x = element_text(size=20,vjust=+5),
plot.subtitle = element_text(size = 12,hjust=0.5),
plot.caption = element_text(face="italic",size = 12,hjust=0),
legend.key.width=unit(2,"line"),
legend.position = "bottom",
legend.box = "vertical",
#legend.direction = "horizontal",
#legend.box = "horizontal",
legend.text = element_text(size = 12),
plot.title = element_text(hjust=0.5,size = 14))+
labs(y="Shipments (Monthly, Thousands of Cubic Metres per Day)",x="Date",
title=paste("Trans-Mountain Pipeline Shipments by Product and Destination",sep=""),
caption="Source: CER Data for Trans-Mountain, graph by Andrew Leach.")
Hint: use
facet_wrap(~key_point,ncol = 1, scales = "free_y")
to allow different y-axes on each of the plots, and to stack the plots vertically if you like. You are not required to present them this way, but I do want to see a three-facet plot
Deliverable 4: Canadian exports of crude by rail
This one should have been be a bit easier - it’s a single-variable graph with no filters or extensive modifications.
The CER crude-by-rail data are available in an Excel workbook here.
You were given one bit of code:
#read data in from the excel using read_excel
<-read_excel("crude_by_rail.xlsx", sheet = NULL, range = NULL, col_names = TRUE,
oil_by_railcol_types = NULL, na = "", trim_ws = TRUE, skip = 7)
#fix the data format
<- oil_by_rail %>% clean_names() %>% select(-1)%>%slice(1:(n()-5))%>% #drop first column and the last 5 rows
oil_by_rail fill(year) %>% #fill in the missing years
mutate(mth_num=match(month,month.name),#create numeric months, and then make dates
date=ymd(paste(year,"-",mth_num,"-1",sep = "")))
And here’s how I made my plot:
#crude_by_rail_exports
download.file("https://www.cer-rec.gc.ca/en/data-analysis/energy-commodities/crude-oil-petroleum-products/statistics/canadian-crude-oil-exports-rail-monthly-data.xlsx", destfile="crude_by_rail.xlsx", mode = "wb")
<-read_excel("crude_by_rail.xlsx", sheet = NULL, range = NULL, col_names = TRUE,
oil_by_railcol_types = NULL, na = "", trim_ws = TRUE, skip = 7)
<- oil_by_rail %>% clean_names() %>% select(-1)%>%slice(1:(n()-5))%>% #drop first column and the last 5 rows
oil_by_rail fill(year) %>% #fill in the missing years
mutate(mth_num=match(month,month.name),#create numeric months, and then make dates
date=ymd(paste(year,"-",mth_num,"-1",sep = ""))
)
<-
rail_exportsggplot(oil_by_rail) +
geom_line(aes(date,volume_m3_per_day/1000,color="Crude by Rail Exports"),linewidth=1.75) +
scale_y_continuous(breaks=pretty_breaks(),expand = c(0, 0),
sec.axis = sec_axis( trans=~.*1/.16, name="Exports (Monthly, Thousands of Barrels per Day)\n")) +
expand_limits(y=0)+
scale_x_date(name=NULL,date_breaks="6 months",date_labels = "%b\n%Y",expand=c(0,0)) +
scale_color_manual("",values = colors_ua10(),guide = "legend")+
theme_classic() +
theme(panel.border = element_blank(),
panel.grid = element_blank(),
panel.grid.major.y = element_line(color = "gray",linetype="dotted"),
axis.line.x = element_line(color = "gray"),
axis.line.y = element_line(color = "gray"),
axis.text = element_text(size = 12),
axis.text.x = element_text(margin = margin(t = 10)),
axis.title = element_text(size = 12),
#axis.label.x = element_text(size=20,vjust=+5),
plot.subtitle = element_text(size = 12,hjust=0.5),
plot.caption = element_text(face="italic",size = 12,hjust=0),
legend.key.width=unit(2,"line"),
legend.position = "none",
legend.box = "vertical",
#legend.direction = "horizontal",
#legend.box = "horizontal",
legend.text = element_text(size = 12),
plot.title = element_text(hjust=0.5,size = 14))+
theme(text = element_text(size=20))+
#t, r, b, l (To remember order, think trouble)
theme(plot.margin = unit(c(1,1,0.2,1), "cm"))+
labs(y="Exports (Monthly, Thousands of Cubic Meters per Day)\n",x="Date",
title=paste("Canadian Oil Exports by Rail",sep=""),
caption="Source: CER data, graph by Andrew Leach.")
Deliverable 5: US imports of Canadian crude
This one was the challenging graph of the set. I like it because it lets you see how to mak something we’ve used in class. We haven’t talked a lot about the US market, but it’s worth having a sense of how the US relies on imports. Using data that I’ve provided for you here, you should be able to reproduce a graph that looks like this one for use imports by grade and refining region. Note that refining regions give you a better sense of where demand lies, rather than a focus on the import port or pipeline border crossing.
This was harder for some of you because there were a lot of things going on. Below, I’ve shown you how I made both the assignment version and the class version using the same code, but with a little twist using factors.
<-read_csv("assignment_2_data.csv")
assignment_data_conf
<-
import_plotggplot(
%>%
assignment_data_conf#mutate(origin_name=as_factor(origin_name))%>%
#mutate(origin_name=fct_relevel(origin_name,"ROW"))%>%
#mutate(origin_name=fct_recode(origin_name,"Rest of the World"="ROW")
mutate(origin_name=gsub("ROW","Rest of the World",origin_name)
+
))geom_area(aes(period,quantity,group=origin_name,fill=origin_name),color="black",linewidth=0.25,position="stack")+
facet_grid(cols=vars(destination_name),rows=vars(grade_name))+
scale_x_date(expand=c(0,0))+
scale_y_continuous(breaks=pretty_breaks(), expand=c(0,0))+
#look here, adding in the facet wrap and the labeler
expand_limits(y=c(0,140))+
#scale_fill_manual("",values=c("darkblue","red"))+
scale_fill_manual("",values=c("red","darkblue"))+
theme_classic() +
theme(panel.border = element_blank(),
panel.grid = element_blank(),
panel.grid.major.y = element_line(color = "gray",linetype="dotted"),
axis.line.x = element_line(color = "gray"),
axis.line.y = element_line(color = "gray"),
axis.text = element_text(size = 12),
axis.text.x = element_text(margin = margin(t = 10)),
axis.title = element_text(size = 12),
#axis.label.x = element_text(size=20,vjust=+5),
plot.subtitle = element_text(size = 12,hjust=0.5),
plot.caption = element_text(face="italic",size = 12,hjust=0),
legend.key.width=unit(2,"line"),
legend.position = "bottom",
legend.box = "vertical",
#legend.direction = "horizontal",
#legend.box = "horizontal",
legend.text = element_text(size = 12),
plot.title = element_text(hjust=0.5,size = 14))+
theme(text = element_text(size=16))+
theme(legend.position = "bottom", legend.box = "vertical",
legend.margin=margin(t = 0,b=0, unit='cm'))+
#t, r, b, l (To remember order, think trouble)
theme(plot.margin = unit(c(1,1,0.2,1), "cm"))+
guides(color = guide_legend(keywidth = unit(1.6,"cm"),nrow = 1),
linetype = guide_legend(keywidth = unit(1.6,"cm"),nrow = 1))+
labs(y="Imports (thousands of barrels per day)",x="",
title="US Crude Imports from Canada and the Rest of the World (ROW)",
subtitle="Imports by Refining PADD and Grade",
caption = paste("Data via US Energy Information Administration, current to ",format(max(assignment_data_conf$period),"%B %Y"),". Graph by Andrew Leach.",sep="")
+
)NULL
<-
import_plot_factorsggplot(
%>%
assignment_data_confmutate(origin_name=as_factor(origin_name))%>%
mutate(origin_name=fct_relevel(origin_name,"ROW"))%>%
mutate(origin_name=fct_recode(origin_name,"Rest of the World"="ROW")
#mutate(origin_name=gsub("ROW","Rest of the World",origin_name)
+
))geom_area(aes(period,quantity,group=origin_name,fill=origin_name),color="black",linewidth=0.25,position="stack")+
scale_x_date(expand=c(0,0))+
scale_y_continuous(breaks=pretty_breaks(), expand=c(0,0))+
#look here, adding in the facet wrap and the labeler
facet_grid(cols=vars(destination_name),rows=vars(grade_name))+
expand_limits(y=c(0,140))+
scale_fill_manual("",values=c("darkblue","red"))+
#scale_fill_manual("",values=c("red","darkblue"))+
theme_classic() +
theme(panel.border = element_blank(),
panel.grid = element_blank(),
panel.grid.major.y = element_line(color = "gray",linetype="dotted"),
axis.line.x = element_line(color = "gray"),
axis.line.y = element_line(color = "gray"),
axis.text = element_text(size = 12),
axis.text.x = element_text(margin = margin(t = 10)),
axis.title = element_text(size = 12),
#axis.label.x = element_text(size=20,vjust=+5),
plot.subtitle = element_text(size = 12,hjust=0.5),
plot.caption = element_text(face="italic",size = 12,hjust=0),
legend.key.width=unit(2,"line"),
legend.position = "bottom",
legend.box = "vertical",
#legend.direction = "horizontal",
#legend.box = "horizontal",
legend.text = element_text(size = 12),
plot.title = element_text(hjust=0.5,size = 14))+
theme(text = element_text(size=16))+
theme(legend.position = "bottom", legend.box = "vertical",
legend.margin=margin(t = 0,b=0, unit='cm'))+
#t, r, b, l (To remember order, think trouble)
theme(plot.margin = unit(c(1,1,0.2,1), "cm"))+
guides(color = guide_legend(keywidth = unit(1.6,"cm"),nrow = 1),
linetype = guide_legend(keywidth = unit(1.6,"cm"),nrow = 1))+
labs(y="Imports (thousands of barrels per day)",x="",
title="US Crude Imports from Canada and the Rest of the World (ROW)",
subtitle="Imports by Refining PADD and Grade",
caption = paste("Data via US Energy Information Administration, current to ",format(max(assignment_data$period),"%B %Y"),". Graph by Andrew Leach.",sep="")
+
)NULL
ggsave("../../slides/oil_transp/import_plot.png",plot = import_plot_factors,dpi=400,width=14,height=7,bg="white")
RMD File and HTML/PDF Preparation
As before, use the basic RMD file to complete this (and future) assignments, just rename it assignment 2. You only need to submit the HTML.