Data Assignment #3 (due November 26, 2023 by 11:59pm)
As we come to the end of the term, our last two topics are electricity and climate change and this data assignment combines both of those subjects. I also want you to work towards being able to present really nice-looking markdown html documents including the suppression of some of the code, warnings, etc. from your final document. As such, part of the requirements for this exercise are for you to make use of chunk options for your r code to improve your presentation of data. We’ll also use cached data so that you don’t have to re-download data every time.
For this data assignment, you’re going to use a few different sources of data:
- Canada’s Greenhouse Gas Emissions Projections;
- Canada’s National Greenhouse Gas Emissions Inventory; and
- AESO data on microgeneration in Alberta.
There’s a challenge graph with a couple of new skills for you to master, and a bonus mark for it too!
Are you ready to get started?
Make sure you read all the instructions closely and make sure to comment your code and only add a bit at a time to your RMD files so that you can easily spot errors or the impacts of changes you’ve made.
To execute the output you see below, I’ve used the following packages. Use this as a guide to set up your document:
#usual packages
library(kableExtra)
library(readxl)
library(janitor)
library(tidyverse)
library(lubridate)
library(scales)
library(viridis)
#new packages
library(cansim)
library(cowplot) #I'm using this for a demo in the code, and you may use it for Deliverable 4
library(gridExtra) #I use this a lot, and you may use it for Deliverable 4
You may find it useful to have another look at the Functions Demo before you start this assignment.
A lot of the plots in here rely on different ways to order data stored as factors. Remember way back from the first data exercise, we looked at factors and a package called forcats that lets you manipulate factors. We’re going to use this package in a couple of spots in this assignment.
Deliverable 1: Cleaning up your html (1 total mark)
The first thing I want to see from you this time is cleaner html files. I want you to use chunk options to set up your output to look much cleaner in the html you generate. You’ll want to have this explainer on chunk options and this more detailed version to hand. For default settings in my code, I include the following chunk right at the top of my document. Setting warnings and messages to false means that you don’t get all the red text mirrored into your html (you might want to give that first chunk the options {r chunk_opts,include=F} to suppress its output).
knitr::opts_chunk$set(message=F,
                      warning=F) #turn off messages and warnings
options(scipen = 999) #suppress scientific notation
For individual chunks in my code, I use echo=FALSE when I don’t want to show you the code, and include = TRUE when I want to show you both code and output.
If you want to try something a little more fun, try code-folding in your html which lets you create little radio buttons to show or hide your code (you’ve seen me use this in a few things this term, like this document).
Another useful thing you can use for code chunks is caching. If, for example, you have a code chunk that downloads data, cache that chunk using cache=TRUE in the chunk options and everything will run more quickly as you don’t have to download the data each time. You don’t have to do this, but you might want to know it’s an option.
And, finally, remember that you can use chunk options to modify figures. For example, chunk options as follows {r,out.width="95%",dpi=150,fig.align="center",fig.height=8,fig.width=14} will set the width of the graph window, the resolution (dpi), the alignment and the dimensions of the figure you’re pushing into your html. Make sure your figures look good in your final html!
With all of this in hand, you should be able to make a nice, efficient html file! I’d like you to show me just the code you use in the assignment, to suppress warnings and messages from the final product, and to make sure all figures are clear and complete in the html. (1 mark)
Deliverable 2: Electricity Generation by Source, Canada (3 total marks)
Canada’s official Greenhouse Gas Emissions Inventory, the emissions against which our national targets are measured, contains data on electricity generation, and it’s some of the best data we have at a national level. From the homepage, you should navigate to the open data site and download the electricity data by province (it should be one large excel file).
BEWARE: THERE ARE TWO VERSIONS OF THIS FILE ON THE OPEN DATA PAGE FOR SOME REASON. You want entry C or, alternatively, you can get my downloaded file here.
If you look at the Excel file, you’ll see that it has a worksheet for Canada and for each province. You’ll also see that you need to do some data cleaning, which you can do in R to get the data you need, or you can do the manipulation in Excel and read in only the data to make this graph and the next one.
The first graph I’d like you to produce is one for Canada (2 marks).
Your graph doesn’t have to be a perfect replication of this, but it should be similar. You’ll notice a few things from this graph: the data are not in an even sequence by year, and I’ve converted the data from GWh to TWh.
A couple of hints:
- use year as a factor (i.e. use as_factor(year) after you’ve converted the data into long form using pivot_longer) and then graph using the factor as your x axis. You will NOT be able to make this graph with scale_x_date, so don’t even try. You don’t have even intervals between years, so use factors and scale_x_discrete to make some changes if you need to do so;
- use geom_col() the same way you would use geom_area();
- instead of using value in your graph, use value/1000 and you’ll get the different units;
- if you’re getting scientific notation, add a line options(scipen=999) earlier in your r code;
- if you want to tilt the x-axis labels, use theme(axis.text.x=element_text(angle=45,vjust=1,hjust=1)).
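As a rough illustration of how the hints above fit together, here is a minimal sketch using made-up numbers. The gen_wide table, its column names and its values are assumptions for demonstration only; they are not the inventory data, and the real Excel sheet will need its own cleaning before you get to this point.

```r
library(dplyr)
library(tidyr)
library(forcats)
library(ggplot2)

# Hypothetical wide-format generation data in GWh (invented values, uneven years)
gen_wide <- tibble(
  source = c("Hydro", "Nuclear", "Coal"),
  `1990` = c(260000, 69000, 82000),
  `2005` = c(330000, 87000, 94000),
  `2021` = c(380000, 86000, 32000)
)

# Long form, with year stored as a factor so the x axis is discrete
gen_long <- gen_wide %>%
  pivot_longer(-source, names_to = "year", values_to = "value") %>%
  mutate(year = as_factor(year)) # levels follow the order of appearance

# Stacked columns in TWh (value/1000), with tilted x-axis labels
gen_plot <- ggplot(gen_long) +
  geom_col(aes(year, value / 1000, fill = source), colour = "black") +
  labs(x = NULL, y = "Generation (TWh)") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
```

Because year is a factor, the three columns are evenly spaced even though the years themselves are not.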
As a second deliverable in this section, I’d also like you to produce a similar graph for Alberta (1 mark).
Deliverable 3: Emissions from electricity generation (3 total marks, + 1 bonus mark available)
We’re going to talk a lot about emissions in the next while, so why don’t we get started now. Let’s look at how greenhouse gas emissions from electricity have evolved over time and by province. For this, we’ll use some data that we’re going to use a lot over the next couple of weeks: emissions inventories and data from Canada’s Fifth Biennial Report to the United Nations. Getting these data into R is a bit of a pain, so I’ll give you the data for this one in assignment_3_projections.csv.
If you want to see how I actually made these data for you, I’ve left the code in here (folded) for your reference.
#Environment Canada Emissions Projection Data
file_loc<-"https://data-donnees.az.ec.gc.ca/api/file?path=/substances/monitor/canada-s-greenhouse-gas-emissions-projections/Current-Projections-Actuelles/GHG-GES/detailed_GHG_emissions_GES_detaillees.csv"
download.file(file_loc,destfile="ec_projections_2023.csv",mode = "wb")
#2023 Projections Data
proj_data<-read.csv("ec_projections_2023.csv",skip = 0,na = "-",fileEncoding = "Latin1", check.names = F) %>%
  clean_names()%>%select(-c(2,4,6,8,10,12,14)) %>% #take out the french data
  pivot_longer(-c(1:7),names_to = "year",values_to = "emissions")%>% #switch to long form
  mutate(year=gsub("x","",year),emissions=gsub("-","0",emissions),year=as.numeric(year),emissions=as.numeric(emissions))
#clean the emissions data and the naming of the years
#reset names
names(proj_data)<-c("region","scenario","sector","subsector_level_1","subsector_level_2","subsector_level_3","unit","year","emissions")
#recode provinces
proj_data<-proj_data %>% filter(sector %in% c("Electricity"))%>%
  mutate(prov=as.factor(region),
         prov=fct_recode(prov,"AB"="Alberta",
                         "BC"="British Columbia",
                         "NL"="Newfoundland and Labrador",
                         "MB"="Manitoba",
                         "SK"="Saskatchewan",
                         "NS"="Nova Scotia",
                         "ON"="Ontario",
                         "NT"="Northwest Territory",
                         "NT"="Northwest Territories",
                         "QC"="Quebec",
                         "NU"="Nunavut",
                         "NB"="New Brunswick",
                         "YT"="Yukon Territory",
                         "YT"="Yukon",
                         "PE"="Prince Edward Island",
                         "NU"="Nunavut"
         #group the territories and the atlantic provinces
         ), prov=fct_collapse(prov,
                              "TERR" = c("NT", "NU","YT","NT & NU"),
                              "ATL" = c("NL", "NB","NS","PE")
                              #,"OTHER ATL" = c("NL", "NB","PE")
         ))%>% #sum up emissions within territories and atlantic provinces
  select(year,prov,region,scenario,emissions,sector)%>% group_by(year,prov,sector,scenario)%>%summarize(emissions=sum(emissions,na.rm = T)) %>%ungroup()%>%filter(prov!="TERR") #drop the territories here
write.csv(proj_data,"assignment_3_projections.csv",row.names = FALSE)
These data are in a format you’ll be used to seeing: a reference case and an additional scenario, and they also contain historic emissions inventory data in the NIR2022 scenario. In order to keep the provinces in order from west to east, I’d suggest you use something like this in your code to format your provinces into a factor with labels in order:
proj_data<-proj_data %>% #reorder factor levels
  mutate(prov=factor(prov,
                     levels=c("Canada" ,"BC","AB" ,"SK","MB", "ON","QC","ATL","TERR" )))
You can also do this with fct_relevel from forcats:
proj_data<-proj_data %>% #reorder factor levels
  mutate(prov=fct_relevel(prov,"Canada","BC","AB" ,"SK","MB", "ON","QC","ATL","TERR"))
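If you want to convince yourself that the two approaches give the same ordering, a small check on a toy vector might look like this. The prov vector here is invented for the example and only uses four provinces.

```r
library(forcats)

# Toy province codes in arbitrary order (illustrative only)
prov <- c("ON", "AB", "QC", "BC", "AB")

# Option 1: set the level order directly with base R
by_factor <- factor(prov, levels = c("BC", "AB", "ON", "QC"))

# Option 2: start from a factor and move levels to the front with forcats
by_relevel <- fct_relevel(as_factor(prov), "BC", "AB", "ON", "QC")

levels(by_factor)
levels(by_relevel)
```

Both calls leave the data values untouched and only change the order of the levels, which is what ggplot uses to order stacks and legends.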
I’d like you to make a couple of different graphs using these data.
First, I’d like you to graph national emissions stacked by province, using the national inventory (NIR 2022) data from the file (1 mark):
Second, I’d like you to produce a graph of the projections and emissions combined. You can choose whether to use the Reference Case or the Additional Measures Case, just make sure to label your graph appropriately. This is one of the more challenging graphs for this exercise. At a minimum, I’d like you to have a division line (use geom_vline()) to split between projections and inventory data: (2 marks)
If you want to try to replicate mine with the more transparent fill for the projections, you can but you don’t have to do so.
If you want to try this, there are a number of ways to do this graph, but the most important thing to remember is that ggplot makes plots in layers, and this is basically a combination of two geom_area() plots (the projection, plot A, and the inventory, plot B) graphed one on top of the other with different transparency (alpha) values.
So, you need two geom_area() lines. For mine, I use something like this:
#start with the data filtered to include the inventory for years up to 2020 and the projections for the later years
#I have a variable, project_case, that stores the name of the projection I am using
geom_area(aes(year,emissions,fill=prov),color="black",position = "stack",size=0.1,alpha=.4)+ #essentially plot A above
geom_area(data=filter(proj_data,emissions>0 & scenario%in% c(inventory,project_case) & prov !="Canada" & year<=2020 & sector=="Electricity"),
aes(year,emissions,fill=prov),color="black",position = "stack",size=0.1,alpha=.8)+ #the data in plot B above
If you get that to work, I’ll give you a bonus mark. (+ 1 Bonus)
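To see the layering idea in isolation, here is a minimal sketch on invented data. The toy data frame, the region names "West" and "East", and the last_inventory_year value are all assumptions for illustration; this is not my code for the figure and not the projections file.

```r
library(dplyr)
library(ggplot2)

# Made-up emissions for two hypothetical regions; not real inventory data
toy <- expand.grid(year = 2015:2025, prov = c("West", "East"),
                   stringsAsFactors = FALSE) %>%
  mutate(emissions = ifelse(prov == "West", 50, 30) - (year - 2015))

last_inventory_year <- 2020 # assumed split between history and projection

layered <- ggplot(toy, aes(year, emissions, fill = prov)) +
  # plot A: every year, drawn faintly
  geom_area(position = "stack", colour = "black", alpha = 0.4) +
  # plot B: historical years only, drawn on top and less transparent
  geom_area(data = filter(toy, year <= last_inventory_year),
            position = "stack", colour = "black", alpha = 0.8) +
  # division line between inventory and projection
  geom_vline(xintercept = last_inventory_year, linetype = "dashed")
```

The historical years end up darker because the second layer is painted over the first; the projection years only ever get the faint layer.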
Deliverable 4: Electricity microgeneration (3 total marks)
In Alberta, homes and businesses may install solar or other electricity generating equipment on their property, and these are regulated under the Micro-generation Regulation, Alta Reg 27/2008, which defines many of the conditions under which these sites operate. We’ll talk about these in more detail next week, but I thought it was worth giving you a sense of the number of sites and the total capacity that we’re talking about right now in Alberta.
So, for your last deliverable for this assignment, let’s talk about Alberta microgen. And, for this deliverable, we’re going to use one of two new libraries too: the grid.arrange function in gridExtra lets you arrange plots, you guessed it, on a grid, and the plot_grid function in cowplot has some very similar features.
Let’s start with the data file: you can access it here.
It’s a pretty clean data file, so you probably won’t need to do much with it once you get it downloaded and read into R.
I want you to make and combine two graphs. The first graph is a total capacity graph by fuel type:
and the second is a graph of the number of microgeneration sites by type:
To do this, you’ll need to combine variables to create the Other type and summarize your data so that you only have one Other data point for each month.
Hints: 1) use as_factor to make microgen_fuel_type into a factor 2) use fct_other() to keep only the Solar type and have all the others labelled as Other 3) use group_by and summarize to combine all your Other sites and capacity measures into single Other totals for sites and capacity for each month
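A minimal sketch of those three hints, on a made-up table, could look like the following. Only microgen_fuel_type comes from the hints above; the month, sites and capacity_mw columns and all of the values are assumptions, so check the names in the actual data file.

```r
library(dplyr)
library(forcats)

# Hypothetical microgeneration data: two months, three fuel types
microgen <- tibble(
  month = rep(c("2023-01", "2023-02"), each = 3),
  microgen_fuel_type = rep(c("Solar", "Wind", "Gas"), times = 2),
  sites = c(100, 5, 3, 110, 6, 3),
  capacity_mw = c(80, 4, 2, 90, 5, 2)
)

microgen_summary <- microgen %>%
  # keep Solar, lump every other fuel type into "Other"
  mutate(fuel = fct_other(as_factor(microgen_fuel_type), keep = "Solar")) %>%
  # one row per month and fuel group, summing sites and capacity
  group_by(month, fuel) %>%
  summarize(sites = sum(sites), capacity_mw = sum(capacity_mw),
            .groups = "drop")
```

After this step there is exactly one Solar row and one Other row for each month, which is the shape you need for both graphs.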
And, the last step is for you to arrange these two plots side-by-side. You can do this with gridExtra::grid.arrange():
Or, you can make the same combined plot with cowplot::plot_grid():
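For reference, a bare-bones version of both approaches on placeholder plots might look like this. The toy data frame and the two simple line plots are invented stand-ins for your capacity and sites graphs.

```r
library(ggplot2)
library(gridExtra)
library(cowplot)

# Placeholder data and plots standing in for the two microgen graphs
toy <- data.frame(month = 1:6,
                  capacity = c(10, 12, 15, 19, 24, 30),
                  sites = c(100, 120, 150, 170, 210, 260))
p_capacity <- ggplot(toy) + geom_line(aes(month, capacity)) + labs(title = "Capacity")
p_sites <- ggplot(toy) + geom_line(aes(month, sites)) + labs(title = "Sites")

# gridExtra: draws the two plots in one row and returns the layout invisibly
combined_grid <- grid.arrange(p_capacity, p_sites, ncol = 2)

# cowplot: builds a single combined plot object that you can print or save
combined_cow <- plot_grid(p_capacity, p_sites, ncol = 2)
```

Both calls take ncol (and nrow) to control the layout, so switching between the two packages is mostly a matter of taste.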
Aborted Deliverable 1: StatsCan Data Are Often Useless (No marks, just for reference)
The first thing I was going to do for this assignment when I set it last winter was to get you to use the cansim package to download some Statistics Canada data. I hope you’ll see how this would have been useful for you. The cansim package was written by Jens von Bergmann, a Vancouver-based mathematician who has done a lot of cool and useful things with R. You can read more about it here. For example, you should be able to use the get_cansim() function in the cansim library to access Canadian power generation data by province:
power_data<-get_cansim("2510001501")%>%clean_names()%>% #read data, clean names
  mutate(ref_date=ym(ref_date)) #fix dates
I wanted you to use these data for Deliverable 1, but you’ll soon see why I’m not going to do that. Instead, I’ll just show you what happens when you make a plot of Canada’s electricity generation mix by source using the data you’ve just downloaded from Statistics Canada (I left my code folded so you can see it):
power_data %>%filter(geo=="Canada",#keep Canada
                     hierarchy_for_type_of_electricity_generation!="1",
                     hierarchy_for_type_of_electricity_generation!="1.11",
                     hierarchy_for_type_of_electricity_generation!="1.11.12",
                     hierarchy_for_type_of_electricity_generation!="1.11.13",
                     hierarchy_for_class_of_electricity_producer=="1"
                     #filter for the specific chunks of the data I want
)%>%ggplot()+
  geom_area(aes(ref_date,value/1000,fill=type_of_electricity_generation),color="black",position = "stack",size=0.1,alpha=.8)+
  geom_line(data=power_data %>%filter(geo=="Canada",hierarchy_for_type_of_electricity_generation=="1",
                                      hierarchy_for_class_of_electricity_producer=="1"),
            aes(ref_date,value/1000,colour="Total Generation"),position = "stack",size=1.1)+
  power_graph()+ #graph format function
  scale_fill_viridis("",option="cividis",discrete = T)+
  scale_color_manual("",values="black")+
  scale_x_date(name=NULL,date_breaks = "2 years", date_labels = "%Y",expand=c(0,0)) +
  scale_y_continuous(expand = c(0, 0))+
  labs(y="Monthly Generation (MWh)",x="Date",
       title=paste("Canada Electricity Generation by Source",sep=""),
       subtitle=paste("A Demonstration of the Value of Statistics Canada Data",sep=""),
       caption="Source: Statistics Canada, Table 25-10-0015-01, electric power generation, monthly generation by type of electricity, graph by Andrew Leach.")+
  NULL
And the same thing for Alberta:
So, inadvertently, you’re going to end up learning a different lesson: the cansim package for R is great, but StatCan data are often suppressed because of an overly-aggressive sensitivity to confidentiality. Even though we can graph, with publicly available data, hourly Alberta output by unit, we can’t graph monthly data by plant type for some reason using StatCan data.
This is left in here for your reference, and you can download the data if you like. Some of you may want to use StatCan data for your upcoming Econ366 ChartWeek charts, so I thought it was worth leaving in this document.
RMD File and HTML/PDF Preparation
As before, use the basic RMD file to complete this (and future) assignments, just rename it assignment 3. But, remember to clean up your HTML for deliverable 1!