RMD vs Scripts

A few of you have talked to me about running R code in different ways, and some are confused by the difference between an R script and an R-Markdown document. Hopefully this example will help.

Suppose I want to make the graph from the Week 2 data exercise.

One option would be to create an R script - let’s call it week2.R - and I could run code that looks like this from that script:

library(readxl)
library(janitor)
library(tidyverse)
library(lubridate)
library(scales)
library(viridis)

#set the working directory to wherever this script is located
setwd(dirname(rstudioapi::getSourceEditorContext()$path))

download.file("https://www.eia.gov/dnav/pet/xls/PET_PRI_SPT_S1_D.xls",destfile = "eia_prices.xls",mode="wb")
#reshape and prep the data
price_data<- 
  read_excel("eia_prices.xls" ,sheet = "Data 1",skip=2)%>%clean_names()%>%
  rename(wti=2,brent=3)%>%
  pivot_longer(-date,names_to = "crude_stream", values_to = "price")%>%
  filter(!is.na(price))%>%
  mutate(crude_stream=gsub("brent","Brent",crude_stream))%>%
  mutate(crude_stream=gsub("wti","WTI",crude_stream))%>%
  filter(date>=max(date)-years(15)) #keep the last 15 years of data
#make the graph by passing the data to ggplot
  ggplot(price_data)+ #make a graph
  geom_line(aes(date,price,group=crude_stream,color=crude_stream),linewidth=.65)+
  scale_x_datetime(date_labels = "%Y",date_breaks = "1 year", expand=c(0,0))+
  scale_y_continuous(breaks=pretty_breaks(), expand=c(0,0))+
  expand_limits(y=c(0,150))+
  expand_limits(x=as.POSIXct(ymd("2023-12-31")))+ #extend the x-axis to the end of 2023
  scale_color_viridis("",discrete = TRUE,option="A",direction = -1,begin=.7,end = 0)+
  theme_minimal()+
  theme(legend.position = "bottom")+
  guides(color = guide_legend(keywidth = unit(1.6,"cm"),nrow = 1))+ #the lines are mapped to color, so style the color legend
  labs(y="Price ($/bbl)",x="",
       title="Crude Oil Spot Prices",
       subtitle = "Daily West Texas Intermediate (WTI) and Brent prices in dollars per barrel (2008-2023)",
       caption = "Data via EIA, graph by Andrew Leach"
  )+
NULL #I often put these at the end of my graph code so I don't have to worry about the "+" at the end of an added line
# save the graph to a file
ggsave("oil_graph.png",width=14,height=7,dpi=250,bg="white")

And, that script does six things:

loads the packages that are needed into R on my local machine;
makes sure the working directory (where R will look for or save files by default) is set to the directory the script is stored in;
downloads the data;
reads and reshapes the data to make it ready to graph;
makes the graph;
saves the graph to a .png file with specific size, dpi, and background attributes.

If you check your working directory (if you don’t know what that is, go to the R console and type getwd()), you’d find an image file, oil_graph.png, that looks like this:

I could then, if I wanted to do so, use that image in a file (Word, R Markdown, etc.).
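If you are not sure where the file ended up, a quick check from the console along these lines may help. This is only a sketch; the object names wd and graph_saved are illustrative and not part of the script above.

```r
# Where is R currently looking for (and saving) files?
wd <- getwd()
print(wd)

# Was the graph saved there? This is TRUE only if the script above
# has been run with this working directory.
graph_saved <- file.exists(file.path(wd, "oil_graph.png"))
print(graph_saved)

# List every PNG file in the working directory
list.files(wd, pattern = "\\.png$")
```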

You’ll also notice that, after you run that code, you’ll have price_data in your R Environment (i.e. in memory on your machine):

Running a line (or multiple lines) of code from a script is the same as typing them into the console (or command) line. For example, since I have price_data in memory already, I could type tail(price_data,10) into the console and I’ll get this:

Want to know something fun about the R console? Try this: type print(D(expression(x^2 + 5*x + 1),'x')), and you may be a bit surprised to find that the output is 2 * x + 5. R has just taken a derivative for you.
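As a small follow-up sketch, D() hands back an unevaluated R expression, so you can store it and evaluate it at a particular value of x with eval(). The names deriv_expr and slope_at_3 are just illustrative.

```r
# D() does symbolic differentiation and returns an R expression
deriv_expr <- D(expression(x^2 + 5*x + 1), "x")
print(deriv_expr)   # 2 * x + 5

# Evaluate the derivative at x = 3
slope_at_3 <- eval(deriv_expr, list(x = 3))
print(slope_at_3)   # 11
```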

The second option you can use to create the graph in a document (and the one I want you to use) would be to use the same code, but to put it in a code chunk in an R-Markdown (RMD) file like week2.Rmd:
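A minimal sketch of what such a file could contain is below. It is written as R code that builds the file line by line, so you can run it and look at the result. The title, the chunk label oil-graph, the chunk options, and the file name week2_sketch.Rmd are placeholders chosen for illustration, not the actual course file.

```r
# Build a minimal R Markdown file line by line.
# The title, chunk label and chunk options are illustrative placeholders.
fence <- strrep("`", 3)  # three backticks open and close a code chunk

rmd_lines <- c(
  "---",                                   # YAML header starts
  'title: "Week 2 Data Exercise"',
  "output: html_document",                 # knit to HTML
  "---",                                   # YAML header ends
  "",
  "Some text describing the graph goes here.",
  "",
  paste0(fence, "{r oil-graph, message=FALSE, warning=FALSE}"),
  "library(tidyverse)",
  "# ... the rest of the code from week2.R goes here ...",
  fence
)

# Write to a temporary folder so nothing in the working directory is overwritten
rmd_file <- file.path(tempdir(), "week2_sketch.Rmd")
writeLines(rmd_lines, rmd_file)

# Print the file to see what the .Rmd looks like
cat(readLines(rmd_file), sep = "\n")
```

Everything between the two fence lines is ordinary R code, and everything outside them is text. When a file is knit, the code is normally run with the working directory set to the folder containing the .Rmd, so the setwd() line from the script is generally not needed there.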

The code chunk would still create the graph, but the graph would also become part of an HTML document. For example, once you knit the R Markdown file, the HTML document created will look like this, with the graph embedded:

But, if you started from a new R session and knit that R Markdown document, you’d see that none of the data or packages would stay loaded in memory on your machine. Nothing would be listed in your Environment and, if you check your Packages window, none of the packages you loaded with library() (readxl, janitor, tidyverse, lubridate, scales and viridis) would be checked off.

THIS IS THE MOST IMPORTANT THING TO REMEMBER THIS TERM!!!

That’s because knitting the R markdown document:

starts a new R session;
loads all the packages listed in the RMD;
does all the work;
saves any output to html;
closes the R session.
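One way to see this for yourself, as a quick sketch: open a new R session, knit week2.Rmd, and then run the checks below in the console. The object names data_in_memory and readxl_attached are just illustrative.

```r
# Run in the console of a NEW R session, right after knitting week2.Rmd.
# Both checks should come back FALSE, because the knit happened in its
# own R session, which has since closed.

# Is price_data in memory in this session?
data_in_memory <- exists("price_data")

# Is readxl attached (i.e. on the search path) in this session?
readxl_attached <- "package:readxl" %in% search()

print(c(data = data_in_memory, readxl = readxl_attached))
```

If you instead run the code chunk with the icons in the chunk, or run the script, both checks will come back TRUE, because that code runs in your own session.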

THIS IS THE MOST IMPORTANT THING TO REMEMBER THIS TERM!!!

Now, you can use your R-Markdown document as a script but not the other way around. You can execute one or more code chunks using the icons in the top right of your code chunk in your R-Markdown document. That has the same effect as either running those lines of code in a script or typing them in the console window. It reads things into memory, and will create output.

You can also highlight (select with cursor or mouse) lines of code within a code chunk in a markdown document or in a script, and you can run them either by a mouse click or with Ctrl-Enter.

DO NOT TRY TO OUTPUT YOUR RMD TO PDF

Finally, RMD files can be configured to output to Word, PDF, or directly to HTML. We will use HTML output throughout the term. If you want to make a PDF from your file, you can do what I do for the course slides: install a package called pagedown and then use the command pagedown::chrome_print("week2.Rmd",wait=5), and it will output the file to HTML and PDF. But, for my assignments, you need only submit the HTML.

DO NOT TRY TO OUTPUT YOUR RMD TO PDF