1.5 — Optimize Workflow

ECON 480 • Econometrics • Fall 2022

Dr. Ryan Safner
Associate Professor of Economics

safner@hood.edu
ryansafner/metricsF22
metricsF22.classes.ryansafner.com

Your Workflow Has a Lot of Moving Parts

  1. Writing text/documents

  2. Managing citations and bibliographies

  3. Performing data analysis

  4. Making figures and tables

  5. Saving files for future use

  6. Monitoring changes in documents

  7. Collaborating and sharing with others

  8. Combining into a deliverable (report, paper, presentation, etc.)

The Office Model

The Office Model I

  1. Writing text/documents

  2. Managing citations and bibliographies

  3. Performing data analysis

  4. Making figures and tables

  5. Saving files for future use

  6. Monitoring changes in documents

  7. Collaborating and sharing with others

  8. Combining into a deliverable (report, paper, presentation, etc.)

The Office Model II

  • A lot of copy/paste

  • A lot of:

The Office Model: A Short Horror Film

The Office Model: Mistakes

Source: Bloomberg

The Office Model: Not Reproducible

Drawing the Rest of the Owl

What I’ll Show You

  • This is how I make my…

    • Research papers
    • Course documents
    • Websites
    • Slides and presentations
  • I have not used any MS Office products since 2011 (good riddance!)

  • This stuff is optional

    • If you like your office model, you can keep it
    • But this is what most people who take this course continue to use (R itself is really only needed if you have data work)

The Plain Text Model

The Plain Text Model I

Meet Quarto, which can do all of this in one pipeline

  1. Writing text/documents

  2. Managing citations and bibliographies

  3. Performing data analysis

  4. Making figures and tables

  5. Saving files for future use

  6. Monitoring changes in documents

  7. Collaborating and sharing with others

  8. Combining into a deliverable (report, paper, presentation, etc.)

The Plain Text Model II

  • Plain text files: readable by both machines and humans
    • See exactly how a document is structured and formatted, via code and markup applied to the text
  • Focus entirely on the actual writing of the content instead of the formatting and aesthetics
    • You can still customize, but with precise commands instead of point, click, drag, guess, pray

The Plain Text Model III

  • Open Source: free, useable forever, often very small file size

    • Proprietary software is a gamble - can you still open a .doc file from Microsoft Word 1997?
  • Automate and Minimize Errors, especially in repetitive processes

  • Can be used with version control (see below)

Making Your Work Reproducible

  • A Quarto file (.qmd) is the “real” part of your analysis: everything can live in this plain-text file!

  • Document text in markdown

  • R code executed in “chunks”

  • Plots and tables generated from R code

  • Citations and bibliography automated with .bib file

The Future of Science is Open Source Plain Text

Quarto

Creating a Quarto Document I

File -> New File -> Quarto Document...

  • Outputs:
    • Document (what you’ll use for most things)
    • Presentation (for making slides in various formats)
    • Interactive (an html and R based web app, advanced)

Creating a Quarto Document II

  • html: renders a webpage, viewable in any browser
    • default, easiest to produce and share
    • can have interactive elements (gifs, animations, web apps)
    • requires internet connection to host and share (you can view offline)
  • pdf: renders a PDF document
    • most common document format around
    • requires LaTeX distribution to render (more on that soon)
  • word: create a Microsoft Word document
    • …if you must

Structure of a Quarto Document

Entire document is written in a single file with three types of content:

  1. YAML header for metadata
---
title: "Title"
format: html
---
  2. Text of the document written with markdown
# Header 1
**Bold** and *italic* text. 
  3. R chunks for data analysis, plots, figures, tables, statistics, as necessary
2+2
[1] 4
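
To make the "single plain-text file" point concrete, here is a minimal sketch that assembles the three parts from R and writes them to a temporary file. The file name example.qmd and the object names are hypothetical, chosen only for illustration.

```r
# A minimal sketch: build the three parts of a Quarto document as plain text
# and write them to a temporary .qmd file (hypothetical file name).
fence <- strrep("`", 3) # three backticks, built here to avoid nesting fences

qmd_lines <- c(
  "---",                          # 1. YAML header
  'title: "Title"',
  "format: html",
  "---",
  "",
  "# Header 1",                   # 2. markdown text
  "",
  "**Bold** and *italic* text.",
  "",
  paste0(fence, "{r}"),           # 3. R chunk
  "2+2",
  fence
)

qmd_path <- file.path(tempdir(), "example.qmd")
writeLines(qmd_lines, qmd_path)

# Read the file back: it is nothing more than lines of text
cat(readLines(qmd_path), sep = "\n")
```

Opening the resulting file in any text editor shows the same twelve lines, which is the whole point of the plain text model.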

YAML Header I

  • Top of a document contains the YAML header, separated by three dashes --- above and below

  • Contains the metadata of the document, such as:

---
title: "My Document"
author: "Ryan Safner"
date: "`r Sys.Date()`" # here I'm using R code to generate today's date!
format: html
---
  • format must be specified, everything else can be left blank, and other options can be added as necessary

  • In most cases, you can safely ignore other things in the yaml until you are ready
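
The date field above runs R code inside the header. A small sketch of what that code returns, and one possible way to format it, is below; the object names are illustrative only, and the month name printed depends on your computer's locale.

```r
# What the inline R code in the date field evaluates to
today <- Sys.Date()
class(today) # a Date object, printed as YYYY-MM-DD by default

# One possible way to print a longer date, e.g. month name, day, and year
pretty_date <- format(today, "%B %d, %Y")
pretty_date
```

If you prefer the longer form, the same call can go in the header, e.g. date: "`r format(Sys.Date(), '%B %d, %Y')`".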

YAML Header Example I

  • Example from these slides
---
format:
  revealjs:
    theme: [default, custom.scss]
    logo: "../images/metrics_hex.png"
    footer: "[ECON 480 — Econometrics](https://metricsF22.classes.ryansafner.com)"
    height: 900
    width: 1600
    #df-print: paged
    slide-number: c
overview: true
execute:
  echo: false
  warning: false
  freeze: auto
---

YAML Header Example II

  • Example from one of my papers:
---
title: Distributing Patronage^[I would like to thank the Board of Associates of Hood College...]
subtitle: Intellectual Property in the Transition from Natural State to Open Access Order
date: \today
author: 
- Ryan Safner^[Hood College, Department of Economics and Business Administration; safner@hood.edu]

abstract: |
  | "This paper explores the emergence of the modern forms of copyright and patent in ...
  | *JEL Classification:* O30, O43, N43
  | *Keywords:* Copyright, intellectual property, economic history, freedom of the press, economic development

bibliography: patronage.bib
geometry: margin = 1in
fontsize: 12pt
mainfont: Fira Sans Condensed
output: 
  pdf_document:
    latex_engine: xelatex
    number_sections: true
    fig_caption: yes

header-includes:
    - \usepackage{booktabs}
---

R Chunks

  • You can create a “chunk” of R code with three backticks above and below your code
  • After the opening three backticks, signify the language of the code inside braces, e.g.:

Input

```{r}
2+2 # code goes here!
```

Output

[1] 4

R Chunks

Input

```{r}
gapminder %>%
  head()
```

Output

# A tibble: 6 × 6
  country     continent  year lifeExp      pop gdpPercap
  <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
1 Afghanistan Asia       1952    28.8  8425333      779.
2 Afghanistan Asia       1957    30.3  9240934      821.
3 Afghanistan Asia       1962    32.0 10267083      853.
4 Afghanistan Asia       1967    34.0 11537966      836.
5 Afghanistan Asia       1972    36.1 13079460      740.
6 Afghanistan Asia       1977    38.4 14880372      786.

R Chunks

Input

```{r}
library("ggplot2") # load ggplot2
ggplot(data = mpg)+
  aes(x = displ,
      y = hwy)+
  geom_point(aes(color = class))
```

Output

R Chunk Options

  • Chunks can have options with the “hash pipe” #| at the top of the chunk
#| label: my_chunk_title # give chunk a name
#| eval: true # run the code?
#| echo: true # display code?
#| warning: true # display warnings?
#| message: false # display messages?
#| fig.width: 6 # width for figures
  • In R Markdown (the predecessor to Quarto…that I know better), you put options inside the braces at the top of a chunk. This is still valid in Quarto:
```{r my_chunk_title, eval = TRUE, echo = FALSE, warning = FALSE, message = FALSE, fig.width = 6}
```

Global Chunk Options

  • You can set default options for all chunks in the YAML header:
execute:
  echo: false # hide all input code
  warning: false # hide all output warnings
  message: false # hide all output messages
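
For comparison, R Markdown documents usually set the same defaults from R code in a first "setup" chunk, using knitr's opts_chunk. A sketch of that older approach follows, assuming the knitr package is installed (the code skips itself otherwise).

```r
# R Markdown-style global defaults, set from R code rather than the YAML header.
# Assumes knitr is installed; nothing happens if it is not.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(
    echo = FALSE,    # hide all input code
    warning = FALSE, # hide all output warnings
    message = FALSE  # hide all output messages
  )
  knitr::opts_chunk$get("echo") # check the new default
}
```

In a Quarto document the execute: block in the YAML header is the more natural place for these settings; this sketch is mainly useful for reading older .Rmd files.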

R Inline Code

  • If you just want to display some code (or at least format it like code) in the middle of a sentence, place between a single backtick on either side.
    • e.g. if I mention `tidyverse` or `gapminder`, it formats the text as in-line code.
  • To actually execute R code and print its result in the middle of a sentence, put r as the first character inside the backticks, followed by a space and the code to run. For example, inline code that evaluates pi renders as 3.1415927.

Input

pi is equal to `r pi`.

Output

pi is equal to 3.1415927.

Or Like This

Input

The average GDP per capita is `r dollar(mean(gapminder$gdpPercap))` with a standard deviation of `r dollar(sd(gapminder$gdpPercap))`.

Output

The average GDP per capita is $7,215.33 with a standard deviation of $9,857.45.
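
Long inline expressions can get hard to read, so one option is to compute the values in a chunk first and reference short names inline. The sketch below uses the built-in mtcars data rather than gapminder so it runs without extra packages; as_dollar() is a hypothetical base-R stand-in for scales::dollar(), and 7215.327 is just an illustrative input.

```r
# Hypothetical helper: "$" prefix, thousands separators, two decimal places
as_dollar <- function(x) {
  paste0("$", formatC(x, format = "f", digits = 2, big.mark = ","))
}

# Compute summary statistics once, in a regular chunk
avg_mpg <- mean(mtcars$mpg)
sd_mpg  <- sd(mtcars$mpg)

round(avg_mpg, 2)   # 20.09
round(sd_mpg, 2)    # 6.03
as_dollar(7215.327) # "$7,215.33"
```

In the text you could then write something like: The average fuel economy is `r round(avg_mpg, 2)` miles per gallon.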

Writing Text with Markdown: Formatting

  • Markdown is a lightweight markup language geared towards HTML (i.e. the internet)
| Markdown Syntax | Output |
|---|---|
| *italics* and **bold** | italics and bold |
| superscript^2^ / subscript~2~ | superscript² / subscript₂ |
| ~~strikethrough~~ | strikethrough (struck through) |
| `verbatim code` | verbatim code (in monospace) |
  • Comment your document with <!-- Unprinted comments here --> (will not print in output; this comes from html)

Writing Text with Markdown: Lists

Markdown

* unordered list
    + sub-item 1
    + sub-item 2
        - sub-sub-item 1

Output

  • unordered list
    • sub-item 1
    • sub-item 2
      • sub-sub-item 1

Markdown

*   item 2

    Continued (indent 4 spaces)

Output

  • item 2

    Continued (indent 4 spaces)

Markdown

1. ordered list
2. item 2
    i) sub-item 1
         A.  sub-sub-item 1

Output

  1. ordered list
  2. item 2
    1. sub-item 1
      1. sub-sub-item 1

Writing Text with Markdown: Headings

| Markdown Syntax | Output |
|---|---|
| # Header 1 | Header 1 |
| ## Header 2 | Header 2 |
| ### Header 3 | Header 3 |
| #### Header 4 | Header 4 |
| ##### Header 5 | Header 5 |
| ###### Header 6 | Header 6 |

Writing Text with Markdown: Images

Markdown

![Caption: tidyverse](images/tidyverse1.png)

![](https://dplyr.tidyverse.org/logo.png)

Output

The tidyverse

dplyr logo

Writing Text with Markdown: Making Tables

Markdown

| Right | Left | Default | Center |
|------:|:-----|---------|:------:|
|   12  |  12  |    12   |    12  |
|  123  |  123 |   123   |   123  |
|    1  |    1 |     1   |     1  |

: Table Example {tbl-colwidths="[25,25,25,25]"}

Output

Table Example

| Right | Left | Default | Center |
|------:|:-----|---------|:------:|
|    12 | 12   | 12      |   12   |
|   123 | 123  | 123     |  123   |
|     1 | 1    | 1       |   1    |
  • See the Quarto Documentation for more help on tables

Writing Text with Markdown: Printing Tables

  • Sometimes we want to print tables from our data
  • The kableExtra package is great for this (see its documentation)
library(kableExtra)
mtcars %>%
  head() %>%
  kbl()
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Writing Text with Markdown: Printing Tables

mtcars %>%
  head() %>%
  rmarkdown::paged_table()
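
If you only need a simple table, knitr's kable() function is another option. This is a sketch rather than a recommendation over kableExtra: it assumes the knitr package is installed, and the column selection and caption are made up for illustration.

```r
# Keep a few columns so the table fits on a slide
cars_head <- head(mtcars[, c("mpg", "cyl", "hp", "wt")])

# Print a markdown ("pipe") table, rounding numbers to one decimal place.
# Assumes knitr is installed; skipped otherwise.
if (requireNamespace("knitr", quietly = TRUE)) {
  tbl <- knitr::kable(
    cars_head,
    format = "pipe",
    digits = 1,
    caption = "First six rows of mtcars"
  )
  print(tbl)
}
```

Because the result is itself a markdown table, it renders in html, pdf, and word output alike.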

Writing Math

  • Add beautifully-formatted math by placing a $ before and after the math, or $$ before and after for a centered equation
  • In-line math example: $1^2=\frac{\sqrt{16}}{4}$ produces \(1^2=\frac{\sqrt{16}}{4}\) in my text
  • Centered-equation example:

Input

$$ \hat{\beta_1}=\frac{\displaystyle \sum_{i=1}^n (X_i-\bar{X})(Y_i-\bar{Y})}{\displaystyle \sum_{i=1}^n (X_i-\bar{X})^2} $$

Output

\[\hat{\beta_1}=\frac{\displaystyle \sum_{i=1}^n (X_i-\bar{X})(Y_i-\bar{Y})}{\displaystyle \sum_{i=1}^n (X_i-\bar{X})^2}\]
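
Since this formula comes back later in the course, here is a sketch that computes it term by term in R and checks it against lm(). The choice of the built-in mtcars data, with car weight as X and miles per gallon as Y, is an assumption made purely for illustration.

```r
# X and Y for the example (illustrative choice of variables)
X <- mtcars$wt
Y <- mtcars$mpg

# Numerator: sum of (X_i - X-bar)(Y_i - Y-bar)
numerator <- sum((X - mean(X)) * (Y - mean(Y)))

# Denominator: sum of (X_i - X-bar)^2
denominator <- sum((X - mean(X))^2)

beta_1_hat <- numerator / denominator
beta_1_hat

# Cross-check against R's built-in linear regression
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)["wt"]
```

Both lines should print the same slope, which is a useful sanity check that the typeset formula and the code agree.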

Writing Math

  • Math uses a (much older) language called \(\LaTeX\), used by mathematicians, economists, and others to write papers and slides with perfect math and formatting
    • I used to use it for everything before I found R and markdown
    • Producing pdf output actually converts markdown files into \(\TeX{}\) first!
    • Much steeper learning curve, but there is a good cheatsheet
    • An extensive library of mathematical symbols, notation, formats, and ligatures (examples on the next slide)
  • A great resource: Wikibooks LaTeX Mathematics chapter

Writing Math

| Input | Output |
|---|---|
| $\alpha$ | \(\alpha\) |
| $\pi$ | \(\pi\) |
| $\frac{1}{2}$ | \(\frac{1}{2}\) |
| $\hat{x}$ | \(\hat{x}\) |
| $\bar{y}$ | \(\bar{y}\) |
| $x_{1,2}$ | \(x_{1,2}\) |
| $x^{a-1}$ | \(x^{a-1}\) |
| $\lim_{x \to \infty}$ | \(\lim_{x \to \infty}\) |
| $A=\begin{bmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \\ \end{bmatrix}$ | \(A=\begin{bmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \\ \end{bmatrix}\) |

Citations, References, & Bibliography

  • Manage your citations and bibliography automatically with .bib files
  • First create a .bib file to list all of your references in
    • You can do this in RStudio via: File -> New File -> Text File (and save with .bib at the end)
    • See examplebib.bib in this repository used in this document
    • In the YAML header of your main document, add bibliography: examplebib.bib so Quarto knows to pull references from this file
    • For each reference, add information to a .bib file, like so:

An Example .bib File

@article{safner2016,
  author = {Ryan Safner},
  year = {2016},
  journal = {Journal of Institutional Economics},
  title = {Institutional Entrepreneurship, Wikipedia, 
           and the Opportunity of the Commons},
  volume = {12},
  number = {4},
  pages = {743-771}
}
  • A .bib file is a plain text file with entries like this

  • Entry types include @article, @book, @incollection, @unpublished, etc.

    • Each requires different fields (e.g. editor, publisher, address)
  • The first input after @article{ is your citation key (e.g. safner2016)

    • Whenever you want to cite this article, you’ll invoke this key
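
R can also generate .bib entries for you. The sketch below uses base R's citation() and toBibtex() to produce an entry for R itself and save it to a temporary file; the citation key rcore and the file name are hypothetical, and the entry R produces has no key until you add one.

```r
# BibTeX entry for R itself, as a plain character vector (one element per line)
entry <- unclass(toBibtex(citation()))

# The first line ends in "{," because no key is set; insert a hypothetical key
entry[1] <- sub("\\{,$", "{rcore,", entry[1])

# Save it as a .bib file (a temporary location, for illustration)
bib_path <- file.path(tempdir(), "r-software.bib")
writeLines(entry, bib_path)

cat(readLines(bib_path), sep = "\n")
```

The same approach works for packages, e.g. citation("ggplot2"), provided the package is installed.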

Citations

  • Whenever you want to cite a work in your text, call up the citation key with @ inside square brackets, like so: [@safner2016], which produces (Safner, 2016)

  • You can customize citations, e.g.:

| Write | Produces |
|---|---|
| [@safner2016] | (Safner, 2016) |
| @safner2016 | Safner (2016) |
| [-@safner2016] | (2016) |
| [@safner2016, p. 743-744] | (Safner, 2016, p. 743-744) |
  • Pandoc's citeproc (which Quarto uses by default) will automatically collect all works cited and produce a bibliography at the end, according to a style you can choose

Reference Management Software

  • For more information and examples, see Quarto’s Documentation on Citations

  • Lots of programs can help you manage references and export complete .bib files to use with Quarto

    • Mendeley and Zotero are free and cross-platform
    • I use Papers (Paid and Mac only)
    • Simplest program (what I use) that makes .bib files is BibDesk

Plain-Text Editors

  • Markdown files are plain text files and can be edited in any text editor

  • Any good editor will have syntax highlighting and coloring when you use tags (like bold, italic, code, and code #comments).

  • VS Code; Notepad++; Sublime

VS Code

RStudio is My Text Editor of Choice

  • Honestly, I write everything in RStudio’s text editor
    • Syntax highlighting
    • Actually can run R code, autocomplete, etc
    • Can render the markdown to an output format: html, pdf, etc.
  • You can write R code in other text editors, but you can’t execute it outside of RStudio (or the command line, but that’s too advanced). The same goes for actually rendering your markdown to an output (pdf, html, etc)

Tips with Markdown

  • Empty space is very important in markdown

  • Lines that begin with a space may not render properly

  • Math that contains spaces between the dollar-signs may not render properly

  • Moving from one type of content to another (e.g. a heading to a list to text to an equation to text) requires blank lines between them to work

  • Here is a great general tutorial on markdown syntax

Rendering Your Documents

knitr

  • When you are ready, you “render” your markdown and code into an output format using:

  • knitr, an R package that “knits” your R code and markdown .qmd into a .md file for:

  • pandoc is a “swiss-army knife” utility that can convert between dozens of document types

  • All you need to do is click the Render button at the top of the text editor!
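
If you would rather render from the R console than click the button, the quarto R package offers quarto_render(). The wrapper below is only a sketch: render_if_possible() is a hypothetical helper name, and it assumes both the quarto R package and the Quarto command-line tool are installed.

```r
# Hypothetical helper: render a .qmd only if the file and the quarto package exist
render_if_possible <- function(input, format = "html") {
  if (!file.exists(input)) {
    message("File not found: ", input)
    return(invisible(FALSE))
  }
  if (!requireNamespace("quarto", quietly = TRUE)) {
    message("The quarto R package is not installed")
    return(invisible(FALSE))
  }
  # Does the same job as the Render button in RStudio
  quarto::quarto_render(input, output_format = format)
  invisible(TRUE)
}

# Usage (file name is illustrative):
# render_if_possible("paper.qmd", format = "pdf")
```

For everyday work the Render button is simpler; a function like this mostly helps when you want to re-render many documents from a script.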

PDF Output

  • Producing a PDF uses \(\LaTeX{}\)

  • You will need a full distribution of \(\LaTeX{}\) on your computer, OR

  • Better to use the package tinytex to install a mini-distro of \(\LaTeX{}\) inside of R:
# install.packages("tinytex") # first install package
library(tinytex) # load package
install_tinytex() # run this command to install LaTeX in R
  • Once you’ve done this (just one time), you can render to a PDF; just make sure your YAML header is set to pdf format:
---
format: pdf
---

Project-Oriented Workflow

R Projects I

  • An R Project is a way of systematically organizing your R history, working directory, and related files in a single, self-contained directory
  • Can be sent to others, who can then easily reproduce your work
  • Connects well with version control tools like Git and GitHub
  • Can open multiple projects in multiple windows

R Projects II

  • Projects solve all of the following problems:
    1. Organizing your files (data, plots, text, citations, etc)
    2. Having an accessible working directory (for loading and saving data, plots, etc)
    3. Saving and reloading your commands history and preferences
    4. Sending files to collaborators, so they have the same working directory as you

Creating an R Project I

Creating an R Project II

Creating an R Project III

Projects

  • Switch between each project (Window) on your computer (this is on a Mac)

Projects

  • At top right corner of RStudio
    • Click the button to the right of the name to open in a new window!

Loading Others’ Projects

  • This project is on GitHub: click the green button, download it to your computer, and open the .Rproj file in RStudio

  • See my guide about unzipping files (especially for Windows)!

A Good File Structure

  • Look through this on your own
  • Read the README of this repository on GitHub for instructions (automatically shows on the main page)
  • Look at the example_paper.qmd
    • Uses data from data folder
    • Uses .R scripts from scripts folder
    • Uses figures from figures folder
    • Uses bibexample.bib from bibliography folder

Version Control

Have You Done This?

Source: PhD Comics

Do You Want to Be Able to

  • Keep your files backed up

  • Track changes

  • Collaborate on the same files with others

  • Edit files on one computer and then open and continue working on another?

The Training-Wheels Version

  • Register a Dropbox account for free

  • Set up a location on your computer for the Dropbox/ folder

  • Anything you put in this folder will sync to the cloud

    • As soon as you change files, they automatically update and sync!
    • Can download any of these files from the website on any device
    • Set this up on multiple computers so when you change a file on one, it updates on all the others!

My Life Goes In Here

Smart Sync

Smart Sync - keep some files online only for space

The Expert Version

  • Git is an “open source distributed version control system” widely used in the software development industry

  • Track changes on steroids (if MS Word’s Track Changes and Dropbox had a baby)

    • Organize folders/files to track (a "repository")
    • Take a snapshot of all of your files (a “commit”) with “comments”
    • push these to the cloud
    • pull changes to (other) computers as needed
  • GitHub is a popular (not the only!) cloud destination for these repositories

The Expert Version

  • Shows history (versions) of files with comments
    • Can fork or branch repository into multiple versions at once
    • Good for “testing” things out without destroying old versions!
    • revert back to original versions as needed

The Expert Version

  • Requires some advanced set up, see this excellent guide

  • RStudio integrates Git and GitHub commands nicely

This Class on GitHub

github.com/ryansafner/metricsF22

Most Packages Start on GitHub

My Workflow (That I Suggest to You)

  1. Create a new repository on GitHub.
  2. Start a new R Project in RStudio (link it to the GitHub repository; see guide)
  3. Create a logical file system (see example), such as:
project # folder on my computer (the new working directory)
|
|- data/ # folder for data files
|- scripts/ # folder for .R code
|- bibliography/ # folder for .bib files
|- figures/ # folder to save plots and figures to
|- paper.qmd # write document here
  4. Write document in paper.qmd, loading/saving files from/to various folders in project
    • e.g. load data like df <- read_csv("data/my_data.csv"); save plots like ggsave("figures/p.png")
  5. Render document to pdf or html.
  6. Occasionally, stage and commit changes with a description, push to GitHub.
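
The folder layout above can be created by hand, or with a few lines of base R. The sketch below builds it inside a temporary directory so it is safe to run anywhere; in practice you would run something like this once from the root of your own R Project, and the folder names simply mirror the example above.

```r
# Root of the example project (a temporary location, for illustration)
project <- file.path(tempdir(), "project")

# Folders from the suggested file structure
folders <- c("data", "scripts", "bibliography", "figures")
for (folder in folders) {
  dir.create(file.path(project, folder), recursive = TRUE, showWarnings = FALSE)
}

# Empty document to start writing in
file.create(file.path(project, "paper.qmd"))

# Inspect the result
list.files(project)
```

Inside a real R Project the working directory is the project folder itself, so paths like "data/my_data.csv" work the same on any collaborator's computer.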

Resources