2.2 — Random Variables & Distributions

ECON 480 • Econometrics • Fall 2022

Dr. Ryan Safner
Associate Professor of Economics

safner@hood.edu
ryansafner/metricsF22
metricsF22.classes.ryansafner.com

Contents

Random Variables

Discrete Random Variables

Expected Value and Variance

Continuous Random Variables

The Normal Distribution

Random Variables

Experiments

  • An experiment is any procedure that can (in principle) be repeated infinitely and has a well-defined set of outcomes

Example

Flip a coin 10 times.

Random Variables

  • A random variable (RV) takes on values that are unknown in advance, but determined by an experiment

  • A numerical summary of a random outcome

Example

The number of heads from 10 coin flips

Random Variables: Notation

  • Random variable X takes on individual values (xi) from a set of possible values

  • Capital letters often denote RVs

    • lowercase letters for individual values

Example

Let X be the number of Heads from 10 coin flips. xi∈{0,1,2,...,10}
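To make the notation concrete, here is a minimal simulation sketch of this experiment in base R. It assumes a fair coin (the slide does not say so), and the seed and the names `flips`, `x`, and `many_x` are illustrative choices only.

```r
# Assumption: the coin is fair, so "H" and "T" are equally likely
set.seed(480) # arbitrary seed so the run is reproducible

# One run of the experiment: flip a coin 10 times
flips <- sample(c("H", "T"), size = 10, replace = TRUE)

# The random variable X is a numerical summary of that outcome
x <- sum(flips == "H")
x

# Repeating the experiment shows the different values x_i that X can take
many_x <- replicate(1000, sum(sample(c("H", "T"), size = 10, replace = TRUE) == "H"))
table(many_x)
```

Every simulated value lands in {0, 1, ..., 10}, the set of possible values of X.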

Discrete Random Variables

Discrete Random Variables

  • A discrete random variable: takes on a finite/countable set of possible values

Example

Let X be the number of times your computer crashes this semester,¹ xi∈{0,1,2,3,4}

  ¹ Please, back up your files!

Discrete Random Variables: Probability Distribution

  • Probability distribution of a R.V. fully lists all the possible values of X and their associated probabilities
xi P(X=xi)
0 0.80
1 0.10
2 0.06
3 0.03
4 0.01

Discrete Random Variables: pdf

  • Probability distribution function (pdf) summarizes the possible outcomes of X and their probabilities

  • Notation: fX is the pdf of X:

$$f_X(x_i) = p_i, \quad i = 1, 2, \dots, k$$

  • For any real number xi, f(xi) is the probability that X=xi
xi P(X=xi)
0 0.80
1 0.10
2 0.06
3 0.03
4 0.01
  • What is f(0)?
  • What is f(3)?
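One way to look up these values is to store the pdf as a named vector, as in the sketch below. The name `f` mirrors the notation above; the layout is just one convenient choice, not the only way to do it.

```r
# pdf of X (number of crashes), stored as a named vector
f <- c("0" = 0.80, "1" = 0.10, "2" = 0.06, "3" = 0.03, "4" = 0.01)

f["0"] # f(0): probability of no crashes
f["3"] # f(3): probability of exactly three crashes

# The probabilities of a valid pdf sum to 1
# (all.equal() allows for floating-point rounding)
all.equal(sum(f), 1)
```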

Discrete Random Variables: pdf Graph

library(tidyverse) # provides tibble(), ggplot(), and %>%

crashes <- tibble(number = c(0, 1, 2, 3, 4),
                  prob = c(0.80, 0.10, 0.06, 0.03, 0.01))

ggplot(data = crashes) +
  aes(x = number,
      y = prob)+
  geom_col(fill = "#e64173") +
  labs(x = "Number of Crashes",
       y = "Probability") +
  scale_y_continuous(breaks = seq(0,1,0.2),
                     limits = c(0,1),
                     expand = c(0,0))+
  theme_classic(base_family = "Fira Sans Condensed",
                base_size = 20)

Discrete Random Variables: cdf

  • Cumulative distribution function (cdf) lists probability X will be at most (less than or equal to) a given value xi

  • Notation: $F_X(x_i) = P(X \le x_i)$

xi f(x) F(x)
0 0.80 0.80
1 0.10 0.90
2 0.06 0.96
3 0.03 0.99
4 0.01 1.00
  • What is the probability your computer will crash at most once, F(1)?
  • What about three times, F(3)?

Discrete Random Variables: cdf Graph

crashes <- crashes %>%
  mutate(cum_prob = cumsum(prob))

crashes
# A tibble: 5 × 3
  number  prob cum_prob
   <dbl> <dbl>    <dbl>
1      0  0.8      0.8 
2      1  0.1      0.9 
3      2  0.06     0.96
4      3  0.03     0.99
5      4  0.01     1   

Discrete Random Variables: cdf Graph

ggplot(data = crashes) +
  aes(x = number,
      y = cum_prob) +
  geom_col(fill="#e64173") +
  labs(x = "Number of Crashes",
       y = "Probability") +
  scale_y_continuous(breaks = seq(0,1,0.2),
                     limits = c(0,1),
                     expand = c(0,0)) +
  theme_classic(base_family = "Fira Sans Condensed",
                base_size = 20)

Expected Value and Variance

Expected Value of a Random Variable

  • Expected value of a random variable X, written E(X) (and sometimes μ), is the long-run average value of X “expected” after many repetitions

$$E(X) = \sum_{i=1}^{k} p_i x_i$$

  • $E(X) = p_1 x_1 + p_2 x_2 + \cdots + p_k x_k$

  • A probability-weighted average of X, with each xi weighted by its associated probability pi

  • Also called the “mean” or “expectation” of X, always denoted either E(X) or μX

Expected Value: Example I

Example

Suppose you lend your friend $100 at 10% interest. If the loan is repaid, you receive $110. You estimate that your friend is 99% likely to repay, but there is a default risk of 1% where you get nothing. What is the expected value of repayment?
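A short sketch of this calculation, applying the probability-weighted average formula to the two outcomes stated in the example. The variable names are illustrative.

```r
# Two possible outcomes: full repayment ($110) or default ($0)
repayment <- c(110, 0)
prob      <- c(0.99, 0.01)

# Expected value: probability-weighted average of the outcomes
expected_repayment <- sum(repayment * prob)
expected_repayment # 0.99 * 110 + 0.01 * 0 = 108.9
```

The expected repayment of $108.90 is less than the promised $110 because of the 1% default risk.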

Expected Value: Example II

Example

Let X be a random variable that is described by the following pdf:

xi P(X=xi)
1 0.50
2 0.25
3 0.15
4 0.10

Calculate E(X).

The Steps to Calculate E(X), Coded

# Make a Random Variable called X
X <- tibble(x_i = c(1,2,3,4), # values of X
            p_i = c(0.50,0.25,0.15,0.10)) # probabilities


# Look at tibble
X
# A tibble: 4 × 2
    x_i   p_i
  <dbl> <dbl>
1     1  0.5 
2     2  0.25
3     3  0.15
4     4  0.1 


# Get expected value
X %>%
  summarize(expected_value = sum(x_i * p_i))
# A tibble: 1 × 1
  expected_value
           <dbl>
1           1.85

Variance of a Random Variable

  • The variance of a random variable X, denoted var(X) or $\sigma_X^2$, is:

$$\sigma_X^2 = E\left[(X - \mu_X)^2\right] = \sum_{i=1}^{k} (x_i - \mu_X)^2 p_i$$

  • This is the expected value of the squared deviations from the mean
    • i.e. the probability-weighted average of the squared deviations

Standard Deviation of a Random Variable

  • The standard deviation of a random variable X, denoted sd(X) or σX is:

$$\sigma_X = \sqrt{\sigma_X^2}$$

  • Roughly, this is the typical distance of X from its mean, measured in the same units as X

Standard Deviation: Example I

Example

What is the standard deviation of computer crashes?

xi P(X=xi)
0 0.80
1 0.10
2 0.06
3 0.03
4 0.01

The Steps to Calculate sd(X), Coded I

# get the expected value 
crashes %>%
  summarize(expected_value = sum(number*prob))
# A tibble: 1 × 1
  expected_value
           <dbl>
1           0.35


# save this for quick use
exp_value <- 0.35


crashes_2 <- crashes %>%
  select(-cum_prob) %>% # we don't need the cdf
  # create new columns
  mutate(deviations = number - exp_value, # deviations from exp_value
         deviations_sq = deviations^2, # square deviations
         weighted_devs_sq = prob * deviations_sq) # weight squared deviations by probability

The Steps to Calculate sd(X), Coded II

# look at what we made
crashes_2
# A tibble: 5 × 5
  number  prob deviations deviations_sq weighted_devs_sq
   <dbl> <dbl>      <dbl>         <dbl>            <dbl>
1      0  0.8       -0.35         0.122           0.098 
2      1  0.1        0.65         0.423           0.0423
3      2  0.06       1.65         2.72            0.163 
4      3  0.03       2.65         7.02            0.211 
5      4  0.01       3.65        13.3             0.133 

The Steps to Calculate sd(X), Coded III

# now we want to take the expected value of the squared deviations to get variance
crashes_2 %>%
  summarize(variance = sum(weighted_devs_sq), # variance
            sd = sqrt(variance)) # sd is square root of variance
# A tibble: 1 × 2
  variance    sd
     <dbl> <dbl>
1    0.648 0.805

Standard Deviation: Example II

Example

What is the standard deviation of the random variable we saw before?

xi P(X=xi)
1 0.50
2 0.25
3 0.15
4 0.10

Hint: you already found its expected value.
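For checking your work, the same steps used for the crashes example can be sketched in base R (no tidyverse needed). The variable names below are illustrative.

```r
# Values and probabilities of X
x_i <- c(1, 2, 3, 4)
p_i <- c(0.50, 0.25, 0.15, 0.10)

# Step 1: expected value (found earlier to be 1.85)
mu <- sum(x_i * p_i)

# Step 2: variance = probability-weighted average of squared deviations
variance <- sum(p_i * (x_i - mu)^2)

# Step 3: standard deviation = square root of the variance
std_dev <- sqrt(variance)

c(mean = mu, variance = variance, sd = std_dev)
```

This gives a variance of 1.0275 and a standard deviation of about 1.01.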

Continuous Random Variables

Continuous Random Variables

  • Continuous random variables can take on an uncountable (infinite) number of values

  • So many values that the probability of any specific value is infinitely small:

P(X=xi)→0

  • Instead, we focus on a range of values it might take on

Continuous Random Variables: pdf I

  • Probability density function (pdf) of a continuous variable represents the probability between two values as the area under a curve

  • The total area under the curve is 1

  • Since P(X=a)=0 and P(X=b)=0, P(a<X<b)=P(a≤X≤b)

  • See today’s appendix for how to graph math/stats functions in ggplot!

Example

P(0≤X≤2)

Continuous Random Variables: pdf II

  • FYI using calculus:

$$P(a \le X \le b) = \int_a^b f(x)\,dx$$

  • Complicated to compute by hand: use software or (old-fashioned!) probability tables

Example

P(0≤X≤2)
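As a rough illustration of "area under the curve," the sketch below computes this probability two ways in R. It assumes X is standard normal, which the example does not state; for another distribution, swap in its density and cdf.

```r
# Assumption: X ~ N(0, 1); dnorm() is its pdf and pnorm() is its cdf

# Numerically integrate the pdf from 0 to 2
area_integral <- integrate(dnorm, lower = 0, upper = 2)$value

# Same area as a difference of two cdf values
area_cdf <- pnorm(2) - pnorm(0)

c(integral = area_integral, cdf = area_cdf) # both about 0.477
```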

Continuous Random Variables: cdf I

  • The cumulative distribution function (cdf) describes the area under the pdf for all values less than or equal to (i.e. to the left of) a given value, k

P(X≤k)

Example

P(X≤2)

Continuous Random Variables: cdf II

  • Note: to find probability of values greater than or equal to (to the right of) a given value k:

P(X≥k)=1−P(X≤k)

Example

P(X≥2)=1−P(X≤2)

P(X≥2)= area under the pdf curve to the right of 2
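A quick check of the complement rule in R, again assuming (for illustration only) that X is standard normal:

```r
# Assumption: X ~ N(0, 1)

# Right tail via the complement rule: 1 - P(X <= 2)
right_tail_complement <- 1 - pnorm(2)

# Right tail requested directly from pnorm()
right_tail_direct <- pnorm(2, lower.tail = FALSE)

c(complement = right_tail_complement, direct = right_tail_direct) # both about 0.0228
```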

The Normal Distribution

The Normal Distribution

  • The Gaussian or normal distribution is the most useful type of probability distribution

X∼N(μ,σ)

  • “X is distributed Normally with mean μ and standard deviation σ”

  • Continuous, symmetric, unimodal

The Normal Distribution: pdf

  • FYI: The pdf of X∼N(μ,σ) is

$$f(k) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2}\left(\frac{k-\mu}{\sigma}\right)^2}$$

  • Do not try to memorize this; we have software (and, previously, tables) to calculate pdfs and cdfs
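To see that the software really is evaluating this formula, the sketch below writes the normal pdf out by hand and compares it with R's built-in `dnorm()`. The helper name `normal_pdf` and the test values are hypothetical, chosen only for illustration.

```r
# Hand-written normal pdf (hypothetical helper, for illustration only)
normal_pdf <- function(k, mu = 0, sigma = 1) {
  1 / sqrt(2 * pi * sigma^2) * exp(-0.5 * ((k - mu) / sigma)^2)
}

# Arbitrary values and parameters to compare on
k <- c(-1, 0, 2.5)

by_hand  <- normal_pdf(k, mu = 1, sigma = 2)
built_in <- dnorm(k, mean = 1, sd = 2)

all.equal(by_hand, built_in) # TRUE
```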

The Standard Normal Distribution

  • The standard normal distribution (often referred to as Z) has mean 0 and standard deviation 1

Z∼N(0,1)

The Standard Normal cdf

  • The standard normal cdf, often referred to as Φ:

Φ(k)=P(Z≤k)

(again, the area under the pdf curve to the left of some value k)

The 68-95-99.7 Empirical Rule

  • 68-95-99.7% empirical rule: for a normal distribution:

  • P(μ−1σ≤X≤μ+1σ)≈ 68%

  • P(μ−2σ≤X≤μ+2σ)≈ 95%

  • P(μ−3σ≤X≤μ+3σ)≈ 99.7%

  • 68/95/99.7% of observations fall within 1/2/3 standard deviations of the mean
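The rule can be checked against the standard normal cdf. This is a small sketch using `pnorm()`; because standardizing maps any normal distribution onto Z, checking it for Z covers every μ and σ.

```r
# Number of standard deviations from the mean
k <- 1:3

# P(-k <= Z <= k) for the standard normal
within_k_sd <- pnorm(k) - pnorm(-k)

round(within_k_sd, 4) # 0.6827 0.9545 0.9973
```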

Standardizing Normal Distributions

  • We can take any normal distribution (for any μ,σ) and standardize it to the standard normal distribution by taking the Z-score of any value, xi:

$$Z = \frac{x_i - \mu}{\sigma}$$

  • Subtract the distribution’s mean from the value, then divide by the standard deviation

  • Z: the number of standard deviations that xi lies from the mean

Standardizing Normal Distributions: Example I

Example

On August 8, 2011, the Dow dropped 634.8 points, sending shock waves through the financial community. Assume that during mid-2011 to mid-2012 the daily change for the Dow is normally distributed, with the mean daily change of 1.87 points and a standard deviation of 155.28 points. What is the Z-score?

$$Z = \frac{X - \mu}{\sigma}$$

$$Z = \frac{-634.8 - 1.87}{155.28}$$

$$Z \approx -4.1$$

This is 4.1 standard deviations (σ) below the mean, an extremely low-probability event.
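The same calculation as a short R sketch. Note that a drop of 634.8 points enters as a negative change; the final line uses `pnorm()` to put a number on how unlikely such a day would be if the normality assumption held.

```r
# A drop of 634.8 points is a change of -634.8
x     <- -634.8
mu    <- 1.87   # mean daily change
sigma <- 155.28 # standard deviation of daily change

# Z-score: number of standard deviations from the mean
z <- (x - mu) / sigma
round(z, 1) # -4.1

# Probability of a change at least this low under the normal assumption
pnorm(z) # well below 1 in 10,000
```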

Standardizing Normal Distributions: Example II

Example

In the last quarter of 2021, a group of 64 mutual funds had a mean return of 2.4% with a standard deviation of 5.6%. These returns can be approximated by a normal distribution.

What percent of the funds would you expect to be earning between -3.2% and 8.0% returns?

Convert to standard normal to find Z-scores for 8 and −3.2.

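$$P(-3.2 < X < 8)$$

$$P\left(\frac{-3.2 - 2.4}{5.6} < \frac{X - 2.4}{5.6} < \frac{8 - 2.4}{5.6}\right)$$

$$P(-1 < Z < 1)$$

$$P(\mu - 1\sigma \le X \le \mu + 1\sigma) \approx 0.68$$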

Standardizing Normal Distributions: Example III

Example

In the last quarter of 2021, a group of 64 mutual funds had a mean return of 2.4% with a standard deviation of 5.6%. These returns can be approximated by a normal distribution.

  1. What percent of the funds would you expect to be earning 2.4% or less?

  2. What percent of the funds would you expect to be earning between -8.8% and 13.6%?

  3. What percent of the funds would you expect to be earning returns greater than 13.6%?

How do we actually find the probabilities for Z-scores?

Finding Z-score Probabilities I

Probability to the left of zi

$$P(Z \le z_i) = \underbrace{\Phi(z_i)}_{\text{cdf of } z_i}$$

Probability to the right of zi

$$P(Z \ge z_i) = 1 - \underbrace{\Phi(z_i)}_{\text{cdf of } z_i}$$

Finding Z-score Probabilities II

Probability between z1 and z2

$$P(z_1 \le Z \le z_2) = \underbrace{\Phi(z_2)}_{\text{cdf of } z_2} - \underbrace{\Phi(z_1)}_{\text{cdf of } z_1}$$

Finding Z-score Probabilities III

  • pnorm() calculates probabilities with a normal distribution with arguments:
    • q = the value (the first argument; R names it q, for quantile)
    • mean = the mean
    • sd = the standard deviation
    • lower.tail =
      • TRUE if looking at area to LEFT of value
      • FALSE if looking at area to RIGHT of value

Finding Z-score Probabilities IV

Example

Let the distribution of grades be normal, with mean 75 and standard deviation 10.

  • Probability a student gets at least an 80
pnorm(80, 
      mean = 75,
      sd = 10,
      lower.tail = FALSE) # looking to right
[1] 0.3085375

Finding Z-score Probabilities V

Example

Let the distribution of grades be normal, with mean 75 and standard deviation 10.

  • Probability a student gets at most an 80
pnorm(80, 
      mean = 75,
      sd = 10,
      lower.tail = TRUE) # looking to left
[1] 0.6914625
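The same probability can also be found by standardizing first and then using the standard normal cdf, Φ. A minimal sketch, with illustrative variable names:

```r
x     <- 80
mu    <- 75
sigma <- 10

# Z-score of a grade of 80
z <- (x - mu) / sigma # 0.5

# Route 1: give pnorm() the mean and sd directly
p_direct <- pnorm(x, mean = mu, sd = sigma)

# Route 2: standardize by hand, then use the standard normal (mean 0, sd 1)
p_standardized <- pnorm(z)

all.equal(p_direct, p_standardized) # TRUE; both are about 0.691
```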

Finding Z-score Probabilities VI

Example

Let the distribution of grades be normal, with mean 75 and standard deviation 10.

  • Probability a student gets between 65 and 85
# subtract two left tails!
pnorm(85, # larger number first!
      mean = 75,
      sd = 10,
      lower.tail = TRUE) - # looking to left, & SUBTRACT
  pnorm(65, # smaller number second!
        mean = 75,
        sd = 10,
        lower.tail = TRUE) #looking to left
[1] 0.6826895
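Returning to the mutual fund examples (mean return 2.4%, standard deviation 5.6%), the same `pnorm()` patterns can be sketched as follows. The variable names are illustrative, and the results hold only to the extent that the normal approximation does.

```r
mu    <- 2.4 # mean return (%)
sigma <- 5.6 # standard deviation of returns (%)

# Example II: between -3.2% and 8.0% (within 1 sd of the mean)
p_within_1sd <- pnorm(8, mean = mu, sd = sigma) -
  pnorm(-3.2, mean = mu, sd = sigma)

# Example III, part 1: 2.4% or less (left of the mean)
p_below_mean <- pnorm(2.4, mean = mu, sd = sigma)

# Example III, part 2: between -8.8% and 13.6% (within 2 sd of the mean)
p_within_2sd <- pnorm(13.6, mean = mu, sd = sigma) -
  pnorm(-8.8, mean = mu, sd = sigma)

# Example III, part 3: greater than 13.6% (right tail beyond +2 sd)
p_above_2sd <- pnorm(13.6, mean = mu, sd = sigma, lower.tail = FALSE)

round(c(p_within_1sd, p_below_mean, p_within_2sd, p_above_2sd), 3)
```

These come out to roughly 68%, 50%, 95%, and 2.3% of funds, in line with the empirical rule.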
