2.7 — Hypothesis Testing (Regression)

ECON 480 • Econometrics • Fall 2022

Dr. Ryan Safner
Associate Professor of Economics

safner@hood.edu
ryansafner/metricsF22
metricsF22.classes.ryansafner.com

Contents

Hypothesis Testing

Digression: p-Values and the Philosophy of Science

Hypothesis Testing by Simulation with infer

Theory-Based Hypothesis Testing (What R Calculates)

The Use and Abuse of p-Values

Hypothesis Testing

Estimation and Hypothesis Testing I

  • We want to test whether our estimates are statistically significant and whether they describe the population
    • this is the “bread and butter” of using inferential statistics

Examples

  • Does reducing class size improve test scores?
  • Do more years of education increase your wages?
  • Is the gender wage gap between men and women 23%?

All modern science is built upon statistical hypothesis testing, so understand it well

Estimation and Hypothesis Testing II

  • Note, we can test a lot of hypotheses about a lot of population parameters, e.g.
    • A population mean μ
      • Example: average height of adults
    • A population proportion p
      • Example: percent of voters who voted for Biden
    • A difference in population means μA−μB
      • Example: difference in average wages of men vs. women
    • A difference in population proportions pA−pB
      • Example: difference in percent of patients reporting symptoms of drug A vs B
  • We will focus on hypotheses about population regression slope (β1), i.e. the causal effect1 of X on Y
  1. With a model this simple, it’s almost certainly not causal, but this is the ultimate direction we are heading…

Null and Alternative Hypotheses I

  • All scientific inquiries begin with a null hypothesis (H0) that proposes a specific value of a population parameter
    • Notation: add a subscript 0: β1,0 (or μ0, p0, etc)
  • We suggest an alternative hypothesis (Ha), often the one we hope to verify
    • Note, can be multiple alternative hypotheses: H1,H2,…,Hn
  • Ask: “Does our data (sample) give us sufficient evidence to reject H0 in favor of Ha?”
    • Note: the test is always about H0!
    • See if we have sufficient evidence to reject the status quo

Null and Alternative Hypotheses II

  • Null hypothesis assigns a value (or a range) to a population parameter
    • e.g. β1=2 or β1≤20
    • Most common is β1=0 ⟹ X has no effect on Y (no slope for a line)
    • Note: always an equality!
  • Alternative hypothesis must mathematically contradict the null hypothesis
    • e.g. β1≠2 or β1>20 or β1≠0
    • Note: always an inequality!
  • Alternative hypotheses come in two forms:
    1. One-sided alternative: β1>H0 or β1<H0
    2. Two-sided alternative: β1≠H0
      • Note this means either β1<H0 or β1>H0


Components of a Valid Hypothesis Test

  • All statistical hypothesis tests have the following components:
  1. A null hypothesis, H0
  2. An alternative hypothesis, Ha
  3. A test statistic to determine if we reject H0 when the statistic reaches a “critical value”
    • Beyond the critical value is the “rejection region”: sufficient evidence to reject H0
  4. A conclusion whether or not to reject H0 in favor of Ha

Type I and Type II Errors I

  • Sample statistic (^β1) will rarely be exactly equal to the hypothesized parameter (β1)

  • Difference between observed statistic and true parameter could be because:

  1. Parameter is not the hypothesized value
    • H0 is false
  2. Parameter truly is the hypothesized value, but sampling variability gave us a different estimate
    • H0 is true
  • We cannot distinguish between these two possibilities with any certainty

  • So, when we interpret our estimates, we risk committing one of two types of error

Type I and Type II Errors II

  1. Type I error (false positive): rejecting H0 when it is in fact true
    • Believing we found an important result when there is truly no relationship
  2. Type II error (false negative): failing to reject H0 when it is in fact false
    • Believing we found nothing when there was truly a relationship to find

Type I and Type II Errors III

  • Depending on context, committing one type of error may be more serious than the other

Type I and Type II Errors IV

  • Anglo-American common law presumes defendant is innocent: H0
  • Jury judges whether the evidence presented against the defendant is plausible assuming the defendant were in fact innocent
  • If highly improbable (beyond a “reasonable doubt”): sufficient evidence to reject H0 and convict

Type I and Type II Errors V

William Blackstone

(1723-1780)

“It is better that ten guilty persons escape than that one innocent suffer.”

  • Type I error is worse than a Type II error in law!

Blackstone, William, 1765-1770, Commentaries on the Laws of England

Type I and Type II Errors VI

Type I and Type II Errors VII

Significance Level, α, and Confidence Level 1−α

  • The significance level, α, is the probability of a Type I error

α=P(Reject H0|H0 is true)

  • The confidence level is defined as (1−α)
    • Specify in advance an α-level (0.10, 0.05, 0.01) with associated confidence level (90%, 95%, 99%)
  • The probability of a Type II error is defined as β:

β=P(Don't reject H0|H0 is false)

α and β

Power and p-values

  • The statistical power of the test is (1−β): the probability of correctly rejecting H0 when H0 is in fact false (e.g. convicting a guilty person)

Power=1−β=P(Reject H0|H0 is false)

  • The p-value or significance probability is the probability that, if the null hypothesis were true, the test statistic from any sample will be at least as extreme as the test statistic from our sample

p(δ≥δi|H0 is true)

  • where δ represents some test statistic
  • δi is the test statistic we observe in our sample
  • More on this in a bit
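To make α concrete, here is a minimal simulation sketch (not from the slides): when H0 is in fact true, a test at α = 0.05 should still falsely reject about 5% of the time, purely from sampling variability.

```r
# Sketch: the Type I error rate equals alpha.
# Draw many samples where H0 (beta1 = 0) is TRUE by construction,
# run the regression t-test, and count how often p < 0.05.
set.seed(480)
p_values <- replicate(5000, {
  x <- rnorm(50)
  y <- rnorm(50) # y is unrelated to x, so the null is true
  summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
})
mean(p_values < 0.05) # should be close to 0.05
```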

p-values and Statistical Significance

  • After running our test, we need to make a decision between the competing hypotheses

  • Compare p-value with pre-determined α (commonly, α=0.05, 95% confidence level)

  • If p<α: statistically significant evidence sufficient to reject H0 in favor of Ha

    • Note this does not mean Ha is true! We merely have rejected H0!
  • If p≥α: insufficient evidence to reject H0

    • Note this does not mean H0 is true! We merely have failed to reject H0!

Digression: p-Values and the Philosophy of Science

Hypothesis Testing and the Philosophy of Science I

Sir Ronald A. Fisher

(1890-1962)

“The null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation. Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis.”

Fisher, R.A., 1935, The Design of Experiments

Hypothesis Testing and the Philosophy of Science II

  • Modern philosophy of science is largely based on hypothesis testing and falsifiability, which form the “Scientific Method”1

  • For something to be “scientific”, it must be falsifiable, or at least testable (at least in principle)

  • Hypotheses can be corroborated with evidence, but remain tentative until falsified by data suggesting an alternative hypothesis

  • “All swans are white” is a hypothesis rejected upon discovery of a single black swan

  1. Note: economics is a very different kind of “science” with a different methodology!

Hypothesis Testing and p-Values

  • Hypothesis testing, confidence intervals, and p-values are probably the hardest thing to understand in statistics

Fivethirtyeight: Not Even Scientists Can Easily Explain P-values

Hypothesis Testing: Which Test? I

  • A rigorous course on statistics (ECMG 212 or MATH 112) will spend weeks going through different types of tests:
    • Sample mean; difference of means
    • Proportion; difference of proportions
    • Z-test vs t-test
    • 1 sample vs. 2 samples
    • χ2 test

Hypothesis Testing: Which Test? II

There is Only One Test!

  • Fortunately, some clever statisticians realized “there is only one test” and built a nice R package called infer:
  1. Calculate a statistic, δi, from a sample of data1

  2. Simulate a world where δ is null (H0 is true)

  3. Examine the distribution of δ across the null world

  4. Calculate the probability that δi could exist in the null world

  5. Decide if δi is statistically significant

  1. δ can stand in for any test statistic in any hypothesis test! For our purposes, δ is the slope estimated from our regression sample, ^β1.
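The five steps above can be sketched in base R without infer. This is a toy example with simulated data (not the class-size regression), assuming nothing beyond base R:

```r
# "There is only one test," by hand: permute y to simulate the null world.
set.seed(480)
df <- data.frame(x = rnorm(100))
df$y <- 3 - 2 * df$x + rnorm(100)           # true slope is -2

obs <- coef(lm(y ~ x, data = df))["x"]      # 1. statistic from our sample
null_slopes <- replicate(1000, {            # 2.-3. the null world:
  coef(lm(sample(y) ~ x, data = df))["x"]   # shuffling y breaks any x-y link
})
mean(abs(null_slopes) >= abs(obs))          # 4. two-sided p-value
# 5. decide: a small p-value means obs is statistically significant
```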

Elements of a Hypothesis Test

Alan Downey: “There is still only one test”

Hypothesis Testing with the infer Package I

  • R naturally runs the following hypothesis test on any regression as part of lm():

H0: β1 = 0        H1: β1 ≠ 0

  • infer allows you to run through these steps manually to understand the process:
  1. specify() a model
  2. hypothesize() the null
  3. generate() simulations of the null world
  4. calculate() the statistic in each simulated sample
  5. get_p_value(), and optionally visualize() with a histogram

Hypothesis Testing with the infer Package II


Theory-Based Inference: Critical Values of Test Statistic

  • Test statistic δ: measures how far what we observed in our sample (^β1) is from what we would expect if the null hypothesis were true (β1=0)
    • Calculated from a sampling distribution of the estimator (i.e. ^β1)
    • In econometrics, we use t-distributions which have n−k−1 degrees of freedom1
  • Rejection region: if the test statistic reaches a “critical value” of δ, then we reject the null hypothesis
  1. Again, see last class’s appendix for more on the t-distribution. k is the number of independent variables our model has, in this case, with just one X, k=1. We use two degrees of freedom to calculate ˆβ0 and ˆβ1, hence we have n−2 df.
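Where do critical values come from in R? qt() returns quantiles of the t-distribution; for the class-size regression, n = 420 and k = 1, so df = 418 (a quick sketch, consistent with the footnote above):

```r
# Critical values t* from the t-distribution with n - k - 1 = 418 df
qt(0.975, df = 418)  # two-sided t* at alpha = 0.05: roughly 1.97, i.e. "about 2"
qt(0.995, df = 418)  # two-sided t* at alpha = 0.01
```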


Hypothesis Testing by Simulation, with infer

Imagine a Null World, where H0 is True

Our world, and a world where β1=0 by assumption.

Comparing the Worlds I

  • From that null world where H0:β1=0 is true, we simulate another sample and calculate OLS estimators again
Our actual sample:

term         estimate     std.error  statistic  p.value
(Intercept)  698.932952   9.4674914  73.824514  6.569925e-242
str           -2.279808   0.4798256  -4.751327  2.783307e-06

One simulated sample from the null world:

term         estimate     std.error  statistic  p.value
(Intercept)  661.2204558  9.7135876  68.071703  2.606919e-228
str           -0.3596616  0.4922981  -0.7305774 4.654468e-01

Comparing the Worlds II

  • From that null world where H0:β1=0 is true, let’s simulate 1,000 samples and calculate slope (^β1) for each
sample  slope
 1       0.3243349
 2      -0.4184797
 3      -0.1082485
 4       0.2731168
 5       0.1362635
 6       0.2812281
 7       0.2323034
 8      -0.4924458
 9      -0.3243620
10       0.2241032

(1-10 of 1,000 rows)

Prepping the infer Pipeline

  • Before I show you how to do this, let’s first save our estimated slope from our actual sample
    • We’ll want this later!
# save as our_slope
our_slope <- school_reg %>% 
  tidy() %>%
  filter(term == "str") %>%
  pull(estimate)

# look at it
our_slope
[1] -2.279808

The infer Pipeline: specify()


data %>%

specify(y ~ x)

  • Take our data and pipe it into the specify() function, which is essentially an lm() function for regression (for our purposes)
ca_school %>%
  specify(testscr ~ str)
testscr  str
690.80   17.88991
661.20   21.52466
643.60   18.69723
647.70   17.35714
640.85   18.67133
605.55   21.40625
606.75   19.50000
609.00   20.89412
612.50   19.94737
612.65   20.80556

(1-10 of 420 rows)

The infer Pipeline: hypothesize()


data %>%

specify(y ~ x) %>%

hypothesize(null = "independence")

  • Describe what the null hypothesis is here
  • In infer’s language, str and testscr are independent (β1=0)1
ca_school %>%
  specify(testscr ~ str) %>%
  hypothesize(null = "independence")
testscr  str
690.80   17.88991
661.20   21.52466
643.60   18.69723
647.70   17.35714
640.85   18.67133
605.55   21.40625
606.75   19.50000
609.00   20.89412
612.50   19.94737
612.65   20.80556

(1-10 of 420 rows)
  1. See more here about what other hypotheses you can test with infer

The infer Pipeline: generate()


data %>%

specify(y ~ x) %>%

hypothesize(null = "independence") %>%

generate(reps = n, type = "permute")

  • Now the magic starts, as we run a number of simulated samples
  • Set the number of reps and set the type equal to "permute" (not bootstrap)
    • Permutation randomly matches X-values and Y-values from the data so that there is no relationship between X and Y
ca_school %>%
  specify(testscr ~ str) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000,
           type = "permute")
testscr  str       replicate
644.45   17.88991  1
645.75   21.52466  1
682.65   18.69723  1
652.00   17.35714  1
661.20   18.67133  1
641.45   21.40625  1
644.50   19.50000  1
686.05   20.89412  1
635.05   19.94737  1
668.60   20.80556  1

(1-10 of 10,000 rows)

The infer Pipeline: calculate()


data %>%

specify(y ~ x) %>%

hypothesize(null = "independence") %>%

generate(reps = n, type = "permute") %>%

calculate(stat = "slope")

  • We calculate sample statistics for each of the 1,000 replicate samples

  • In our case, calculate the slope1 (ˆβ1) for each replicate

ca_school %>%
  specify(testscr ~ str) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000,
           type = "permute") %>%
  calculate(stat = "slope")
replicate  stat
 1         -0.6319890985
 2         -0.4483151570
 3         -0.0419777100
 4          0.3747275623
 5          0.0906881011
 6          0.0280732854
 7         -0.1811392072
 8         -0.7416631878
 9          0.3715154152
10         -0.1625787306

(1-10 of 1,000 rows)
  1. See package information for other stats you can estimate

The infer Pipeline: get_p_value()

data %>%

specify(y ~ x) %>%

hypothesize(null = "independence") %>%

generate(reps = n, type = "permute") %>%

calculate(stat = "slope") %>%

get_p_value(obs_stat = our_slope, direction = "both")

  • We can calculate the p-value
    • the probability of seeing a value at least as large as our_slope (-2.28) in our simulated null distribution
  • For the two-sided alternative Ha: β1 ≠ 0, direction = "both" doubles the raw (one-tail) p-value
ca_school %>%
  specify(testscr ~ str) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000,
           type = "permute") %>%
  calculate(stat = "slope") %>%
  get_p_value(obs_stat = our_slope,
              direction = "both")
p_value
0

(1 row)

The infer Pipeline: visualize()


data %>%

specify(y ~ x) %>%

hypothesize(null = "independence") %>%

generate(reps = n, type = "permute") %>%

calculate(stat = "slope") %>%

visualize()

  • Make a histogram of our null distribution of β1
    • Note it is centered at β1=0 because that’s H0!
ca_school %>%
  specify(testscr ~ str) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000,
           type = "permute") %>%
  calculate(stat = "slope") %>%
  visualize()

The infer Pipeline: visualize()

data %>%

specify(y ~ x) %>%

hypothesize(null = "independence") %>%

generate(reps = n, type = "permute") %>%

calculate(stat = "slope") %>%

visualize()

  • Add our_slope to show our finding on the null distribution
ca_school %>%
  specify(testscr ~ str) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000,
           type = "permute") %>%
  calculate(stat = "slope") %>%
  visualize(obs_stat = our_slope)

The infer Pipeline: visualize()

data %>%

specify(y ~ x) %>%

hypothesize(null = "independence") %>%

generate(reps = n, type = "permute") %>%

calculate(stat = "slope") %>%

visualize() + shade_p_value()

  • Add shade_p_value() to see what p is
ca_school %>%
  specify(testscr ~ str) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000,
           type = "permute") %>%
  calculate(stat = "slope") %>%
  visualize(obs_stat = our_slope) +
  shade_p_value(obs_stat = our_slope,
                direction = "two_sided")

visualize() is Just a Wrapper for ggplot


# infer
ca_school %>%
  specify(testscr ~ str) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000,
           type = "permute") %>%
  calculate(stat = "slope") %>%
  # pipe into ggplot
  ggplot()+
  aes(x = stat)+
  geom_histogram(color="white", fill="#e64173")+
  geom_vline(xintercept = our_slope,
             color = "blue",
             size = 2,
             linetype = "dashed")+
  annotate(geom = "label",
           x = -2.28,
           y = 100,
           label = expression(paste("Our ", hat(beta[1]))),
           color = "blue")+
  scale_y_continuous(lim=c(0,130),
                     expand = c(0,0))+
  labs(x = expression(paste("Sampling distribution of ", hat(beta)[1], " under ", H[0], ":  ", beta[1]==0)),
       y = "Samples")+
    theme_classic(base_family = "Fira Sans Condensed",
           base_size=20)

Theory-Based Hypothesis Testing (What R Calculates)

What R Does: Theory-Based Statistical Inference I

  • R does things the old-fashioned way, using a theoretical null distribution instead of simulating one

  • A t-distribution with n−k−1 df1

  • Calculate a t-statistic for ^β1:

test statistic = (estimate − null hypothesis) / (standard error of estimate)

  1. k is the number of X variables.

What R Does: Theory-Based Statistical Inference II

test statistic = (estimate − null hypothesis) / (standard error of estimate)

  • t same interpretation as Z: number of std. dev. away from the sampling distribution’s expected value E[^β1]1 (if H0 were true)

  • Compares to a critical value of t∗ (pre-determined by α-level & n−k−1 df)

    • For 95% confidence, α=0.05, t∗ ≈ 2

  1. The expected value is 0, because our null hypothesis was β1=0

  2. Again, the 68-95-99.7% empirical rule!

What R Does: Theory-Based Statistical Inference III

t = (^β1 − β1,0) / se(^β1)  =  (−2.28 − 0) / 0.48  ≈  −4.75

  • Our sample slope ^β1 is 4.75 standard deviations below the expected value E[^β1] (i.e. 0) if H0 were true
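The same arithmetic in R, plugging in the estimate and standard error shown in the regression output later in these slides:

```r
# Recompute the t-statistic by hand from the regression output
estimate   <- -2.279808   # beta1-hat
std_error  <-  0.4798256  # se(beta1-hat)
null_value <-  0          # H0: beta1 = 0
(estimate - null_value) / std_error  # about -4.75
```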

What R Does: Theory-Based Statistical Inference IV

t = (^β1 − β1,0) / se(^β1)  =  (−2.28 − 0) / 0.48  ≈  −4.75

  • p-value: prob. of a test statistic at least as large (in magnitude) as ours if the null hypothesis were true
    • Continuous distribution implies we need probability of area beyond our value
    • p-value is 2-sided for Ha:β1≠0
  • 2×p(t418>|−4.75|)=0.0000028

One-Sided Tests & p-Values

Ha:β1<0

p-value: p(t≤ti)

Ha:β1>0

p-value: p(t≥ti)

Two-Sided Tests and p-Values

Ha:β1≠0

p-value: 2×p(t≥|ti|)

Calculating p-Values in R

  • pt() calculates probabilities on a t distribution with arguments:
    • the t-score
    • df = the degrees of freedom
    • lower.tail =
      • TRUE if looking at area to LEFT of value
      • FALSE if looking at area to RIGHT of value
2 * pt(4.75, # I'll double the right tail
       df = 418,
       lower.tail = F) # right tail
[1] 2.800692e-06
  • 2×p(t418>|−4.75|)=0.0000028

Hypothesis Tests in Regression Output I

school_reg %>% summary()

Call:
lm(formula = testscr ~ str, data = ca_school)

Residuals:
    Min      1Q  Median      3Q     Max 
-47.727 -14.251   0.483  12.822  48.540 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 698.9330     9.4675  73.825  < 2e-16 ***
str          -2.2798     0.4798  -4.751 2.78e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 18.58 on 418 degrees of freedom
Multiple R-squared:  0.05124,   Adjusted R-squared:  0.04897 
F-statistic: 22.58 on 1 and 418 DF,  p-value: 2.783e-06

Hypothesis Tests in Regression Output II

  • In broom’s tidy() (with confidence intervals)
tidy(school_reg, conf.int=TRUE)
term         estimate    std.error  statistic  p.value
(Intercept)  698.932952  9.4674914  73.824514  6.569925e-242
str           -2.279808  0.4798256  -4.751327  2.783307e-06

(2 rows | columns 1-5 of 7 shown)
  • p-value on str is 0.00000278.

Conclusions

H0: β1 = 0        Ha: β1 ≠ 0

  • Because the hypothesis test’s p-value < α (0.05)…

  • We have sufficient evidence to reject H0 in favor of our alternative hypothesis. Our sample suggests that there is a relationship between class size and test scores.

  • Using the confidence intervals:

  • We are 95% confident that, from similarly constructed samples, the true marginal effect of class size on test scores is between -3.22 and -1.34.

Hypothesis Testing vs. Confidence Intervals

  • Confidence intervals are all two-sided by nature

CI0.95 = ( ^β1 − 2 × se(^β1) ,  ^β1 + 2 × se(^β1) ), where 2 × se(^β1) is the margin of error (MOE)

  • Hypothesis test (t-test) of H0: β1 = 0 computes a t-value (since our null hypothesis is that β1,0 = 0, the test statistic simplifies to this neat fraction):

t = ^β1 / se(^β1)

and p < 0.05 when |t| ≥ 2 (approximately)

  • If our confidence interval contains the H0 value (i.e. 0, for our test), then we fail to reject H0.
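A sketch of this duality in R, using the slope and standard error from school_reg:

```r
# Build the 95% CI by hand, then check whether it contains the null value 0
estimate <- -2.279808
se       <-  0.4798256
t_star   <- qt(0.975, df = 418)           # critical value, about 1.97
ci <- estimate + c(-1, 1) * t_star * se   # roughly (-3.22, -1.34)
ci
(0 >= ci[1]) & (0 <= ci[2])               # FALSE: 0 is outside the CI, so reject H0
```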

The Use and Abuse of p-values

p-Hacking


  • Consider what 95% confident or α=0.05 means

  • If we repeat a procedure 20 times, we should expect 1 in 20 (5%) to produce a fluke result!

Image source: Seeing Theory
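The arithmetic behind that “1 in 20” intuition: if all 20 nulls are true and the tests are independent, the chance of at least one false positive is already large.

```r
# Probability of at least one "fluke" significant result in 20 true-null tests
1 - (1 - 0.05)^20  # about 0.64
```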

Abusing p-values and “Science”

Source: Washington Post

Abusing p-Values and “Science” I

Source: SMBC

Abusing p-Values and “Science” II

“The widespread use of ‘statistical significance’ (generally interpreted as p ≤ 0.05) as a license for making a claim of a scientific finding (or implied truth) leads to considerable distortion of the scientific process.”

Wasserstein, Ronald L. and Nicole A. Lazar, (2016), “The ASA’s Statement on p-Values: Context, Process, and Purpose,” The American Statistician 70(2): 129-133

Abusing p-Values and “Science” III

“No economist has achieved scientific success as a result of a statistically significant coefficient. Massed observations, clever common sense, elegant theorems, new policies, sagacious economic reasoning, historical perspective, relevant accounting, these have all led to scientific success. Statistical significance has not,” (p.112).

McCloskey, Deirdre N. and Stephen Ziliak, 1996, “The Standard Error of Regressions,” Journal of Economic Literature 34(1): 97-114

Common Misconceptions About p-Values

❌ p is the probability that the alternative hypothesis is false - We can never prove an alternative hypothesis, only tentatively reject a null hypothesis

❌ p is the probability that the null hypothesis is true - We’re not proving the H0 is false, only saying that it’s very unlikely that if H0 were true, we’d obtain a slope as rare as our sample’s slope

❌ p is the probability that our observed effects were produced purely by random chance - p is computed under a specific model (think about our null world) that assumes H0 is true

❌ p tells us how significant our finding is - p tells us nothing about the size or the real world significance of any effect deemed “statistically significant” - it only tells us that the slope is statistically significantly different from 0 (if H0 is β1=0)

p-Values: Restatement

  • Again, p-value is the probability that, if the null hypothesis were true, we obtain (by pure random chance) a test statistic at least as extreme as the one we estimated for our sample

  • A low p-value means either (and we can’t distinguish which):

    1. H0 is true and a highly improbable event has occurred OR
    2. H0 is false

Statistical Significance In Regression Tables

            Test Score
Constant    698.93***
            (9.47)
STR         −2.28***
            (0.48)
n           420
R2          0.05
SER         18.54

* p < 0.1, ** p < 0.05, *** p < 0.01
  • Statistical significance is shown by asterisks, common (but not always!) standard:
    • 1 asterisk: significant at α=0.10
    • 2 asterisks: significant at α=0.05
    • 3 asterisks: significant at α=0.01
  • Rare, but sometimes regression tables include p-values for estimates
