# Problem Set 2

Please read the instructions for completing and submitting homeworks.

The PDF is useful if you want to print out the problem set and write on it. The R Project is a `.zip` file which contains a `.qmd` file to write answers in, and the data, all in a logical working directory. (See this resource for help unzipping files.) You can also just write an `.R` file in the project if you don’t want to use markdown. If you use the cloud project, I have already installed `tidyverse` and `tinytex` (to produce PDFs).

## Answers

# Theory and Concepts

## Question 1

In your own words, explain the difference between endogeneity and exogeneity.

## Question 2

### Part A

In your own words, explain what (sample) standard deviation *means*.

### Part B

In your own words, explain how (sample) standard deviation *is calculated.* You may also write the formula, but it is not necessary.

# Problems

For the remaining questions, you may use `R` to *verify*, but please calculate all sample statistics by hand and show all work.

## Question 3

Suppose you have a very small class of four students who all take a quiz. Their scores are reported as follows:

\[\{83, 92, 72, 81\}\]

### Part A

Calculate the median.

### Part B

Calculate the sample mean, \(\bar{x}\).

### Part C

Calculate the sample standard deviation, \(s\).

### Part D

Make or sketch a rough histogram of this data, with the size of each bin being 10 (i.e., 70’s, 80’s, 90’s, 100’s). You can draw this by hand or use `R`.

If you are using `ggplot`, you want to use `+ geom_histogram(breaks = seq(start,end,by))` and add another layer `+ scale_x_continuous(breaks=seq(start,end,by))`. The first layer creates bins in the histogram, and the second layer creates ticks on the x axis; both by creating a `seq`uence starting at some `start`ing value, stopping at some `end`ing value, `by` a certain interval (e.g. by 2, or by 10).
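As a rough illustration of the two layers described above, here is a minimal sketch. It assumes `ggplot2` is installed, and it uses made-up scores rather than the quiz data from this question; the names `scores`, `score`, and `bin_breaks` are placeholders chosen for the example.

```r
library(ggplot2)

# Hypothetical scores, NOT the quiz data from this question
scores <- data.frame(score = c(55, 62, 68, 71, 77, 84, 88, 93))

# One sequence of break points, reused for the bins and the axis ticks:
# 50, 60, 70, 80, 90, 100
bin_breaks <- seq(50, 100, by = 10)

p <- ggplot(scores, aes(x = score)) +
  geom_histogram(breaks = bin_breaks, color = "white") +  # bin edges
  scale_x_continuous(breaks = bin_breaks)                 # x-axis ticks

p
```

Storing the sequence once and passing it to both layers keeps the bin edges and the tick marks aligned.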

Is this distribution roughly symmetric or skewed? What would we expect about the mean and the median?

### Part E

Suppose instead the person who got the 72 did not show up to class that day, and got a 0 instead. Recalculate the mean and median. What happened and why?

## Question 4

Suppose the probabilities of a visitor to Amazon’s website buying 0, 1, or 2 books are 0.2, 0.4, and 0.4 respectively.

### Part A

Calculate the *expected number* of books a visitor will purchase.

### Part B

Calculate the *standard deviation* of book purchases.

### Part C

**Bonus**: try doing this in `R` by making an initial dataframe of the data, and then adding new columns to the “table” like we did in class.
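One possible shape for this kind of table calculation is sketched below in base `R`. It uses a made-up distribution, not the book-purchase probabilities above, and the column names (`x`, `p`, `xp`, `dev_sq`, `weighted`) are placeholders; the in-class version may have been built differently, for example with `tidyverse` functions.

```r
# Hypothetical discrete distribution, NOT the one in this question
dist <- data.frame(
  x = 0:3,                    # possible values
  p = c(0.1, 0.3, 0.4, 0.2)   # their probabilities (sum to 1)
)

# Expected value: sum of each value times its probability
dist$xp <- dist$x * dist$p
mu <- sum(dist$xp)

# Standard deviation: square root of the probability-weighted
# squared deviations from the expected value
dist$dev_sq <- (dist$x - mu)^2
dist$weighted <- dist$dev_sq * dist$p
sigma <- sqrt(sum(dist$weighted))

dist
mu
sigma
```

Each new column corresponds to one column you would add when working through the table by hand.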

## Question 5

Scores on the SAT (out of 1600) are approximately normally distributed with a mean of 500 and standard deviation of 100.

### Part A

What is the probability of getting a score between a 400 and a 600?

### Part B

What is the probability of getting a score between a 300 and a 700?

### Part C

What is the probability of getting *at least* a 700?

### Part D

What is the probability of getting *at most* a 700?

### Part E

What is the probability of getting exactly a 500?

## Question 6

Redo problem 5 by using the `pnorm()` command in `R`.

**Hint**: This function has four arguments you will need:

- the value of the random variable
- the mean of the distribution
- the sd of the distribution
- `lower.tail`: `TRUE` or `FALSE`
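To show how these arguments fit together, here is a minimal sketch using a hypothetical normal distribution with mean 100 and standard deviation 15, not the SAT distribution from problem 5. The variable names are placeholders.

```r
# Hypothetical distribution: mean 100, sd 15 (NOT the SAT numbers)

# P(X <= 115): area to the left of 115
p_below <- pnorm(115, mean = 100, sd = 15, lower.tail = TRUE)

# P(X >= 115): area to the right of 115
p_above <- pnorm(115, mean = 100, sd = 15, lower.tail = FALSE)

# P(85 <= X <= 115): difference of two left-tail areas
p_between <- pnorm(115, mean = 100, sd = 15) - pnorm(85, mean = 100, sd = 15)

p_below
p_above
p_between
```

Since `lower.tail` defaults to `TRUE`, leaving it out gives the area to the left of the value.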