Problem Set 2


This assignment is due by class on Wednesday, September 28.

Please read the instructions for completing and submitting homeworks.

PDF R Project R Studio Cloud

The PDF is useful if you want to print out the problem set and write on it. The R Project is a .zip file that contains a .qmd file to write your answers in, along with the data, all in a logical working directory. (See this resource for help unzipping files.) You can also just write an .R file in the project if you don’t want to use markdown. If you use the cloud project, I have already installed tidyverse and tinytex (to produce PDFs).


Theory and Concepts

Question 1

In your own words, explain the difference between endogeneity and exogeneity.

Question 2

Part A

In your own words, explain what (sample) standard deviation means.

Part B

In your own words, explain how (sample) standard deviation is calculated. You may also write the formula, but it is not necessary.


For the remaining questions, you may use R to verify, but please calculate all sample statistics by hand and show all work.

Question 3

Suppose you have a very small class of four students who all take a quiz. Their scores are reported as follows:

\[\{83, 92, 72, 81\}\]

Part A

Calculate the median.

Part B

Calculate the sample mean, \(\bar{x}\).

Part C

Calculate the sample standard deviation, \(s\).

Part D

Make or sketch a rough histogram of this data, with each bin having a width of 10 (i.e. 70s, 80s, 90s, 100s). You can draw this by hand or use R.

If you are using ggplot, use + geom_histogram(breaks = seq(start, end, by)) and add another layer, + scale_x_continuous(breaks = seq(start, end, by)). The first layer sets the bin edges of the histogram, and the second layer sets the tick marks on the x axis. Each does so with a sequence that runs from a starting value to an ending value in steps of a given interval (e.g. by 2, or by 10).
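As a rough sketch of how those two layers fit together, the example below uses made-up scores (not the quiz data above), and the names scores, score, and p are placeholders chosen for illustration. It assumes ggplot2 is installed, as it is in the cloud project.

```r
library(ggplot2)

# Hypothetical data for illustration only (not the quiz scores above)
scores <- data.frame(score = c(55, 61, 68, 74, 79))

# seq(50, 80, by = 10) produces 50, 60, 70, 80:
# used once as bin edges and once as x-axis tick marks
p <- ggplot(scores, aes(x = score)) +
  geom_histogram(breaks = seq(50, 80, by = 10), color = "white") +
  scale_x_continuous(breaks = seq(50, 80, by = 10))

p  # printing the object draws the plot
```

For your own data, choose a start and end that cover the smallest and largest observations.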

Is this distribution roughly symmetric or skewed? What would we expect about the mean and the median?

Part E

Suppose instead that the person who got the 72 did not show up to class that day, and got a 0 instead. Recalculate the mean and median. What happened and why?

Question 4

Suppose the probabilities of a visitor to Amazon’s website buying 0, 1, or 2 books are 0.2, 0.4, and 0.4 respectively.

Part A

Calculate the expected number of books a visitor will purchase.

Part B

Calculate the standard deviation of book purchases.

Part C

Bonus: try doing this in R by making an initial dataframe of the data, and then adding new columns to the “table” like we did in class.
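One possible shape for such a table is sketched below in base R, using a made-up distribution rather than the Amazon example. The names pmf_table, x, p, mu, and sigma are placeholders, and the approach shown in class may differ (for example, it may use dplyr's mutate() to add the columns).

```r
# Hypothetical distribution for illustration only (not the Amazon example)
pmf_table <- data.frame(
  x = c(1, 2, 3),        # possible outcomes
  p = c(0.5, 0.3, 0.2)   # their probabilities (must sum to 1)
)

# Expected value: probability-weighted sum of the outcomes
pmf_table$x_times_p <- pmf_table$x * pmf_table$p
mu <- sum(pmf_table$x_times_p)

# Standard deviation: square root of the probability-weighted
# sum of squared deviations from the expected value
pmf_table$sq_dev <- (pmf_table$x - mu)^2
pmf_table$weighted_sq_dev <- pmf_table$sq_dev * pmf_table$p
sigma <- sqrt(sum(pmf_table$weighted_sq_dev))

pmf_table  # the full table, one row per outcome
mu
sigma
```

Each new column corresponds to one step of the hand calculation, so the table can be checked row by row against your written work.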

Question 5

Scores on the SAT (out of 1600) are approximately normally distributed with a mean of 500 and standard deviation of 100.

Part A

What is the probability of getting a score between a 400 and a 600?

Part B

What is the probability of getting a score between a 300 and a 700?

Part C

What is the probability of getting at least a 700?

Part D

What is the probability of getting at most a 700?

Part E

What is the probability of getting exactly a 500?

Question 6

Redo Question 5 by using the pnorm() command in R.

Hint: You will need four of this function's arguments:

  1. the value of the random variable
  2. the mean of the distribution
  3. the sd of the distribution
  4. lower.tail: TRUE for the probability at or below the value, FALSE for the probability above it.
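
As a hedged illustration of those arguments, the sketch below uses a made-up normal distribution with mean 100 and sd 15 (not the SAT numbers). The variable names are placeholders.

```r
# Hypothetical distribution for illustration: mean 100, sd 15

# Probability of a value at or below 115 (area to the left)
p_below <- pnorm(115, mean = 100, sd = 15, lower.tail = TRUE)

# Probability of a value above 115 (area to the right)
p_above <- pnorm(115, mean = 100, sd = 15, lower.tail = FALSE)

# Probability of a value between 85 and 115:
# area to the left of 115 minus area to the left of 85
p_between <- pnorm(115, mean = 100, sd = 15) - pnorm(85, mean = 100, sd = 15)

p_below
p_above
p_between
```

lower.tail defaults to TRUE, so it can be left out when you want the area to the left.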