1 Week 1

2021-08-18 [updated: 2022-01-26]

1.1 Variables

N: fixed by experimenter

p: Prior probability

W: A probability distribution given the data.

1.2 Joint model

W ~ Binomial(N , p)

p ~ Uniform(0, 1)

W is distributed binomially with N observations and probability p on each

p is distributed uniformally at 1

1.3 Question 1

Suppose the globe tossing data (Chapter 2) had turned out to be 8 water in 15 tosses. Construct the posterior distribution, using grid approximation. Use the same flat prior as in the book.

# Size of grid for grid approximation
gridsize <- 1000

# Prior grid
prior_grid <- seq(0, 1, length.out = gridsize)

# Prior probability (all 1)
prior_prob <- rep(1, gridsize)

# Data probability
#  given 4/15, using binomial distribution
data_prob <- dbinom(8, 15, prob = prior_grid)

# Calculate the posterior numerator by multiplying prior and data probability
posterior_num <- prior_prob * data_prob
# Standardize by sum of posterior numerator
posterior <- posterior_num / sum(posterior_num)

# Save for later
posterior_1 <- posterior

plot(prior_grid, posterior, type = 'l')
# Sample from posterior
samples <- sample(prior_grid, size = gridsize, prob = posterior, replace = TRUE)
mean(samples)
## [1] 0.54
PI(samples, .99)
##   1% 100% 
## 0.27 0.80

1.4 Question 2

Start over in 1, but now use a prior that is zero below p = 0.5 and a constant above p = 0.5. This corresponds to prior information that a majority of the Earth’s surface is water. What difference does the better prior make?

# Size of grid for grid approximation
gridsize <- 1000

# Prior grid
prior_grid <- seq(0, 1, length.out = gridsize)

# Prior probability (all 1 above 0.5, all 0 below)
prior_prob <- c(rep(0, gridsize / 2), rep(1, gridsize / 2))

# Data probability
#  given 4/15, using binomial distribution
data_prob <- dbinom(4, 15, prob = prior_grid)

# Calculate the posterior numerator by multiplying prior and data probability
posterior_num <- prior_prob * data_prob
# Standardize by sum of posterior numerator
posterior <- posterior_num / sum(posterior_num)

# Save for later
posterior_2 <- posterior

plot(prior_grid, posterior, type = 'l')
# Sample from posterior
samples <- sample(prior_grid, size = gridsize, prob = posterior, replace = TRUE)
mean(samples)
## [1] 0.55
PI(samples, .99)
##   1% 100% 
## 0.50 0.72

Narrower curve, higher max, all zeroes before 0.5

1.5 Question 3

For the posterior distribution from 2, compute 89% percentile and HPDI intervals. Compare the widths of these intervals. Which is wider? Why? If you had only the information in the interval, what might you misunderstand about the shape of the posterior distribution?

# Calculate Percentile Interval at 89%
percent_interval <- PI(posterior, prob = 0.89)
percent_interval
##     5%    94% 
## 0.0000 0.0073
# Calculate Highest Posterior Density at 89%
highest_post_dens <- HPDI(posterior, prob = 0.89)
highest_post_dens
##  |0.89  0.89| 
## 0.0000 0.0025