boot
package by Brian Ripley.Suppose that x_1, x_2, ..., x_n \sim f(\theta), where f(\cdot) is some arbitrary probability distribution. We are interested in estimating \theta. Let \hat{\theta} be the point estimate of the parameter of interest obtained from the original dataset.
Note that these are nonparameteric confidence intervals (that is, we don't assume that f is a particular parametric family).
The example comes from page 201 of DiCiccio and Efron (1996), which contains a dataset on five exams across 22 students (with some missing data). The parameter of interest is the maximum eigenvalue of the empirical covariance matrix. You may download the data here. The code for estimating the max eigenvalue is provided at the end of this post. The function is called
calculate.max.eigen
.Using this function on the dataset, we obtain \hat{\theta}=216.2.
Now, suppose that we bootstrap the dataset M times and for each of the datasets, calculate \hat{\theta}^*_{m}, which is the point estimate of \theta for the mth bootstrapped dataset. The empirical distribution of \hat{\theta}^* is the bootstrap distribution and is an approximation to the sampling distribution for \hat{\theta}.
To create M=10000 bootstrap replicates in R:
BS.eigen <- function(data){ index <- sample(1:nrow(data),size=nrow(data),replace=TRUE) bs.data <- data[index,] return(calculate.max.eigen(bs.data)) } bs.sampling <- replicate(10000,BS.eigen(data),simplify=TRUE)Here is the distribution of these estimates (with the median of the bootstrapped estimates denoted with the vertical blue line and \hat{\theta} denoted with a vertical red line):
We can then make use of the bootstrap distribution in a number of ways to obtained bootstrapped confidence intervals.
- Normal intervals:
\hat{\theta} \pm z_{\alpha/2} \sigma_{bs},
where z_{\alpha/2} is the \alpha/2-quantile of the standard normal distribution and \sigma_{bs} is the standard deviation of the bootstrap distribution. These intervals are justified by asymptotic theory (Central Limit Theorem).
In R:
In R:
In R:
theta.hat + qnorm(0.025)*sd(bs.sampling) theta.hat + qnorm(0.975)*sd(bs.sampling)For our example, we obtain a confidence interval of [202.64,1050.01].
- Transformed normal intervals:
t^{-1}\{t(\hat{\theta}) \pm z_{\alpha/2} \sigma_{tbs}\},
where t(\cdot) is a variance-stabilizing square root transformation and \sigma_{tbs} is the standard deviation of the transformed bootstrap distribution (distribution of \hat{\theta}^*).
In R:
In R:
In R:
(sqrt(theta.hat) + qnorm(0.025)*sd(sqrt(bs.sampling)))^2 (sqrt(theta.hat) + qnorm(0.975)*sd(sqrt(bs.sampling)))^2For our example, we obtain a confidence interval of [263.67, 1143.33].
- Basic bootstrap intervals:
\left[2\hat{\theta}- \hat{\theta}^*_{m=(1-\alpha/2)M}, 2\hat{\theta}- \hat{\theta}^*_{m=(\alpha/2)M}\right]
In R:
2*theta.hat - quantile(bs.sampling,0.975) 2*theta.hat - quantile(bs.sampling,0.025)
For our example, we obtain a confidence interval of [186.45, 1018.62].
- Transformed basic bootstrap intervals:basic bootstrap intervals with similar transformation to the transformed normal intervals.
(2*sqrt(theta.hat) - quantile(sqrt(bs.sampling),0.975))^2 (2*sqrt(theta.hat) - quantile(sqrt(bs.sampling),0.025))^2
For our example, we obtain a confidence interval of [302.76, 1208.00].
- Percentile confidence intervals:
\left[\hat{\theta}^*_{m=(1-\alpha/2)M}, \hat{\theta}^*_{m=(\alpha/2)M}\right]
In R:
quantile(bs.sampling,0.975) quantile(bs.sampling,0.025)
For our example, we obtain a confidence interval of [233.93, 1066.10].
- BCa confidence intervals: refinement on the percentile confidence interval method, designed to increase accuracy. Note that BCa reduces to standard percentile confidence intervals if the bootstrap distribution is unbiased (median of the distribution is equal to the original point estimate) and the acceleration (skewness) is zero.
Step 1: Calculate the bias-correction \hat{z}_0, which gives the standard normal quantile function of the proportion of bootstrapped estimates less than the original point estimate:
\hat{z}_0 = \Phi^{-1} \left\{\dfrac{\sum \{\hat{\theta}^* < \hat{\theta} \}}{M} \right\}
In R:
Step 2: Calculate the acceleration a, which has an interpretation of skewness (more specifically, it measures how rapidly standard error changes on a normalized scale). A non-parametric estimate of a is:
z0 <- qnorm(mean(bs.sampling < theta.hat))For our example, \hat{z}_0 is 0.194, which indicates a positive bias correction. This follows from the fact that 57.7% of \hat{\theta}^*s are below the original point estimate \hat{\theta} (downward bias). We can see this from the plot above since the median of the bootstrap distribution is to the left of the point estimate from the original dataset.
Step 2: Calculate the acceleration a, which has an interpretation of skewness (more specifically, it measures how rapidly standard error changes on a normalized scale). A non-parametric estimate of a is: