Bootstrap Estimate Of The Standard Error Of The Mean
(Source: https://onlinecourses.science.psu.edu/stat464/node/80)

In the previous section, we recorded each person's driving speed along with whether or not the person got a speeding ticket. The data for women who received a ticket are shown below.

Women, ticket: 103, 104, 109, 110, 120

Suppose we are interested in the following estimations:

1. Estimate the population mean μ and get the standard deviation of the sample mean \(\bar{x}\).
2. Estimate the population median η and get the standard deviation of the sample median.

For (1), we have already found in the previous section that the sampling distribution of \(\bar{X}\) is approximately Normal (under certain conditions), with

\[\begin{align} \bar{x} &= 109.2\\ s &= 6.76\\ n &= 5\\ \text{SD}(\bar{x}) &= \frac{s}{\sqrt{n}} = \frac{6.76}{\sqrt{5}} = 3.023 \end{align}\]
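These sample statistics are easy to verify numerically; here is a quick R check (not part of the original lesson, variable names are ours):

```r
# Speeds of the five ticketed women
speeds <- c(103, 104, 109, 110, 120)

n <- length(speeds)   # 5
mean(speeds)          # 109.2
sd(speeds)            # 6.760178
sd(speeds) / sqrt(n)  # 3.023243
```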
What about the estimate of the population median η? Let's denote the estimate by M. We are interested in the standard deviation of M. We can easily find the sample median by finding the middle observation of the ordered data; thus M = 109. But what about the standard deviation of the sample median?

If we knew the underlying distribution of driving speeds of women who received a ticket, we could follow the method above and find the sampling distribution:

1. Obtain a random sample of size n = 5 and calculate the sample median, M1.
2. Gather another sample of size n = 5 and calculate the sample median, M2.
3. Repeat these steps until we obtain a desired number of sample medians (say 1000).
4. Obtain the approximate distribution of the sample median and, from there, an estimate of its standard deviation. We can approximate the distribution by creating a histogram of all the sample medians.

The trouble with this is that we do not know (nor want to assume) what distribution the data come from. A solution is to let the observed data represent the population and to sample from the original data. We would therefore sample n = 5 observations from 103, 104, 109, 110, 120 with replacement. Sampling with replacement is important: if we did not sample with replacement, we would always get the same sample median as the observed value. The sample we get by resampling the observed data with replacement is called a bootstrap sample.

Summary of steps:

1. Replace the population with the sample.
2. Sample with replacement B times.
3. Compute the sample median Mi each time.
4. Compute the standard deviation of M1, ..., MB.

Example: I created a function in R to generate samples of size n = 5 from 103, 104, 109, 110, 120 and record the sample median of each (a sketch of such a function is shown below). Below is a table of the results for B = 14, 20, 1000, 10000. As you can see, the standard deviations are all quite close to each other, even when we only generated 14 samples.

B      SD(M)
14     4.1
20     3.87
1000   3.9
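The lesson's original R function isn't shown; here is a minimal sketch of the simulation it describes (function and variable names are ours):

```r
# Standard deviation of the sample median, estimated from B bootstrap samples
boot.median.sd <- function(data, B) {
  medians <- replicate(B, median(sample(data, size = length(data), replace = TRUE)))
  sd(medians)
}

speeds <- c(103, 104, 109, 110, 120)
set.seed(1)
sapply(c(14, 20, 1000, 10000), function(B) boot.median.sd(speeds, B))
```

The exact values vary with the random seed, but each run should land near the standard deviations in the table above.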
Related question from Cross Validated:

Use of standard error of bootstrap distribution
(http://stats.stackexchange.com/questions/22472/use-of-standard-error-of-bootstrap-distribution)

(Ignore the R code if needed, as my main question is language-independent.) If I want to look at the variability of a simple statistic (e.g. the mean), I know I can do it via theory like:

```r
x = rnorm(50)
# Estimate standard error from theory
summary(lm(x ~ 1))
# same as...
sd(x) / sqrt(length(x))
```

or with the bootstrap like:

```r
library(boot)
# Estimate standard error from bootstrap
(x.bs = boot(x, function(x, inds) mean(x[inds]), 1000))
# which is simply the standard *deviation* of the bootstrap distribution...
sd(x.bs$t)
```

However, what I'm wondering is: can it be useful/valid to look at the standard error of a bootstrap distribution in certain situations? The situation I'm dealing with is a relatively noisy nonlinear function, such as:

```r
# Simulate dataset
set.seed(12345)
n = 100
x = runif(n, 0, 20)
y = SSasymp(x, 5, 1, -1) + rnorm(n, sd = 2)
dat = data.frame(x, y)
```

Here the model doesn't even converge using the original data set:

```r
> (fit = nls(y ~ SSasymp(x, Asym, R0, lrc), dat))
Error in numericDeriv(form[[3L]], names(ind), env) :
  Missing value or an infinity produced when evaluating the model
```

so the statistics I'm interested in instead are more stabilized estimates of these nls parameters - perhaps their means across a number of bootstrap replications:

```r
# Obtain mean bootstrap nls parameter estimates
fit.bs = boot(dat, function(dat, inds)
    tryCatch(coef(nls(y ~ SSasymp(x, Asym, R0, lrc), dat[inds, ])),
             error = function(e) c(NA, NA, NA)),
    1000)
```
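A `boot` object stores its replicate estimates in the `t` component (one row per resample, one column per parameter). Assuming `fit.bs` was created as above, with failed fits recorded as rows of NA, the stabilized summaries the asker describes could be computed roughly like this:

```r
# Summarize the bootstrap replicates, skipping resamples where nls() failed
colMeans(fit.bs$t, na.rm = TRUE)       # stabilized point estimates (Asym, R0, lrc)
apply(fit.bs$t, 2, sd, na.rm = TRUE)   # bootstrap standard errors
mean(is.na(fit.bs$t[, 1]))             # fraction of non-converged resamples
```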
A second related question from Cross Validated:

Using bootstrap to estimate the confidence interval of the standard deviation
(http://stats.stackexchange.com/questions/118090/using-bootstrap-to-estimate-confidence-interval-of-the-standard-deviation)

I am trying to compare two different methods of estimating confidence intervals: a parametric approach that uses the assumption that the sample is t-distributed (i.e. the formulas given here: Wikipedia: Normal Distribution), and bootstrapping. The procedure is rather simple: for every sample size $5 < N < 200$, I generate a sample of $N$ normally distributed random numbers. I then calculate the confidence interval using the parametric formulas for every such sample. Then I do the same with the bootstrap: for each sample, I draw 1000 sub-samples with replacement, calculate their mean and standard deviation, sort these values, and cut off the top and bottom 2.5%, which should give me the 95% confidence interval ($\alpha = 5\%$).

For each sample size, I then plot the width of the confidence interval, both for the parametric approach and for the bootstrap. I also calculate their difference as

[(Width of Parametric CI) - (Width of Bootstrap CI)] / (Width of Parametric CI) * 100

to get a percentage. I would expect that for increasing sample size, the difference between the two methods vanishes.

[Figure: CI width versus sample size, for the mean and for the standard deviation]

As the graphic shows, this is indeed the case for the mean of the sample. However, for the standard deviation, the parametric approach and the bootstrap do not really seem to converge to the same value, or at least converge significantly more slowly than for the mean. I wonder:

1. Is there a reason why it works for the mean, but not for the standard deviation?
2. Did I simply make a mistake when implementing all this (if desired, I can post the Python source code that was used to generate the graphic)?

Thanks in advance for all your ideas and suggestions :)

Tags: bootstrap, standard-deviation, mean (asked Oct 6 '14 by der_herr_g)
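For reference, here is a minimal R sketch of the percentile procedure the question describes, next to the usual parametric chi-square interval for the standard deviation (the asker worked in Python; all names here are illustrative):

```r
# Percentile bootstrap 95% CI for the standard deviation of a normal sample
set.seed(42)
N <- 50
x <- rnorm(N)

B <- 1000
boot.sds <- replicate(B, sd(sample(x, size = N, replace = TRUE)))
bootstrap.ci <- quantile(boot.sds, c(0.025, 0.975))

# Parametric (chi-square) CI for sigma, for comparison
parametric.ci <- sqrt((N - 1) * var(x) / qchisq(c(0.975, 0.025), df = N - 1))

rbind(bootstrap = bootstrap.ci, parametric = parametric.ci)
```

The slower convergence the asker observes for the standard deviation is consistent with a known property of the plain percentile bootstrap: the bootstrap distribution of s is biased downward and skewed in small samples, so its percentile interval agrees with the parametric interval more slowly than the corresponding interval for the mean does.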