If Error Bars Overlap
In a publication or presentation, you may be tempted to draw conclusions about the statistical significance of differences between group means by looking at whether the error bars overlap. Let's look at two contrasting examples.

What can you conclude when standard error bars do not overlap?

When standard error (SE) bars do not overlap, you cannot be sure that the difference between two means is statistically significant. Even though the error bars do not overlap in experiment 1, the difference is not statistically significant (P = 0.09 by unpaired t test); a short R sketch below reproduces this situation with made-up numbers. This is also true when you compare proportions with a chi-square test.

What can you conclude when standard error bars do overlap?

No surprises here. When SE bars overlap (as in experiment 2), you can be sure the difference between the two means is not statistically significant (P > 0.05).
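To make the first point concrete, here is a minimal R sketch. The numbers are invented to mimic the experiment 1 scenario (they are not the data behind the original example): two groups of n = 10 whose SE bars clearly do not overlap, yet the unpaired t test gives P of about 0.09.

# Construct two hypothetical samples with exact means of 100 and 108,
# both with SD = 10 and n = 10 (scale() forces the mean and SD exactly,
# so the result does not depend on the random seed).
n  <- 10
g1 <- as.numeric(scale(rnorm(n))) * 10 + 100
g2 <- as.numeric(scale(rnorm(n))) * 10 + 108

se1 <- sd(g1) / sqrt(n)   # standard error of each mean (about 3.16)
se2 <- sd(g2) / sqrt(n)

# SE error bars: roughly 96.8 to 103.2 and 104.8 to 111.2 -- they do not overlap...
c(mean(g1) - se1, mean(g1) + se1)
c(mean(g2) - se2, mean(g2) + se2)

# ...yet the unpaired t test is not significant at the 0.05 level (P is about 0.09)
t.test(g1, g2, var.equal = TRUE)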
What if you are comparing more than two groups?

Post tests following one-way ANOVA account for multiple comparisons, so they yield higher P values than t tests comparing just two groups. So the same rules apply. If two SE error bars overlap, you can be sure that a post test comparing those two groups will find no statistical significance. However, if two SE error bars do not overlap, you can't tell whether a post test will, or will not, find a statistically significant difference. (A small worked comparison appears after this section.)

What if the error bars do not represent the SEM?

Error bars that represent the 95% confidence interval (CI) of a mean are wider than SE error bars -- about twice as wide with large sample sizes and even wider with small sample sizes; a numeric check of this appears at the end of this section. If 95% CI error bars do not overlap, you can be sure the difference is statistically significant (P < 0.05). However, the converse is not true -- you may or may not have statistical significance when the 95% confidence intervals overlap.

Some graphs and tables show the mean with the standard deviation (SD) rather than the SEM. The SD quantifies variability, but does not account for sample size. To assess statistical significance, you must take into account sample size as well as variability. Therefore, observing whether SD error bars overlap or not tells you nothing about whether the difference is, or is not, statistically significant.

What if the groups were matched and analyzed with a paired t test?

All the comments above assume you are performing an unpaired t test. When you analyze matched data with a paired t test, it doesn't matter how much scatter each group has -- what matters is the consistency of the changes or differences. Whether or not the error bars for each group overlap tells you nothing about the P value.
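A quick illustration of that last point, using invented paired data (not taken from the article): each subject's value increases by roughly the same small amount, so the paired t test is highly significant even though the two sets of values scatter over the same wide range and their group error bars overlap almost completely.

set.seed(42)                                        # any seed shows the same pattern
n        <- 10
baseline <- rnorm(n, mean = 100, sd = 20)           # large between-subject scatter
followup <- baseline + rnorm(n, mean = 3, sd = 1)   # small but consistent increase

# The group-level summaries (and hence SD or SEM error bars) overlap heavily...
c(mean(baseline), sd(baseline))
c(mean(followup), sd(followup))

# ...but the paired t test looks only at the within-subject differences,
# which cluster tightly around +3, so P comes out far below 0.05.
t.test(followup, baseline, paired = TRUE)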
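Returning to the multiple-groups point above, here is a small worked comparison. The article does not say which post test it has in mind, so Tukey's HSD is used here purely as a representative example, with three invented groups; the Tukey-adjusted P value for any pair is larger than the corresponding unadjusted pairwise P value.

set.seed(1)
dat <- data.frame(
  y   = c(rnorm(10, 100, 10), rnorm(10, 108, 10), rnorm(10, 115, 10)),
  grp = factor(rep(c("A", "B", "C"), each = 10))
)

fit <- aov(y ~ grp, data = dat)
TukeyHSD(fit)                                       # post test: adjusted P for A-B, A-C, B-C

# The same pairwise comparisons with no multiplicity adjustment give smaller P values
pairwise.t.test(dat$y, dat$grp, p.adjust.method = "none")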
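Finally, a numeric check of the claim that 95% CI error bars are about twice as wide as SE error bars for large samples and wider still for small ones. The half-width of a 95% CI is the SE multiplied by a t quantile, and that multiplier shrinks toward roughly 2 as n grows (the sample sizes below are arbitrary):

# Ratio of the 95% CI half-width to the SE for a few sample sizes.
# CI half-width = qt(0.975, n - 1) * SE, so the ratio is simply the t quantile.
n_values <- c(3, 5, 10, 30, 100, 1000)
round(setNames(qt(0.975, df = n_values - 1), paste0("n=", n_values)), 2)
# about 4.30, 2.78, 2.26, 2.05, 1.98, 1.96 -- close to 2 once n is large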
Related reading: Krzywinski, M. & Altman, N. Points of Significance: Error bars. Nature Methods 10, 921–922 (2013). doi:10.1038/nmeth.2659. http://www.nature.com/nmeth/journal/v10/n10/full/nmeth.2659.html
The meaning of error bars is often misinterpreted, as is the statistical significance of their overlap.
Figure 1 (caption): Error bar width and interpretation of spacing depend on the error bar type. (a,b) Example graphs are based on sample means of 0 and 1 (n = 10). (a) When bars are scaled to the same size and abut, P values span a wide range. When s.e.m. bars touch, P is large (P = 0.17). (b) Bar size and relative position vary greatly at the conventional P value significance cutoff of 0.05, at which bars may overlap or have a gap.
Figure 2 (caption): The size and position of confidence intervals depend on the sample. On average, CI% of intervals are expected to span the mean -- about 19 in 20 times for a 95% CI. (a) Means and 95% CIs of 20 samples (n = 10) drawn from a normal population with mean μ and s.d. σ. By chance, two of the intervals (red) do not capture the mean. (b) Relationship between s.e.m. and 95% CI error bars with increasing n.
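The claim in the Figure 1 caption is easy to verify in R. If two s.e.m. bars just touch, the difference between the means equals the sum of the two standard errors; assuming two equal-sized groups (n = 10) with equal SDs, that works out to P of about 0.17:

# If the s.e.m. bars just touch, the mean difference is se1 + se2 = 2 * se,
# and the unpaired t statistic is 2 * se / (se * sqrt(2)) = sqrt(2).
n <- 10
2 * pt(-sqrt(2), df = 2 * n - 2)   # two-sided P, about 0.17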
From rpsychologist.com: http://rpsychologist.com/how-to-tell-when-error-bars-correspond-to-a-significant-p-value

Introduction

Belia, Fidler, Williams, and Cumming (2005) found that researchers in psychology, behavioral neuroscience, and medicine are really bad at interpreting when error bars signify that two means are significantly different (p = 0.05). What they did was to email a bunch of researchers and invite them to take a web-based test, and they got 473 usable responses. The test consisted of an interactive plot with error bars for two independent groups, and the participants were asked to move the error bars to a position they believed would represent a significant t test at p = 0.05. They did this for error bars based on the 95% CI and on the groups' standard errors. On average, the participants set the 95% CI error bars too far apart, with their mean placement corresponding to a p value of .009. They did the opposite with the SE error bars, which they put too close together, yielding placements corresponding to p = 0.109. And if you're wondering, they found no difference between the three disciplines.

Plots

I wanted to pull my weight, and I have therefore created some plots in R that show error bars that are significant at various p values.

Figure 1. Error bars corresponding to a significant difference at p = .05 (equal group sizes and equal variances)
Figure 2. Error bars corresponding to a significant difference at p = .01 (equal group sizes and equal variances)
Figure 3. Error bars corresponding to a significant difference at p = .001 (equal group sizes and equal variances)

Based on the first plot we see that an overlap of about one third of the 95% CIs corresponds to p = 0.05. For the SE error bars we see that they are about 1 SE apart when p = 0.05.

R Code

Here's the R code used to produce these plots (the listing is cut off in the source, so the end of get_CI is reconstructed):

library(ggplot2)
library(plyr)

m1  <- 100   # mean, group 1
m2  <- 100   # initial mean, group 2 (starts equal to m1)
sd1 <- 10    # sd, group 1
sd2 <- 10    # sd, group 2
n   <- 20    # n per group
p   <- 1     # starting p-value
s   <- sqrt(0.5 * (sd1^2 + sd2^2))             # pooled sd

while (p > 0.05) {                             # loop until p reaches 0.05
  t  <- (min(c(m1, m2)) - max(c(m1, m2))) / (s * sqrt(2/n))   # t statistic
  df <- (n * 2) - 2                            # degrees of freedom
  p  <- pt(t, df) * 2                          # two-sided p value
  m2 <- m2 - (m2 / 10000)                      # nudge the group 2 mean downward
}

get_CI <- function(x, sd, CI) {                # calculate error bar limits
  se  <- sd / sqrt(n)                          # standard error
  lwr <- x - qt((1 + CI) / 2, n - 1) * se      # reconstructed past the cut-off
  upr <- x + qt((1 + CI) / 2, n - 1) * se
  c(lwr = lwr, upr = upr)
}
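The original post's plotting code is not included past this point, so here is a minimal sketch (variable names are mine) of how the reconstructed get_CI could feed a Figure-1-style ggplot2 plot of the two means with 95% CI error bars:

# 95% CI error bars for the two group means found by the loop above
ci1 <- get_CI(m1, sd1, 0.95)
ci2 <- get_CI(m2, sd2, 0.95)

plot_dat <- data.frame(
  group = c("Group 1", "Group 2"),
  mean  = c(m1, m2),
  lwr   = c(ci1["lwr"], ci2["lwr"]),
  upr   = c(ci1["upr"], ci2["upr"])
)

ggplot(plot_dat, aes(x = group, y = mean)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = lwr, ymax = upr), width = 0.1)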