Experimentwise Error Rate
In statistics, the family-wise error rate (FWER) is the probability of making one or more false discoveries, or Type I errors, among all the hypotheses when performing multiple hypothesis tests.

History
Tukey coined the terms "experimentwise error rate" and "error rate per-experiment" to indicate error rates that the researcher could use as a
control level in a multiple hypothesis experiment.[citation needed]

Background
Within the statistical framework, there are several definitions for the term "family". Hochberg & Tamhane (1987) defined "family" as "any collection of inferences for which it is meaningful to take into account some combined measure of error".[1][page needed] According to Cox (1982), a set of inferences should be regarded as a family:[citation needed]

To take into account the selection effect due to data dredging
To ensure the simultaneous correctness of a set of inferences so as to guarantee a correct overall decision

To summarize, a family could best be defined by the potential selective inference that is being faced: a family is the smallest set of items of inference in an analysis, interchangeable about their meaning for the goal of research, from which selection of results for action, presentation or highlighting could be made (Yoav Benjamini).[citation needed]
If c independent comparisons are performed, the experimentwise error rate is given by

α_ew = 1 - (1 - α_pc)^c

where α_ew is the experimentwise error rate, α_pc is the per-comparison error rate, and c is the number of comparisons. For example, if 5 independent comparisons were each to be done at the .05 level, then the probability that at least one of them would result in a Type I error is 1 - (1 - .05)^5 = 0.226. If the comparisons are not independent, then the experimentwise error rate is less than 1 - (1 - α_pc)^c. Finally, regardless of whether the comparisons are independent,

α_ew ≤ (c)(α_pc).

For this example, .226 < (5)(.05) = 0.25.
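As a minimal sketch of the calculation above (plain Python, no external libraries; the function name is illustrative), the snippet below computes the experimentwise error rate for c independent comparisons and verifies that it never exceeds the bound (c)(α_pc):

    # Experimentwise error rate for c independent comparisons,
    # each conducted at per-comparison level alpha_pc.
    def experimentwise_error_rate(alpha_pc: float, c: int) -> float:
        # P(at least one Type I error) = 1 - (1 - alpha_pc)^c
        return 1 - (1 - alpha_pc) ** c

    alpha_ew = experimentwise_error_rate(0.05, 5)
    bound = 5 * 0.05  # upper bound, valid even without independence

    print(f"{alpha_ew:.3f}")  # 0.226
    print(f"{bound:.3f}")     # 0.250
    assert alpha_ew <= bound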
A procedure such as the analysis of variance (ANOVA) tests null hypotheses stating that the averages of several disjoint populations are equal to each other (homogeneous). Of great concern to statisticians is the problem of multiple testing, that is, the potential increase in Type I error that occurs when statistical tests are used repeatedly: if n independent comparisons are performed, the experimentwise significance level α_ew is given by

α_ew = 1 - (1 - α_pc)^n,

where α_pc is the per-comparison significance level, and it increases rapidly, approaching 1, as the number of comparisons increases. Thus, in order to retain the same overall rate of false positives in a test involving more than one comparison, the standards for each comparison must be more stringent. Intuitively, dividing the allowable error (alpha) for each comparison by the number of comparisons will result in an overall alpha which does not exceed the desired limit, and this can be mathematically proved. For instance, to obtain the usual overall alpha of 0.05 with ten comparisons, requiring an alpha of .005 for each comparison can be shown to result in an overall alpha which does not exceed 0.05 (a numerical sketch follows this paragraph). However, it can also be demonstrated that this technique is conservative, i.e. it will in actuality result in a true overall alpha below the nominal 0.05, thereby raising the rate of false negatives: an unnecessarily high percentage of the real differences in the data will go unidentified. This can have important real-world consequences; for instance, it may result in failure to approve a drug which is in fact superior to existing drugs, thereby both depriving the world of an improved therapy and causing the drug company to lose its substantial investment in research and development up to that point. For this reason, a great deal of attention has been paid to developing better techniques for multiple comparisons, such that the overall rate of false positives can be maintained without inflating the rate of false negatives unnecessarily. Such methods can be divided into three general categories: methods where total alpha can be pr
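As a numerical check on the Bonferroni-style reasoning above, the following sketch (plain Python; the variable names are illustrative, not from any particular library) divides the desired overall alpha across ten independent tests and computes the resulting true overall Type I error rate:

    # Bonferroni correction: split the desired overall alpha across n tests.
    n = 10
    target_alpha = 0.05
    alpha_per_test = target_alpha / n  # .005 per comparison

    # True overall Type I error rate, assuming the n tests are independent:
    actual_alpha = 1 - (1 - alpha_per_test) ** n
    print(f"per-test alpha: {alpha_per_test:.4f}")  # 0.0050
    print(f"overall alpha : {actual_alpha:.4f}")    # ~0.0489, below the 0.05 target

The gap between roughly 0.0489 and 0.05 is modest when the tests are independent; the conservatism, and the accompanying loss of power, grows when the tests are positively correlated.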
Presentation
There are two types of follow-up tests: planned contrasts (used when you have hypothesised specific group comparisons) and post hoc tests (used when you have not hypothesised specific differences; these test all pairs of groups). To learn more about conducting follow-up tests for ANOVA, consult:

Allen & Bennett (SPSS for the Health and Behavioural Sciences):
Chapter 7.3.3: Follow-up analyses (one-way ANOVA, example 1)
Chapter 7.4.3: Follow-up analyses (one-way ANOVA, example 2)
Chapter 8.4.2: Follow-up analyses (factorial between-groups ANOVA, example 2)
Chapter 9.4.3: Follow-up analyses (one-way repeated measures ANOVA)
Howell (Fundamental Statistics): Section 16.5: Multiple comparison procedures (375-383)
Howell (Statistical Methods): Chapter 12: Multiple comparisons among treatment means (343-389)
Francis (Introduction to SPSS for Windows): Section 3.3.6.1: Post hoc tests and planned contrasts (61-63)
Francis (Introduction to SPSS for Windows): Section 3.3.8.4: Planned contrasts for within-subjects ANOVA (71)

Planned contrasts
Technically, planned tests (or the use of planned contrasts) are not "follow-up" tests done following a 'significant' omnibus F value from an ANOVA. Planned t-tests can be conducted instead of an ANOVA (or even notwithstanding a 'non-significant' ANOVA F value) by virtue of their having been planned prior to collecting the data in that experiment or study. A full complement of planned contrasts consists of one less than the number of means in the study, and the contrasts should all be at least linearly independent of one another (if they are not mutually orthogonal, which is a stronger form of linear independence); a sketch of checking these properties appears at the end of this section. Two other procedures appropriately used with planned contrasts (besides t-tests) are Dunnett's many-one method and Bonferroni's inequality. Regardless of the number of means (say, J) in the study, each of the J - 1 planned contrasts has a df1 (numerator degrees of freedom) value of 1, and these J - 1 are the only contrasts that can be evaluated as planned contrasts for those J means. The important advantage offered by post hoc procedures is that whenever there are more than two means in a study, a potentially infinite number of contrasts can be created and evaluated (so pairwise comparisons typically just scratch the surface of the contrasts that are possible). The method of planned t-tests uses a decision-based
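Where the text above refers to linearly independent and mutually orthogonal contrasts, the following sketch (hypothetical contrast weights for J = 4 group means; numpy is assumed) shows how a full set of J - 1 planned contrasts can be checked for both properties:

    # A full complement of planned contrasts for J = 4 means: J - 1 = 3 contrasts,
    # each a vector of weights that sums to zero.
    import numpy as np

    J = 4
    contrasts = np.array([
        [1, -1,  0,  0],   # mean 1 vs mean 2
        [0,  0,  1, -1],   # mean 3 vs mean 4
        [1,  1, -1, -1],   # means 1 and 2 vs means 3 and 4
    ])

    # Contrast weights must sum to zero within each row.
    assert np.allclose(contrasts.sum(axis=1), 0)

    # Linear independence: the weight matrix has full row rank (J - 1).
    assert np.linalg.matrix_rank(contrasts) == J - 1

    # Mutual orthogonality (the stronger condition; assumes equal group sizes):
    gram = contrasts @ contrasts.T
    print(np.allclose(gram, np.diag(np.diag(gram))))  # True for this set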