Controlling the Family-Wise Error Rate
In statistics, the family-wise error rate (FWER) is the probability of making one or more false discoveries (Type I errors) among all the hypotheses when performing multiple hypothesis tests.

History

Tukey coined the terms "experimentwise error rate" and "error rate per-experiment" to indicate error rates that the researcher could use as a control level in a multiple hypothesis experiment.

Background

Within the statistical framework, there are several definitions for the term "family". Hochberg & Tamhane (1987) defined a family as "any collection of inferences for which it is meaningful to take into account some combined measure of error".[1] According to Cox (1982), a set of inferences should be regarded as a family in order to take into account the selection effect due to data dredging, and to ensure the simultaneous correctness of a set of inferences so as to guarantee a correct overall decision. To summarize, a family could best be defined by the potential selective inference that is being faced: a family is the smallest set of items of inference in an analysis, interchangeable about their meaning for the goal of research, from which selection of results for action, presentation or highlighting could be made (Yoav Benjamini).

Classification of multiple hypothesis tests

The following table defines the various errors committed when testing multiple null hypotheses. Suppose we have m null hypotheses, denoted H1, H2, ..., Hm. Using a statistical test, we reject a null hypothesis if the test is declared significant, and we fail to reject it if the test is non-significant. Summing the test results over the Hi gives the following table and related random variables:

                                     Null hypothesis is true (H0)   Alternative hypothesis is true (HA)   Total
  Test is declared significant       V                              S                                     R
  Test is declared non-significant   U                              T                                     m − R
  Total                              m0                             m − m0                                m

Here V is the number of false discoveries (Type I errors), S is the number of true discoveries, and R = V + S is the total number of rejected null hypotheses.
Following an ANOVA, there are a number of different ways of testing which means are different. Before describing the tests, it is necessary to consider two different ways of thinking about error and how they are relevant to doing multiple comparisons.

Error rate per comparison (PC): This is simply the Type I error rate that we have talked about all along. So far, we have simply been setting its value at .05, a 5% chance of making an error.

Familywise error rate (FW): Often, after an ANOVA, we want to do a number of comparisons, not just one. The collection of comparisons we do is described as the "family". The familywise error rate is the probability that at least one of these comparisons will include a Type I error. If α′ is the per-comparison error rate and c is the number of comparisons, then:

  per-comparison error:  α = α′
  familywise error:      α_FW = 1 − (1 − α′)^c

Thus, if we do two comparisons but keep α′ at 0.05, the familywise error will really be:

  α_FW = 1 − (1 − 0.05)² = 1 − (0.95)² = 1 − 0.9025 = 0.0975

There is almost a 10% chance that one of the comparisons will be significant when we do two comparisons, even when the nulls are true (a short code sketch checking this arithmetic appears at the end of this section). The basic problem, then, is that if we are doing many comparisons, we want to somehow control our familywise error so that we don't end up concluding that differences are there when they really are not. The various tests we will talk about differ in terms of how they do this. They can also be categorized as either "a priori" or "post hoc":

A priori: comparisons that the experimenter clearly intended to test before collecting any data.

Post hoc: comparisons the experimenter has decided to test after collecting the data, looking at the means, and noting which means "seem" different.

The probability of making a Type I error is smaller for a priori tests because, when doing post hoc tests, you are essentially doing all possible comparisons before deciding which to test in a formal statistical manner.

An example for context: see page 351 for a very complete description of the morphine tolerance study (Siegel, 1975). Highlights: paw-lick latency as a measure of pain sensitivity.
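The familywise arithmetic above is easy to check directly. Here is a minimal sketch in plain Python (the function names are mine, not from the sources): familywise_error evaluates α_FW = 1 − (1 − α′)^c, and inverting the same formula gives the per-comparison rate that holds the familywise rate at a chosen cap, which is the idea behind the Šidák correction.

```python
def familywise_error(alpha_pc: float, c: int) -> float:
    """FWER for c independent comparisons, each run at rate alpha_pc."""
    return 1 - (1 - alpha_pc) ** c

def per_comparison_rate(alpha_fw: float, c: int) -> float:
    """Per-comparison rate needed to cap the familywise rate at alpha_fw
    (the inverse of the formula above, i.e. the Sidak correction)."""
    return 1 - (1 - alpha_fw) ** (1 / c)

print(familywise_error(0.05, 2))     # 0.0975, matching the worked example
print(familywise_error(0.05, 6))     # ~0.265 for six comparisons
print(per_comparison_rate(0.05, 6))  # ~0.0085 per comparison keeps FW at .05
```

Note that these formulas assume the comparisons are independent, which is the same assumption behind the multiplication in the worked example above.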
Familywise alpha is the probability of rejecting one or more absolutely true null hypotheses in a family of several absolutely true null hypotheses. Rejecting an absolutely true null hypothesis is known as a Type I error. It is important to keep in mind that one cannot make a Type I error unless one tests an absolutely true null hypothesis. Accordingly, if absolutely true null hypotheses are unlikely to be encountered, then the unconditional probability of making a Type I error will be quite small.

Psychologists and some others act as if they think they will burn in hell for an eternity if they ever make even a single Type I error -- that is, if they ever reject a null hypothesis when, in fact, that hypothesis is absolutely true. I and many others are of the opinion that the unconditional probability of making a Type I error is close to zero, since it is highly unlikely that one will ever test a null hypothesis that is absolutely true. Why worry so much about making an error that is almost impossible to make?

There exists a variety of techniques for capping familywise alpha at some value, usually .05. Why .05? Maybe .05 is, sometimes, a reasonable criterion for statistical significance when making a single comparison, but is it really reasonable to cap familywise alpha at .05? Even if it is, what reasonably constitutes the family for which one should cap familywise alpha at .05? Is it the family of hypotheses I am testing for this particular outcome variable in this particular research project? All the comparisons made in this particular research project? All the hypotheses I test this month, this year, or during my lifetime? All the hypotheses all psychologists test this month, this year, or ever? Many times I have asked what reasonably constitutes a family of comparisons for which alpha should be capped at .05, and I have never been satisfied with any answer I have received.

Controlling Familywise Alpha When Making Multiple Comparisons Among Means

The context in which the term "familywise alpha" is most likely to arise is when making multiple comparisons among means or groups of means. Suppose one has four means and wishes to compare each mean with each other mean. That is six comparisons. If all four means were absolutely equal in the populations of interest, that would be six absolutely true null hypotheses being tested; a code sketch of capping familywise alpha for those six comparisons follows below.
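As a concrete sketch of that four-means scenario (Python with NumPy and SciPy assumed; the group data are fabricated placeholders for illustration, and the Bonferroni cap is just one of the capping techniques alluded to above):

```python
from itertools import combinations

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Toy data: four groups drawn from identical populations, so all six
# pairwise null hypotheses are absolutely true.
groups = {name: rng.normal(loc=50, scale=10, size=20) for name in "ABCD"}

pairs = list(combinations(groups, 2))  # C(4, 2) = 6 comparisons
alpha_fw = 0.05
alpha_pc = alpha_fw / len(pairs)       # Bonferroni cap: .05 / 6 ~ .0083

for g1, g2 in pairs:
    result = stats.ttest_ind(groups[g1], groups[g2])
    verdict = "reject" if result.pvalue < alpha_pc else "retain"
    print(f"{g1} vs {g2}: p = {result.pvalue:.3f} -> {verdict}")
```

Because the four toy populations are identical, any rejection here would be a Type I error; testing each comparison at .05/6 holds the probability of at least one such rejection at or below .05.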