Family-Wise Error Rate
In statistics, the family-wise error rate (FWER) is the probability of making one or more false discoveries (Type I errors) among all the hypotheses when performing multiple hypothesis tests.

History
Tukey coined the terms "experimentwise error rate" and "error rate per-experiment" to indicate error rates that the researcher could use as a control level in a multiple hypothesis experiment.

Background

Within the statistical framework, there are several definitions for the term "family":

Hochberg & Tamhane (1987) defined "family" as "any collection of inferences for which it is meaningful to take into account some combined measure of error".[1]

According to Cox (1982), a set of inferences should be regarded as a family:

- To take into account the selection effect due to data dredging
- To ensure the simultaneous correctness of a set of inferences, so as to guarantee a correct overall decision

To summarize, a family could best be defined by the potential selective inference that is being faced: a family is the smallest set of items of inference in an analysis, interchangeable about their meaning for the goal of research, from which selection of results for action, presentation or highlighting could be made (Yoav Benjamini).

Classification of multiple hypothesis tests

The following table defines the various outcomes possible when testing multiple null hypotheses. Suppose we have a number m of null hypotheses, denoted H1, H2, ..., Hm. Using a statistical test, we reject a null hypothesis if the test is declared significant, and we do not reject it if the test is non-significant. Summing the test results over the Hi gives the following table and related random variables:

                                     Null hypothesis is true (H0)   Alternative hypothesis is true (HA)   Total
Test is declared significant         V                              S                                     R
Test is declared non-significant     U                              T                                     m - R
Total                                m0                             m - m0                                m

Here V is the number of false discoveries (Type I errors), S is the number of true discoveries, R = V + S is the total number of rejected null hypotheses, and m0 is the number of true null hypotheses.
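One way to make the opening definition precise is in terms of the random variable V from the table above (the number of Type I errors); this is the standard formulation, written here in LaTeX:

$$ \mathrm{FWER} = \Pr(V \ge 1) = 1 - \Pr(V = 0) $$

Controlling the FWER at a level alpha therefore means guaranteeing that Pr(V >= 1) <= alpha.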
Specific Comparisons

There are many occasions on which the comparisons among means are more complicated than simply comparing one mean with another. This section shows how to test these more complex comparisons. The methods in this section assume that the comparison among means was decided on before looking at the data; therefore these comparisons are called planned comparisons. A different procedure is necessary for unplanned comparisons.
Let's begin with the made-up data from a hypothetical experiment shown in Table 1. Twelve subjects were selected from a population of high-self-esteem subjects (esteem = 1) and an additional 12 subjects were selected from a population of low-self-esteem subjects (esteem = 2). Subjects then performed on a task and (independent of how well they really did) half were told they succeeded (outcome = 1) and the other half were told they failed (outcome = 2). Therefore there were six subjects in each esteem/outcome combination and 24 subjects altogether. After the task, subjects were asked to rate (on a 10-point scale) how much of their outcome (success or failure) they attributed to themselves as opposed to being due to the nature of the task.

Table 1. Data from the hypothetical experiment.

outcome   esteem   attrib
1         1        7
1         1        8
1         1        7
1         1        8
1         1        9
1         1        5
1         2        6
1         2        5
1         2        7
1         2        4
1         2        5
1         2        6
2         1        4
2         1        6
2         1        5
2         1        4
2         1        7
2         1        3
2         2        9
2         2        8
2         2        9
2         2        8
2         2        7
2         2        6

The means of the four conditions are shown in Table 2.

Table 2. Mean ratings of self-attributions of success or failure.

                    Success   Failure
High Self-Esteem    7.333     4.833
Low Self-Esteem     5.500     7.833

There are several questions we can ask about the data. We begin by asking whether, on average, subjects who were told they succeeded differed significantly from subjects who were told they failed. The means for subjects in the success condition are 7.333 for the high-self-esteem subjects and 5.500 for the low-self-esteem subjects, so the mean of all subjects in the success condition is (7.333 + 5.500)/2 = 6.417. Similarly, the mean for all subjects in the failure condition is (4.833 + 7.833)/2 = 6.333. The question is: how do we do a significance test for this difference of 6.417 - 6.333 = 0.083?

The first step is to express this difference as a linear combination of a set of coefficients and the means. This may sound complex, but it is really pretty easy. We can compute the mean of the success conditions by multiplying each success mean by 0.5 and then adding the results. In other words, the mean of the success conditions is (0.5)(7.333) + (0.5)(5.500) = 6.417.
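To make the linear-combination idea concrete, here is a minimal Python sketch (the variable and function names are my own illustration, not from the original text). It recomputes the condition means from the Table 1 data and then forms the success-minus-failure comparison with the coefficients (0.5, 0.5, -0.5, -0.5):

```python
# Raw data from Table 1: (outcome, esteem, attrib) for all 24 subjects.
# outcome: 1 = told success, 2 = told failure; esteem: 1 = high, 2 = low.
data = [
    (1, 1, 7), (1, 1, 8), (1, 1, 7), (1, 1, 8), (1, 1, 9), (1, 1, 5),
    (1, 2, 6), (1, 2, 5), (1, 2, 7), (1, 2, 4), (1, 2, 5), (1, 2, 6),
    (2, 1, 4), (2, 1, 6), (2, 1, 5), (2, 1, 4), (2, 1, 7), (2, 1, 3),
    (2, 2, 9), (2, 2, 8), (2, 2, 9), (2, 2, 8), (2, 2, 7), (2, 2, 6),
]

# Mean attribution rating for one outcome/esteem condition (Table 2).
def condition_mean(outcome, esteem):
    vals = [a for (o, e, a) in data if o == outcome and e == esteem]
    return sum(vals) / len(vals)

# Order: success/high, success/low, failure/high, failure/low.
means = [condition_mean(1, 1), condition_mean(1, 2),
         condition_mean(2, 1), condition_mean(2, 2)]
# -> approximately [7.333, 5.500, 4.833, 7.833]

# Coefficients expressing "mean of the success conditions minus mean of
# the failure conditions" as a linear combination of the four means.
coeffs = [0.5, 0.5, -0.5, -0.5]

# The comparison L is the sum of coefficient * condition mean.
L = sum(c * m for c, m in zip(coeffs, means))
print(round(L, 3))  # 0.083, the difference computed in the text
```

Note that the coefficients sum to zero; a linear combination with that property is conventionally called a contrast among the means.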
Multiple Comparisons: Beware of Individual Errors That Multiply

Suppose you and I decide to play a game to win some money. Tell me which of these games you'd rather play:

Game 1: We flip a coin once. If it lands on tails, I'll give you 100 bucks.

Game 2: We flip a coin 10 times. If it lands on tails at least one time, I'll give you 100 bucks.

If you said "D'oh!" and picked Game 2, either you're Homer Simpson or you already have a good intuitive understanding of an important statistical concept called the multiple comparisons problem. You realized that the cumulative probability of getting at least one tail in 10 flips is greater than the probability of getting a tail on a single flip, even though the probability of getting a tail is constant at 50% for each flip.

An Error Rate for the Whole Family

With that in mind, think about what happens if you perform a hypothesis test many times on the same set of data. Each hypothesis test has a built-in error rate, called alpha, which indicates the probability that the test will find a statistically significant result based on the sample data when, in reality, no such difference actually exists. Statisticians call this a Type I error. By convention, alpha is often set at 0.05, which corresponds to a 5% error rate for each test. But if you perform a test multiple times, the cumulative error rate for all of those tests together is going to be greater than 5%. How much greater is this cumulative error rate, which statisticians call the experiment-wise or family error rate? It depends on how many tests you perform.

A Cautionary Tale: Dr. Dredge and His Amazing Expanding Error

Suppose a researcher, Dr. Dredge, collects data on the number of hours worked per day by people in different countries. A little green around the gills, statistically, Dr. Dredge decides to use a 2-sample t-test to compare the mean hours worked per day between the British and the Japanese, then performs the 2-sample t-test again to compare Brazilians with Americans, then uses it again to compare the French with Australians, and then again and again and again... If each t-test has a Type I error rate (alpha) of 0.05, what would happen? To find out, we can set up a Minitab worksheet to automatically calculate the family error rate based on the number of comparisons; the same calculation is sketched in code below. The more comparisons Dr. Dredge makes, the more likely he is to commit at least one Type I error.
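Assuming the tests are independent, the family error rate follows the same logic as the coin game: the chance of at least one false positive in k tests, each run at level alpha, is 1 - (1 - alpha)^k. Here is a minimal Python sketch of that calculation, a stand-in for the Minitab worksheet and valid only under the independence assumption:

```python
# Family error rate for k independent hypothesis tests, each run at
# significance level alpha: the probability of at least one Type I error.
# (Same logic as the chance of at least one tail in k coin flips.)

def family_error_rate(k, alpha=0.05):
    return 1 - (1 - alpha) ** k

# The coin game: probability of at least one tail in 10 fair flips.
print(1 - 0.5 ** 10)  # 0.9990234375

# Dr. Dredge's expanding error as the number of t-tests grows.
for k in (1, 5, 10, 20, 50):
    print(k, round(family_error_rate(k), 3))
# 1 0.05
# 5 0.226
# 10 0.401
# 20 0.642
# 50 0.923
```

With just 10 comparisons, the family error rate already exceeds 40%: more likely than not, at least one "significant" result would be a false discovery.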