Comparisonwise and Experimentwise Error
Questions asked of an experiment often go beyond the simple question posed by an analysis of variance - do at least two treatment means differ? It may be that, embedded in a group of treatments, there is only one "control" treatment to which every other treatment should be compared, and comparisons among the non-control treatments may be uninteresting. One may also, after performing an analysis of variance and rejecting the null hypothesis of equality of treatment means, want to know exactly which treatments or groups of treatments differ. Answering these kinds of questions requires careful consideration of the hypotheses of interest both before and after an experiment is conducted, the Type I error rate selected for each hypothesis, the power of each hypothesis test, and the Type I error rate acceptable for the group of hypotheses as a whole.

Comparisons or Contrasts

If we let μi represent the ith treatment mean and ci a weight associated with that mean, then a comparison or contrast can be represented as

L = Σ ci μi, where Σ ci = 0.

This contrast is a linear combination of treatment means (other contrasts, such as quadratic and cubic trend contrasts, are also possible). Comparisons such as μ1 - μ2 or μ1 - (μ2 + μ3)/2 are legitimate contrasts because they are weighted linear combinations of treatment means and the weights sum to zero. For example, previously we have performed comparisons between two treatment means using the t statistic

t = (x̄1 - x̄2) / sqrt(s²p (1/n1 + 1/n2)),

with (n1 + n2) - 2 degrees of freedom. This statistic is a "contrast." The numerator of this expression follows the general form of the contrast outlined above, with the weights c1 and c2 equal to 1 and -1, respectively: c1x̄1 + c2x̄2 = x̄1 - x̄2. However, we also see that this contrast is divided by the pooled within-cell (within-group) variation. So a contrast statistic is actually the ratio of a weighted linear combination of means to an estimate of the pooled within-cell, or error, variation in the experiment:

t = (Σ ci x̄i) / sqrt(MSerror Σ (ci² / ni)),

with the degrees of freedom of MSerror. For a non-directional null hypothesis, t can be replaced by F = t², with 1 and dferror degrees of freedom. In general, then, a contrast statistic is the ratio of a weighted linear combination of means to the square root of the mean square within cells times the sum of the squared weights divided by the cell sample sizes, where the ci's are the weights assigned to each treatment mean, ni is the number of observations in cell i, and MSerror is the within-cell variation pooled over the entire experiment (the within-cell mean square from the analysis of variance).
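As a minimal sketch (not part of the original text), the general contrast statistic above can be computed directly from summary statistics. The means, weights, sample sizes, and MSerror used below are hypothetical values chosen only for illustration.

```python
import math

def contrast_t(means, weights, ns, ms_error):
    """t statistic for a contrast: sum(c_i * mean_i) / sqrt(MSerror * sum(c_i^2 / n_i))."""
    if abs(sum(weights)) > 1e-12:
        raise ValueError("contrast weights must sum to zero")
    L = sum(c * m for c, m in zip(weights, means))
    se = math.sqrt(ms_error * sum(c ** 2 / n for c, n in zip(weights, ns)))
    return L / se

# Hypothetical summary statistics for three treatments:
means = [10.0, 12.5, 11.0]
ns = [8, 8, 8]
ms_error = 4.0                 # pooled within-cell mean square from the ANOVA
df_error = sum(ns) - len(ns)   # degrees of freedom of MSerror

# Compare treatment 1 with the average of treatments 2 and 3.
t = contrast_t(means, [1.0, -0.5, -0.5], ns, ms_error)
F = t ** 2                     # equivalent F with 1 and df_error degrees of freedom
print(f"t({df_error}) = {t:.2f},  F(1, {df_error}) = {F:.2f}")
```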
If c independent comparisons are each made at a per-comparison significance level αpc, the experimentwise error rate is

αew = 1 - (1 - αpc)^c,

where αew is the experimentwise error rate, αpc is the per-comparison error rate, and c is the number of comparisons. For example, if 5 independent comparisons were each to be done at the .05 level, then the probability that at least one of them would result in a Type I error is 1 - (1 - .05)^5 = 0.226. If the comparisons are not independent, the experimentwise error rate is less than 1 - (1 - αpc)^c. Finally, regardless of whether the comparisons are independent, αew ≤ (c)(αpc); for this example, 0.226 < (5)(.05) = 0.25.
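As a quick check of this arithmetic (a sketch, not from the source), the calculation can be reproduced in a few lines of Python:

```python
def experimentwise_error(alpha_pc, c):
    """Experimentwise Type I error rate for c independent comparisons: 1 - (1 - alpha_pc)^c."""
    return 1 - (1 - alpha_pc) ** c

alpha_pc, c = 0.05, 5
aew = experimentwise_error(alpha_pc, c)
print(f"alpha_ew = {aew:.3f}")                    # 0.226 for 5 independent tests at .05
print(f"Bonferroni bound = {c * alpha_pc:.3f}")   # 0.25; alpha_ew never exceeds this
```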
A well-known illustration is data dredging: an apparent close link between the letters in the winning word used in a spelling bee competition and the number of people in the United States killed by venomous spiders. The clear similarity in trends is a coincidence; if many data series are compared, similarly convincing but coincidental patterns can be found.

In statistics, the multiple comparisons, multiplicity or multiple testing problem occurs when one considers a set of statistical inferences simultaneously[1] or infers a subset of parameters selected based on the observed values.[2] It is also known as the look-elsewhere effect. Errors in inference, including confidence intervals that fail to include their corresponding population parameters or hypothesis tests that incorrectly reject the null hypothesis, are more likely to occur when one considers the set as a whole. Several statistical techniques have been developed to prevent this from happening, allowing significance levels for single and multiple comparisons to be directly compared. These techniques generally require a stricter significance threshold for individual comparisons, so as to compensate for the number of inferences being made (a small simulation illustrating this appears after the historical notes below).

History

Interest in the problem of multiple comparisons began in the 1950s with the work of Tukey and Scheffé. New methods and procedures followed, including the closed testing procedure (Marcus et al., 1976) and the Holm–Bonferroni method (1979). In the 1980s the issue of multiple comparisons returned to prominence (Hochberg and Tamhane (1987), Westfall and Young (1993), and Hsu (1996)). In 1995, work on the false discovery rate and other new ideas began. In 1996 the first conference on multiple comparisons took place in Israel; it was followed by conferences around the world: Berlin (2000), Bethesda (2002), Shanghai (2005), Vienna (2007), and Tokyo (2009). All of these reflect increased interest in multiple comparisons.[3]
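To make the problem concrete, here is a small simulation sketch (not from the source; it assumes numpy and scipy and uses arbitrary sample sizes and an arbitrary random seed). Every null hypothesis in it is true, yet testing each comparison at the .05 level typically flags a handful of them as "significant"; requiring the stricter Bonferroni threshold α/c removes most of these false positives.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# 100 comparisons, all with a true null: both groups drawn from the same distribution.
n_tests, n_per_group, alpha = 100, 20, 0.05
p_values = np.array([
    stats.ttest_ind(rng.normal(size=n_per_group), rng.normal(size=n_per_group)).pvalue
    for _ in range(n_tests)
])

print("false positives at alpha = .05:", np.sum(p_values < alpha))
# Bonferroni: require p < alpha / n_tests for each individual comparison.
print("false positives with Bonferroni:", np.sum(p_values < alpha / n_tests))
```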
Specific Comparisons

There are many occasions on which the comparisons among means are more complicated than simply comparing one mean with another. This section shows how to test these more complex comparisons. The methods in this section assume that the comparison among means was decided on before looking at the data; therefore these comparisons are called planned comparisons. A different procedure is necessary for unplanned comparisons.

Let's begin with the made-up data from a hypothetical experiment shown in Table 1. Twelve subjects were selected from a population of high-self-esteem subjects (esteem = 1) and an additional 12 subjects were selected from a population of low-self-esteem subjects (esteem = 2). Subjects then performed on a task and (independent of how well they really did) half were told they succeeded (outcome = 1) and the other half were told they failed (outcome = 2). Therefore there were six subjects in each esteem/outcome combination and 24 subjects altogether. After the task, subjects were asked to rate (on a 10-point scale) how much of their outcome (success or failure) they attributed to themselves as opposed to being due to the nature of the task.

Table 1. Data from Hypothetical Experiment

outcome  esteem  attrib
   1       1       7
   1       1       8
   1       1       7
   1       1       8
   1       1       9
   1       1       5
   1       2       6
   1       2       5
   1       2       7
   1       2       4
   1       2       5
   1       2       6
   2       1       4
   2       1       6
   2       1       5
   2       1       4
   2       1       7
   2       1       3
   2       2       9
   2       2       8
   2       2       9
   2       2       8
   2       2       7
   2       2       6

The means of the four conditions are shown in Table 2.

Table 2. Mean ratings of self-attributions of success or failure.

            High Self-Esteem   Low Self-Esteem
Success          7.333              5.500
Failure          4.833              7.833

There are several questions we can ask about the data. We begin by asking whether, on average, subjects who were told they succeeded differed significantly from subjects who were told they failed. The means for subjects in the success condition are 7.333 for the high-self-esteem subjects and 5.500 for the low-self-esteem subjects. Therefore, the mean of all subjects in the success condition is (7.333 + 5.500)/2 = 6.417. Similarly, the mean for all subjects in the failure condition is (4.833 + 7.833)/2 = 6.333. The question is, how do we do a significance test for this difference of 6.417 - 6.333 = 0.083?

The first step is to express this difference in terms of a linear combination of a set of coefficients and the means. This may sound complex, but it is really pretty easy. We can compute the mean of the success conditions by multiplying each success mean by 0.5 and then adding the results; in other words, we compute (.5)(7.333) + (.5)(5.500) = 6.417.
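To finish the calculation sketched above, the contrast L = Σ ci Mi can be evaluated with coefficients of 0.5 for the two success means and -0.5 for the two failure means (so that L equals the success-minus-failure difference of 0.083), and then tested using the general contrast formula from the Comparisons or Contrasts section. The Python below is a sketch under those assumptions (numpy and scipy available); it computes L, the pooled within-cell mean square, and the resulting t statistic from the Table 1 data.

```python
import numpy as np
from scipy import stats

# Data from Table 1 (attrib ratings), grouped by the four outcome/esteem cells.
cells = {
    ("success", "high"): [7, 8, 7, 8, 9, 5],
    ("success", "low"):  [6, 5, 7, 4, 5, 6],
    ("failure", "high"): [4, 6, 5, 4, 7, 3],
    ("failure", "low"):  [9, 8, 9, 8, 7, 6],
}
means = {k: np.mean(v) for k, v in cells.items()}
n = 6  # observations per cell

# Planned comparison: success conditions vs. failure conditions.
# The coefficients multiply the cell means and sum to zero.
coef = {
    ("success", "high"): 0.5,
    ("success", "low"):  0.5,
    ("failure", "high"): -0.5,
    ("failure", "low"):  -0.5,
}

# Value of the contrast: L = sum(c_i * M_i)
L = sum(coef[k] * means[k] for k in cells)

# Pooled within-cell (error) variation: MSerror and its degrees of freedom.
ss_error = sum(np.sum((np.asarray(v) - means[k]) ** 2) for k, v in cells.items())
df_error = sum(len(v) for v in cells.values()) - len(cells)
ms_error = ss_error / df_error

# t statistic for the contrast: L / sqrt(MSerror * sum(c_i^2 / n_i))
se_L = np.sqrt(ms_error * sum(c ** 2 / n for c in coef.values()))
t = L / se_L
p = 2 * stats.t.sf(abs(t), df_error)   # two-tailed p value

print(f"L = {L:.3f}, t({df_error}) = {t:.2f}, p = {p:.3f}")
```

With these data L is small relative to its standard error, so this particular planned comparison would not be statistically significant.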