Calculate Error Score Variance
and error variance. Define the standard error of measurement and state why it is valuable. State the effect of test length on reliability. Distinguish between reliability and validity. Define three types of validity. State how reliability determines the upper limit to validity.

The collection of data involves measurement. Measurement of some
characteristics, such as height and weight, is relatively straightforward. The measurement of psychological attributes such as self-esteem can be complex. A good measurement scale should be both reliable and valid. These concepts will be discussed in turn.

Reliability
The notion of reliability revolves around whether you would get at least approximately the same result if you measured something twice with the same measurement instrument. A common way to define reliability is the correlation between parallel forms of a test. Letting "test'" represent a parallel form of the test, the symbol $r_{test,test'}$ is used to denote the reliability of the test.

True Scores and Error
Assume you wish to measure a person's mean response time to the onset of a stimulus. For simplicity, assume that there is no learning over tests, which, of course, is not really true. The person is given 1,000 trials on the task, and you obtain the response time on each trial. The mean response time over the 1,000 trials can be thought of as the person's "true" score, or at least a very good approximation of it. Theoretically, the true score is the mean that would be approached as the number of trials increases indefinitely. An individual response time can be thought of as being composed of two parts: the true score and the error of measurement. Thus if the person's true score were 345 and their response on one of the trials were 358, then the error of measurement would be 13. Similarly,
than the score the student should actually have received (the true score). The difference between the true score and the observed score is called the error score: $S_{true} = S_{observed} + S_{error}$. In the examples to the right, Student A has
an observed score of 82. His true score is 88, so the error score would be 6. Student B has an observed score of 109. His true score is 107, so the error score would be -2. If you could
add all of the error scores and divide by the number of students, you would have the average amount of error in the test. Unfortunately, the only score we actually have is the observed score (So). The true score is hypothetical and could only be estimated by having the person take the test many times and averaging the scores, i.e., by stating that out of 100 administrations the score fell within a given range. This is not a practical way of estimating the amount of error in a test.

Estimating Errors
Another way of estimating the amount of error in a test is to use other statistics that reflect error. One of these is the standard deviation: the larger the standard deviation, the more variation there is in the scores; the smaller the standard deviation, the more closely the scores are grouped around the mean. Another is the reliability of the test. The reliability coefficient (r) indicates the amount of consistency in the test; subtracting r from 1.00 gives the amount of inconsistency. In the diagram at the right, the test has a reliability of .88. This is the amount of consistency in the test, which leaves .12 as the amount of inconsistency, or error. Combining the two statistics gives the standard error of measurement (SEM):

$SEM = SD_o \sqrt{1 - r}$

where $SD_o$ is the observed standard deviation and r is the reliability. This gives an estimate of the amount of error in the test from statistics that are readily available for any test. The relationship between these statistics can be seen at the right. In the first row there is a low standard deviation (SDo) and good reliability (.79). In the second row the SDo is larger, and the result is a higher SEM of 1.18. In the last row the reliability is very low, and the SEM is larger still. As the SDo gets larger, the SEM gets larger.
As r gets smaller, the SEM gets larger.

SEM    SDo    Reliability (r)
.72    1.58   .79
1.18   3.58   .89
2.79   3.58   .39
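The SEM formula can be checked numerically. The Python sketch below recomputes the three table rows from SDo and r, then runs a small classical-test-theory simulation. The language, function names, and simulation settings (5,000 people, true-score SD 10, error SD 4) are illustrative assumptions, not part of the original. In the simulation, each hypothetical person has one true score, and each of two parallel forms adds independent normal error. The correlation between the forms estimates the reliability, and SDo × √(1 − r) is then compared with the error SD used to generate the scores.

```python
import math
import random
from statistics import stdev
from typing import Sequence, Tuple


def standard_error_of_measurement(sd_observed: float, reliability: float) -> float:
    """SEM = SDo * sqrt(1 - r): the observed SD scaled by the unreliable share."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd_observed * math.sqrt(1.0 - reliability)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation, written out so no version-specific library call is needed."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_parallel_forms(n_people: int = 5000, true_sd: float = 10.0,
                            error_sd: float = 4.0, seed: int = 1) -> Tuple[float, float]:
    """Hypothetical simulation where observed score = true score + error.

    Each simulated person gets one true score and two independent errors,
    giving two parallel forms; their correlation estimates r_test,test'.
    Returns (estimated reliability, SEM computed from form A's SD).
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(100.0, true_sd) for _ in range(n_people)]
    form_a = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    form_b = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    r = pearson_r(form_a, form_b)
    sem = standard_error_of_measurement(stdev(form_a), r)
    return r, sem


if __name__ == "__main__":
    # The three rows of the table, as (SDo, r) pairs
    for sd_o, r in [(1.58, 0.79), (3.58, 0.89), (3.58, 0.39)]:
        print(f"SDo={sd_o}, r={r}: SEM={standard_error_of_measurement(sd_o, r):.3f}")
    r_hat, sem_hat = simulate_parallel_forms()
    print(f"simulated r={r_hat:.3f}, SEM={sem_hat:.2f}")
```

Run as a script, the first loop prints SEMs of 0.724, 1.187 and 2.796. These match the tabled .72, 1.18 and 2.79 to within 0.01, so the table appears to truncate rather than round. In the simulation, the expected reliability is 10²/(10² + 4²) ≈ .86, and the SEM should land close to 4, the error SD. That is the sense in which the SEM estimates the spread of the error scores.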
Part I
I. Overview
II. Scale Development Issues
   A. The Domain Sampling Model
   B. Content Validity
   C. The domain sampling model and the interpretation of test scores
   D. Face Validity
III. How Reliable is the Scale?
   A. Theory of Measurement Error
   B. Reliability Estimates
   C. Reliability Standards
   D. Standard error of measurement
   E. Regression Towards the Mean
   F. Interrater reliability
IV. Diagnostic Utility
Reliability and Validity, Part II
References
Footnotes

I. Overview
The goal of this set of notes is to explore issues of reliability and validity as they apply to psychological measurement. The approach will be to look at these issues by examining a particular scale, the PTSD-Interview (PTSD-I: Watson, Juba, Manifold, Kucala, & Anderson, 1991). The issues to be discussed include: (a) How would you go about developing a scale to measure posttraumatic stress disorder? (b) What items would you include in your scale and how would you determine the content validity of the scale? (c) How would you determine the reliability of the scale? (d) How would you determine the validity of the scale? This web page will focus on the first three issues. A companion web page will look at the validity question.

II. Scale Development Issues
A. The Domain Sampling Model
The first two questions posed in the overview, "How would you go about developing a scale to measure posttraumatic stress disorder?" and "What items would you include in your scale and how would you determine the content validity of the scale?", can be answered by first defining the domain of possible items that are relevant for the scale. In the case of PTSD, the relevant domain of items is specified by the DSM-IV diagnostic criteria for PTSD.
See handout: DSM-IV Diagnostic Criteria for PTSD
How would you specify the domain for content quizzes in general psychology, or for a personality test of extraversion?

B. Content Validity
Content validity asks the question, "Do the items on the scale adequately sample the domain of interest?" If you are developing a test to diagnose PTSD, then the test must adequately reflect all of the DSM diagnostic criteria.
See handout: The PTSD-I Scale
Do the items on the P
this page. I explain here how to analyze data for two trials using simple but effective methods. To combine three or more trials you need more sophisticated procedures, such as analysis of variance or modeling variances. I go into heaps of detail about checking for non-uniform error in your data, and I have a few words on biased estimates of reliability. Finally, you can download a spreadsheet for calculating reliability between consecutive pairs of trials, complete with raw and percent estimates and confidence limits for typical error, change in mean, and retest correlation. The spreadsheet has data adapted from real measurements of skinfold thickness of athletes.

Two Trials
Analyzing two trials is straightforward. All the necessary calculations are included in the spreadsheet for reliability. When you have three or more trials, I strongly recommend that you first do separate analyses for consecutive pairs of trials (Trial1 with Trial2, Trial2 with Trial3, Trial3 with Trial4, etc.). That way you will see whether there are any substantial differences in the typical error or change in the mean between pairs of trials. Such differences are indicative of learning or practice effects. If there is no substantial change in the typical error between three or more consecutive trials, analyze those trials all together to get greater precision for your estimates of reliability.

Typical Error
The change score, or difference score, for each subject yields the typical error: simply divide the standard deviation of the difference scores by $\sqrt{2}$. For example, if the difference scores are 5, -2, 6, 0, and -3, their standard deviation is 4.1, so the typical error is $4.1/\sqrt{2} = 2.9$. This method for calculating the typical error follows from the fact that the variance of the difference score ($s_{diff}^2$) is equal to the sum of the variances representing the typical error (s) in each trial: $s_{diff}^2 = s^2 + s^2 = 2s^2$, so $s = s_{diff}/\sqrt{2}$.
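The typical-error calculation, including the per-pair analysis recommended above, can be sketched in Python as follows. The function names and the two short trials of scores are hypothetical. The trials were chosen only so that their differences reproduce the 5, -2, 6, 0, -3 example. This sketch covers the core calculation and is not the spreadsheet: it omits the confidence limits and the retest correlation.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def typical_error_from_differences(diffs: Sequence[float]) -> float:
    """Typical error s = SD of the difference scores / sqrt(2), since s_diff^2 = 2 s^2."""
    return stdev(diffs) / math.sqrt(2)


def typical_error(trial1: Sequence[float], trial2: Sequence[float]) -> float:
    """Typical error between two trials; scores must be listed per subject in the same order."""
    if len(trial1) != len(trial2):
        raise ValueError("each trial needs one score per subject")
    return typical_error_from_differences([b - a for a, b in zip(trial1, trial2)])


def consecutive_pairs(trials: Sequence[Sequence[float]]) -> List[Tuple[int, int, float, float]]:
    """(i, j, change in mean, typical error) for Trial1-Trial2, Trial2-Trial3, ..."""
    rows = []
    for i in range(len(trials) - 1):
        a, b = trials[i], trials[i + 1]
        rows.append((i + 1, i + 2, mean(b) - mean(a), typical_error(a, b)))
    return rows


if __name__ == "__main__":
    diffs = [5, -2, 6, 0, -3]  # the difference scores from the text
    print(f"SD of differences = {stdev(diffs):.1f}")                      # 4.1
    print(f"typical error = {typical_error_from_differences(diffs):.1f}")  # 2.9
    # Hypothetical scores whose Trial2 - Trial1 differences are the ones above
    t1 = [50, 60, 55, 65, 70]
    t2 = [55, 58, 61, 65, 67]
    for i, j, change, te in consecutive_pairs([t1, t2]):
        print(f"Trial{i}-Trial{j}: change in mean = {change:.1f}, typical error = {te:.2f}")
```

With more than two trials, the same `consecutive_pairs` call returns one row per adjacent pair. Similar typical errors and changes in mean across rows are what would justify pooling those trials.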
To express this within-subject variation as a coefficient of variation (CV), log-transform your variable, then do the same calculations as above. The CV is derived from the typical error (s) of the log-transformed variable via the formula $CV = 100(e^{s} - 1)$, which simplifies to approximately $100s$ for $s < 0.05$ (that is, CVs of less than 5%). You will also meet this formula on the page about log-transformation, where I describe ho
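Here is a minimal Python sketch of the CV conversion. It assumes natural logarithms, which the formula's use of e implies, and strictly positive scores. The function names are hypothetical. The loop compares the exact 100(e^s − 1) with the 100s shortcut, which tracks it closely for small s and drifts for larger values.

```python
import math
from statistics import stdev
from typing import Sequence


def cv_from_log_typical_error(s: float) -> float:
    """CV (%) = 100 * (e^s - 1), where s is the typical error of ln-transformed data."""
    return 100.0 * (math.exp(s) - 1.0)


def cv_from_two_trials(trial1: Sequence[float], trial2: Sequence[float]) -> float:
    """Natural-log transform (scores must be > 0), take typical error = SD(diff)/sqrt(2), convert to CV."""
    diffs = [math.log(b) - math.log(a) for a, b in zip(trial1, trial2)]
    s = stdev(diffs) / math.sqrt(2)
    return cv_from_log_typical_error(s)


if __name__ == "__main__":
    for s in (0.01, 0.03, 0.05, 0.2):
        print(f"s={s}: exact CV={cv_from_log_typical_error(s):.2f}%, approx 100s={100 * s:.0f}%")
```

If every subject's Trial2 score is the same multiple of their Trial1 score, all log differences are equal, and `cv_from_two_trials` returns a CV of zero. This is the sense in which the CV measures proportional, not absolute, within-subject variation.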