Multiple Standard Error Of Estimate Formula
In multiple regression, more than one independent variable (X) is used to predict a single dependent variable (Y). The predicted value of Y is a linear transformation of the X variables such that the sum of squared deviations between the observed and predicted Y is a minimum. The computations are more complex than in simple linear regression, however, because the interrelationships among all the variables must be taken into account in the weights assigned to the variables. The interpretation of the results of a multiple regression analysis is
also more complex for the same reason. With two independent variables, the prediction of Y is expressed by the following equation: $Y'_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$. Note that this transformation is similar to the linear transformation of two variables discussed in the previous chapter, except that the w's have been
replaced with b's and the X'i has been replaced with a Y'i. The "b" values are called regression weights and are computed in a way that minimizes the sum of squared deviations, in the same manner as in simple linear regression. The difference is that in simple linear regression only two weights, the intercept (b0) and slope (b1), were estimated, while in this case three weights (b0, b1, and b2) are estimated.

EXAMPLE DATA

The data used to illustrate the inner workings of multiple regression will be generated from the "Example Student." The data are presented below:

Homework Assignment 21 Example Student PSY645 Dr. Stockburger Due Date
 Y1   Y2   X1   X2   X3   X4
125  113   13   18   25   11
158  115   39   18   59   30
207  126   52   50   62   53
182  119   29   43   50   29
196  107   50   37   65   56
175  135   64   19   79   49
145  111   11   27   17   14
144  130   22   23   31   17
160  122   30   18   34   22
175  114   51   11   58   40
151  121   27   15   29   31
161  105   41   22   53   39
200  131   51   52   75   36
173  123   37   36   44   27
175  121   23   48   27   20
162  120   43   15   65   36
155  109   38   19   62   37
230  130   62   56   75   50
162  134   28   30   36   20
153  124   30   25   41   33

The example data can be obtained as a text file and as an SPSS/WIN file from this web page. If a student desires a more concrete description of this data file, meaning could be given to the variables as follows:

Y1 - A measure of success in graduate school.
X1 - A measure of intellectual ability.
X2 - A measure of "work ethic."
X3 - A second measure of intellectual ability.
X4 - A measure of spatial ability.
Y2 - Score on a major review paper.

UNIVARIATE ANALYSIS

The first step in the analysis of multivariate data is a table of means and standard deviations. Additional analysis recommendations in ...

... the estimate from a scatter plot
Compute the standard error of the estimate based on errors of prediction
Compute the standard error using Pearson's correlation
Estimate the standard error of the estimate based on a sample

Figure 1 shows two regression examples. You can see that
in Graph A, the points are closer to the line than they are in Graph B. Therefore, the predictions in Graph A are more accurate than in Graph B.

Figure 1. Regressions differing in accuracy of prediction.

The standard
error of the estimate is a measure of the accuracy of predictions. Recall that the regression line is the line that minimizes the sum of squared deviations of prediction (also called the sum of squares error). The standard error of the estimate is closely related to this quantity and is defined below:

$$\sigma_{est} = \sqrt{\frac{\sum (Y - Y')^2}{N}}$$

where σest is the standard error of the estimate, Y is an actual score, Y' is a predicted score, and N is the number of pairs of scores. The numerator is the sum of squared differences between the actual scores and the predicted scores.

Note the similarity of the formula for σest to the formula for σ. It turns out that σest is the standard deviation of the errors of prediction (each Y - Y' is an error of prediction).

Assume the data in Table 1 are the data from a population of five X, Y pairs.

Table 1. Example data.

          X       Y      Y'    Y-Y'   (Y-Y')^2
       1.00    1.00   1.210  -0.210      0.044
       2.00    2.00   1.635   0.365      0.133
       3.00    1.30   2.060  -0.760      0.578
       4.00    3.75   2.485   1.265      1.600
       5.00    2.25   2.910  -0.660      0.436
Sum   15.00   10.30   10.30   0.000      2.791

The last column shows that the sum of the squared errors of prediction is 2.791. Therefore, the standard error of the estimate is

$$\sigma_{est} = \sqrt{\frac{2.791}{5}} = 0.747$$

There is a version of the formula for the standard error in terms of Pearson's correlation:

$$\sigma_{est} = \sqrt{\frac{(1 - \rho^2)\, SSY}{N}}$$

where ρ is the population value of Pearson's correlation and SSY is

$$SSY = \sum (Y - \mu_Y)^2$$

For the data in Table 1, μY = 2.06, SSY = 4.597 and ρ = 0.6268. Therefore,

$$\sigma_{est} = \sqrt{\frac{(1 - 0.6268^2)(4.597)}{5}} = \sqrt{\frac{2.791}{5}} = 0.747$$

which is the same value computed previously.

Similar formulas are used when the standard error of the estimate is computed from a sample rather than a population. The only difference is that the denominator is N-2 rather than N. The reason N-2 is used rather than N-1 is that two parameters (the slope and the intercept) were estimated in order to estimate the sum of squares. Formulas for a sample comparable to the ones for a population are shown below, where r is the sample value of Pearson's correlation.

$$s_{est} = \sqrt{\frac{\sum (Y - Y')^2}{N - 2}}$$

$$s_{est} = \sqrt{\frac{(1 - r^2)\, SSY}{N - 2}}$$

Sources: http://www.psychstat.missouristate.edu/multibook/mlt06m.html, http://onlinestatbook.com/lms/regression/accuracy.html
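The Table 1 calculation can be reproduced in a few lines. The following is a minimal sketch in plain Python, assuming the five X, Y pairs from Table 1; the function names (`fit_line`, `standard_error_of_estimate`, `standard_error_from_correlation`) are illustrative and do not come from the source. It fits the least-squares line, then computes the standard error of the estimate both from the errors of prediction and from Pearson's correlation, with an optional N-2 denominator for the sample case.

```python
import math

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line for Y on X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def standard_error_of_estimate(xs, ys, sample=False):
    """sqrt(sum((Y - Y')^2) / N); divides by N - 2 when sample=True."""
    intercept, slope = fit_line(xs, ys)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    denom = len(xs) - 2 if sample else len(xs)
    return math.sqrt(sse / denom)

def standard_error_from_correlation(xs, ys, sample=False):
    """sqrt((1 - r^2) * SSY / N); divides by N - 2 when sample=True."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    ssy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * ssy)  # Pearson's correlation
    denom = n - 2 if sample else n
    return math.sqrt((1 - r ** 2) * ssy / denom)

# Data from Table 1 (treated as a population of five pairs).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 2.0, 1.3, 3.75, 2.25]

print(round(standard_error_of_estimate(xs, ys), 3))
print(round(standard_error_from_correlation(xs, ys), 3))
print(round(standard_error_of_estimate(xs, ys, sample=True), 3))
```

The first two calls both give about 0.747, matching the hand calculation. The third uses the N-2 denominator and is therefore larger.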
... the ANOVA table (often this is skipped).
Interpreting the regression coefficients table.
Confidence intervals for the slope parameters.
Testing for statistical significance of coefficients.
Testing hypothesis on a slope parameter.
Testing overall significance of the regressors.
Predicting y given values of regressors.
Excel limitations.

Source: http://cameron.econ.ucdavis.edu/excel/ex61multipleregression.html

There is little extra to know beyond regression with one explanatory variable. The main addition is the F-test for overall fit.

MULTIPLE REGRESSION USING THE DATA ANALYSIS ADD-IN

This requires the Data Analysis Add-in: see Excel 2007: Access and Activating the Data Analysis Add-in.

The data used are in carsdata.xls.

We then create a new variable in cells C2:C6, cubed household size, as a regressor. Then in cell C1 give the heading CUBED HH SIZE. (It turns out that for these data squared HH SIZE has a coefficient of exactly 0.0, so the cube is used.)

The spreadsheet cells A1:C6 should look like:

[Spreadsheet screenshot]

We have regression with an intercept and the regressors HH SIZE and CUBED HH SIZE.

The population regression model is: $y = \beta_1 + \beta_2 x_2 + \beta_3 x_3 + u$

It is assumed that the error u is independent with constant variance (homoskedastic) - see EXCEL LIMITATIONS at the bottom.

We wish to estimate the regression line: $y = b_1 + b_2 x_2 + b_3 x_3$

We do this using the Data Analysis Add-in and Regression. The only change over one-variable regression is to include more than one column in the Input X Range. Note, however, that the regressors need to be in contiguous columns (here columns B and C). If this is not the case in the original data, then columns need to be copied to get the regressors in contiguous columns.

Hitting OK we obtain

[Regression output screenshot]

The regression output has three components:
Regression statistics table
ANOVA table
Regression coefficients table.

INTERPRET REGRESSION STATISTICS TABLE

This is the following output. Of greatest interest is R Square.
Statistic            Value      Explanation
Multiple R           0.895828   R = square root of R2
R Square             0.802508   R2
Adjusted R Square    0.605016   Adjusted R2, used if more than one x variable
Standard Error       0.444401   This is the sample estimate of the standard deviation of the error u
Observations         5          Number of observations used in the regression (n)

The above gives the overall goodness-of-fit measures:
R2 = 0.8025
Correlation between y and y-hat is 0.8958 (the Multiple R above)
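The entries in the regression statistics table are tied together by simple formulas. As a rough sketch in Python (the function names are illustrative, and the coefficient count k = 3 is an assumption read off the model above: an intercept plus the two regressors HH SIZE and CUBED HH SIZE), Multiple R and Adjusted R Square can be recovered from R Square and the number of observations:

```python
import math

def multiple_r(r_squared):
    """Multiple R is the square root of R Square."""
    return math.sqrt(r_squared)

def adjusted_r_squared(r_squared, n, k):
    """Adjusted R Square for n observations and k estimated coefficients.

    k counts the intercept, so the formula is
    1 - (1 - R2) * (n - 1) / (n - k).
    """
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k)

r2 = 0.802508  # R Square from the table
n = 5          # Observations from the table
k = 3          # assumed: intercept + HH SIZE + CUBED HH SIZE

print(round(multiple_r(r2), 4))
print(round(adjusted_r_squared(r2, n, k), 4))
```

Both results agree with the table values 0.895828 and 0.605016 to rounding. With only five observations and three coefficients, the adjustment is large, which is why Adjusted R Square falls so far below R Square here.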