Cluster Robust Standard Error Stata
Comparison of standard errors for robust, cluster, and standard estimators
http://www.stata.com/support/faqs/statistics/standard-errors-and-vce-cluster-option/

How can the standard errors with the vce(cluster clustvar) option be smaller than those without the vce(cluster clustvar) option?

Author: William Sribney, StataCorp

Question: I ran a regression with data for clients clustered by therapist. I first estimated the regression without using the vce(cluster clustvar) option, then I re-ran it using the vce(cluster clustvar) option. In many cases, the standard errors were much smaller when I used the vce(cluster clustvar) option. Does this seem reasonable?

Answer: The short answer is that this can happen when the intracluster correlations are negative.

Let me back up and explain the mechanics of what can happen to the standard errors. Let’s consider the following three estimators available with the regress command: the ordinary least squares (OLS) estimator, the robust estimator obtained when the vce(robust) option is specified (without the vce(cluster clustvar) option), and the robust cluster estimator obtained when the vce(cluster clustvar) option is specified.

Comparing the three variance estimators: OLS, robust, and robust cluster

The formulas for the estimators are

OLS variance estimator:

$$V_{OLS} = s^2 \, (X'X)^{-1}, \qquad \text{where } s^2 = \frac{1}{N - k} \sum_{i=1}^{N} e_i^2$$

Robust (unclustered) variance estimator:

$$V_{rob} = (X'X)^{-1} \left[ \sum_{i=1}^{N} (e_i x_i)' (e_i x_i) \right] (X'X)^{-1}$$

Robust cluster variance estimator:

$$V_{cluster} = (X'X)^{-1} \left[ \sum_{j=1}^{n_c} u_j' u_j \right] (X'X)^{-1}$$

where $u_j$ =
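As a rough illustration of the three formulas above, the following sketch evaluates them for the simplest possible model, a regression of y on a constant only, where X'X = N and every expression collapses to a scalar. The definition of u_j is cut off in the text, so the sketch assumes the usual one: the sum of e_i*x_i over the observations in cluster j. The function name, the toy data, and the cluster ids are hypothetical. No finite-sample adjustment is applied beyond what the formulas show, so the numbers will not match Stata's output exactly.

```python
from collections import defaultdict


def mean_variance_estimates(y, cluster):
    """Return (V_ols, V_rob, V_cluster) for the sample mean of y.

    With a constant-only model, x_i = 1 and X'X = N, so each
    variance formula reduces to a scalar.
    """
    n = len(y)
    k = 1  # one estimated coefficient: the constant
    mean = sum(y) / n
    e = [yi - mean for yi in y]  # residuals e_i

    # OLS: s^2 * (X'X)^-1, with s^2 = sum(e_i^2) / (N - k)
    v_ols = sum(ei ** 2 for ei in e) / (n - k) / n

    # Robust: (X'X)^-1 * sum((e_i * x_i)^2) * (X'X)^-1
    v_rob = sum(ei ** 2 for ei in e) / n ** 2

    # Cluster: first sum e_i * x_i within each cluster (assumed u_j),
    # then sum the squared cluster totals.
    u = defaultdict(float)
    for ei, c in zip(e, cluster):
        u[c] += ei
    v_cluster = sum(uj ** 2 for uj in u.values()) / n ** 2

    return v_ols, v_rob, v_cluster


ids = [1, 1, 2, 2, 3, 3]

# Negative intracluster correlation: residuals within a cluster offset each other.
y_negative = [1.0, -1.0, 2.0, -2.0, 3.0, -3.0]
# Positive intracluster correlation: residuals within a cluster share a sign.
y_positive = [1.0, 1.0, 2.0, 2.0, -3.0, -3.0]

for label, y in (("negative", y_negative), ("positive", y_positive)):
    v_ols, v_rob, v_cluster = mean_variance_estimates(y, ids)
    print(label, v_ols, v_rob, v_cluster)
```

In the first data set the residuals cancel within every cluster, so the cluster estimate falls below the OLS estimate; in the second they reinforce each other and the cluster estimate is the largest of the three. The OLS and robust estimates are the same in both data sets because they ignore the grouping.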
Estimating robust standard errors in Stata
http://www.stata.com/support/faqs/statistics/robust-standard-errors/

Note: This FAQ is for users of releases prior to Stata 6. It is not relevant for more recent versions.

Why don’t the old huber results match the new robust versions?

Author: James Hardin, StataCorp

The new versions are better (less biased). In the new implementation of the robust estimate of variance, Stata now scales the estimated variance matrix to make it less biased.

Unclustered data

Estimating robust standard errors in Stata 4.0 resulted in

    . hreg price weight displ

    Regression with Huber standard errors          Number of obs =      74
                                                   R-squared     =  0.2909
                                                   Adj R-squared =  0.2710
                                                   Root MSE      = 2518.38

    ------------------------------------------------------------------------------
       price |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
      weight |   1.823366   .7648832      2.384   0.020       .2982323      3.3485
       displ |   2.087054   7.284658      0.286   0.775      -12.43814      16.
Analyzing Correlated Data
http://www.ats.ucla.edu/stat/stata/library/cpsu.htm

Correlated data are fairly common in social science research. Husbands' responses to marital satisfaction questions are related to (correlated with) wives' responses. Parents' assessment of their child's achievement is correlated with the child's assessment of his or her achievement. Members of the same household are likely to be more similar to one another on a wide variety of measures than to nonmembers.

Sometimes the correlated nature of the data is obvious and is considered as the data are being collected. Other times, the correlated nature is less obvious and was not considered as the data were collected. Either way, to analyze the data correctly, the correlation needs to be taken into account. If it is not, the standard errors of the estimates will be off (usually underestimated), rendering significance tests invalid. This happens because the standard errors normally reported with an analysis assume that each observation is independent of all other observations in the data set. To the extent that this is not true (i.e., as the correlation becomes larger), each observation contains less unique information. (Another consequence of this is that the effective sample size is diminished.) This kind of correlation (between observations) is called an intraclass correlation. It is different from a Pearson correlation, which is between two variables.

So, how bad could ignoring the intraclass correlation be? We have reproduced Table 1.1 from Introduction to Multilevel Modeling by Ita Kreft and Jan de Leeuw, which shows what the actual alpha level becomes when you believe it is 0.05. The rows of the table show different values of N, the number of subjects in the experiment or survey. The columns show different values of rho, the intraclass correlation coefficient. As you can see
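The remark above that the effective sample size is diminished can be made concrete with a small sketch. It uses the common design-effect approximation, 1 + (m - 1) * rho, for clusters of equal size m. That formula does not appear in the text and is an assumption here, as are the function name and the example numbers.

```python
def effective_sample_size(n_obs, cluster_size, rho):
    """Approximate effective N for equal-sized clusters.

    Assumes the design-effect approximation 1 + (m - 1) * rho, where m is
    the common cluster size and rho is the intraclass correlation.
    """
    design_effect = 1.0 + (cluster_size - 1) * rho
    return n_obs / design_effect


# 100 observations in clusters of 10, at increasing intraclass correlation.
for rho in (0.0, 0.05, 0.20):
    print(rho, effective_sample_size(100, 10, rho))
```

With rho equal to zero the effective sample size equals the nominal one. Under this approximation even a modest positive intraclass correlation shrinks it noticeably, which is why standard errors that assume independence tend to be too small.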