Cluster Standard Error
Comparison of standard errors for robust, cluster, and standard estimators

How can the standard errors with the vce(cluster clustvar) option be smaller than those without the vce(cluster clustvar) option?

Title: Comparison of standard errors for robust, cluster, and standard estimators
Author: William Sribney, StataCorp

Question: I ran a regression with data for clients clustered by therapist. I first estimated the regression without using the vce(cluster clustvar) option, then I re-ran it using the vce(cluster clustvar) option. In many cases, the standard errors were much smaller when I used the vce(cluster clustvar) option. Does this seem reasonable?

Answer: The short answer is that this can happen when the intracluster correlations are negative.

Let me back up and explain the mechanics of what can happen to the standard errors. Let’s consider the following three estimators available with the regress command: the ordinary least squares (OLS) estimator, the robust estimator obtained when the vce(robust) option is specified (without the vce(cluster clustvar) option), and the robust cluster estimator obtained when the vce(cluster clustvar) option is specified.

Comparing the three variance estimators: OLS, robust, and robust cluster

The formulas for the estimators are given below. Here e_i is the residual for observation i, x_i is the corresponding row of X, N is the number of observations, and k is the number of estimated coefficients.

OLS variance estimator:

$$V_{OLS} = s^2 \, (X'X)^{-1}, \qquad \text{where} \quad s^2 = \frac{1}{N-k} \sum_{i=1}^{N} e_i^2$$

Robust (unclustered) variance estimator:

$$V_{rob} = (X'X)^{-1} \left[ \sum_{i=1}^{N} (e_i x_i)' (e_i x_i) \right] (X'X)^{-1}$$

Robust cluster variance estimator:

$$V_{cluster} = (X'X)^{-1} \left[ \sum_{j=1}^{n_c} u_j' u_j \right] (X'X)^{-1}, \qquad \text{where} \quad u_j = \sum_{i \in \text{cluster } j} e_i x_i$$

and n_c is the total number of clusters.
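The three formulas above can be sketched in a few lines of NumPy. This is a minimal illustration, not Stata's implementation: the function name `variance_estimators` and the toy data are assumptions made for the example, and any finite-sample multipliers that software may apply to the robust and cluster estimators are deliberately left out so that the code matches the formulas as written.

```python
import numpy as np

def variance_estimators(X, y, cluster):
    """Return (V_ols, V_rob, V_cluster) following the formulas as written,
    without any finite-sample multipliers."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    N, k = X.shape

    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta                      # OLS residuals e_i

    # OLS: s^2 * (X'X)^-1
    s2 = (e @ e) / (N - k)
    V_ols = s2 * XtX_inv

    # Robust: sandwich with one score row e_i * x_i per observation
    scores = X * e[:, None]
    V_rob = XtX_inv @ (scores.T @ scores) @ XtX_inv

    # Cluster: sum the score rows within each cluster first (u_j)
    labels, idx = np.unique(np.asarray(cluster), return_inverse=True)
    U = np.zeros((len(labels), k))
    np.add.at(U, idx, scores)
    V_cluster = XtX_inv @ (U.T @ U) @ XtX_inv

    return V_ols, V_rob, V_cluster

# Hypothetical toy data: intercept-only model, 4 clusters of 2 observations,
# with residuals of opposite sign inside every cluster (an extreme case of
# negative intracluster correlation).
y = np.array([1.0, -1.0] * 4)
X = np.ones((8, 1))
cluster = np.repeat(np.arange(4), 2)

V_ols, V_rob, V_cluster = variance_estimators(X, y, cluster)
print("OLS variance:    ", V_ols[0, 0])
print("robust variance: ", V_rob[0, 0])
print("cluster variance:", V_cluster[0, 0])
```

In this toy case the residuals cancel inside every cluster, so each u_j is zero and the cluster variance collapses to zero, while the robust variance is 1/8. That is the mechanism behind the short answer: negatively correlated residuals within clusters shrink the cluster-level sums.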
Source: http://www.stata.com/support/faqs/statistics/standard-errors-and-vce-cluster-option/

Analyzing Correlated Data

Correlated data are fairly common in social science research. Husbands' responses to marital satisfaction questions are related to (correlated with) wives' responses. Parents' assessment of their child's achievement is correlated with the child's assessment of his or her achievement. Members of the same household are likely to be more similar to one another on a wide variety of measures than to nonmembers.

Sometimes the correlated nature of the data is obvious and is considered as the data are being collected. Other times, the correlated nature is less obvious and was not considered as the data were collected. Either way, to correctly analyze the data, the correlation needs to be taken into account. If it is not, the standard errors of the estimates will be off (usually underestimated), rendering significance tests invalid.

This happens because the standard errors that are normally reported with an analysis assume that each observation is independent of all other observations in the data set. To the extent that this is not true (i.e., as the correlation becomes larger), each observation contains less unique information. (Another consequence of this is that the effective sample size is diminished.) This kind of correlation (between observations) is called an intraclass correlation. It is different from a Pearson correlation, which is between two variables.

So, how bad could ignoring the intraclass correlation be? We have reproduced Table 1.1 from Introduction to Multilevel Modeling by Ita Kreft and Jan de Leeuw, which shows the actual alpha level when the nominal alpha level is 0.05. The rows of the table show different values of N, the number of subjects in the experiment or survey. The columns show different values of rho, the intraclass correlation coefficient. As you can see, if you have only 10 subjects and an intraclass correlation coefficient of 0.01, your true alpha value is 0.06, which is not much different from 0.05. However, if you have 100 subjects and an intraclass correlation coefficient of 0.20, your real alpha level is 0.70!

Alpha = 0.05 (Table 1.1, page 10, Introduction to Multilevel Modeling by Ita Kreft and Jan de Leeuw)

               rho
    N     0.01   0.05   0.20
    10    0.06   0.11   0.28
    25    0.08   0.19   0.46
    50    0.11   0.30   0.59
    100   0.17   0.43   0.70

When people realize that they are analyzing correlated data, they often wonder about the possible strategies that are available to account for the correlation.

Source: http://www.ats.ucla.edu/stat/stata/library/cpsu.htm
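To make the remark about a diminished effective sample size concrete, here is a small sketch based on the standard Kish design effect, 1 + (m - 1) * rho, for clusters of equal size m. This formula is an assumption brought in for illustration. It does not come from the text above, it is not the calculation behind Table 1.1, and the function names are hypothetical.

```python
def design_effect(m, rho):
    """Kish design effect for equal-sized clusters of m observations
    with intraclass correlation rho (an assumed, simplified setting)."""
    return 1.0 + (m - 1) * rho

def effective_sample_size(n_total, m, rho):
    """Number of independent observations carrying roughly the same
    information as n_total clustered observations."""
    return n_total / design_effect(m, rho)

# Hypothetical example: 100 observations in 10 clusters of 10 each.
for rho in (0.0, 0.01, 0.05, 0.20):
    n_eff = effective_sample_size(100, 10, rho)
    print(f"rho = {rho:.2f}: effective sample size = {n_eff:.1f}")
```

With rho = 0 the effective sample size equals the nominal one. As rho grows, each additional observation in a cluster contributes less new information, which is why standard errors that assume independence come out too small.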
Source: http://stats.stackexchange.com/questions/49050/dummies-clustered-standard-errors-or-both

Dummies, clustered standard errors or both?

Relative novice here. I am running a regression on an observational setting in which Y is the outcome and D is the treatment indicator. Observations are drawn from 3 different geographic groups designated by X. Is the proper approach to:

1. Regress D on Y and cluster the standard errors by group.
2. Regress X and D on Y.
3. Regress X and D on Y and cluster the standard errors by group.

When pursuing option #3 I am seeing much higher statistical significance -- and I'm worried somehow that including both dummies and the clustering in a cross-sectional setting is problematic. In principle, what are the tradeoffs between the 3 approaches? Which is most likely to offer an unbiased estimate of the treatment effect D (assuming other covariates -- not included here -- are balanced between the 2 groups)?

Tags: regression, categorical-data, clustered-standard-errors
Asked Feb 1 '13 at 9:22 by user20353; edited Feb 3 '13 at 0:23 by mbq

Comment (StasK, Feb 1 '13 at 13:11): I guess you meant, "regress $Y$ on $D$", etc. The dependent variable is regressed on a set of explanatory variables.

Answer: So with clustered standard errors in your situation you are saying, basically, that you are happy with the stability of the estimate of variance based on three observations, and equally happy to assume that 3 is infinity in terms of using asymptotic normality for your inference. See sec. 8.2.3 of Mostly Harmless Econometrics.
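The answer's point about a variance estimate "based on three observations" can be illustrated numerically. The cluster-robust estimator is built from one summed score vector per cluster, and because OLS residuals are orthogonal to the regressors, those vectors add up to zero. With G clusters they therefore span at most G - 1 directions. The sketch below uses simulated data; the variable names, the data-generating process, and the helper `cluster_score_sums` are all hypothetical.

```python
import numpy as np

def cluster_score_sums(X, y, cluster):
    """Return the matrix whose row j is u_j, the sum of e_i * x_i
    over the observations in cluster j (e_i are OLS residuals)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    labels, idx = np.unique(np.asarray(cluster), return_inverse=True)
    U = np.zeros((len(labels), X.shape[1]))
    np.add.at(U, idx, X * e[:, None])
    return U

rng = np.random.default_rng(0)
n, G = 300, 3
group = np.arange(n) % G                      # 3 hypothetical geographic groups
D = rng.integers(0, 2, size=n).astype(float)  # treatment indicator
z1, z2 = rng.normal(size=n), rng.normal(size=n)
group_shift = np.array([0.0, 1.0, -1.0])[group]
y = 1.0 + 0.5 * D + group_shift + rng.normal(size=n)

X = np.column_stack([np.ones(n), D, z1, z2])  # k = 4 coefficients
U = cluster_score_sums(X, y, group)

# Explicit tolerance so numerical noise is not counted as an extra direction.
rank = np.linalg.matrix_rank(U, tol=1e-8)
print("coefficients:", X.shape[1], "clusters:", G, "rank of cluster scores:", rank)
```

Here four coefficients are estimated, but the middle of the sandwich estimator has rank at most two, so the resulting variance matrix rests on very little information. That is a numerical restatement of the caution in the answer, not a substitute for the reference it cites.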