Acceptable Standard Error In Logistic Regression
Understanding standard errors in logistic regression

I am having trouble understanding the meaning of the standard errors in the logistic regression of my thesis analysis, and whether they indicate that my data (and the estimates) are not good enough. I am performing an analysis in Stata of the immigrant-native gap in school performance (dependent variable = good/bad results), controlling for a variety of regressors. I used both logit and OLS, and I adjusted for clustering at the school level. The regressors giving me trouble are some interaction terms between a dummy for country of origin and a dummy for having foreign friends (I included both base variables in the model as well). In the logit estimation, more than one of the country*friend variables has an SE greater than 1 (up to 1.80 or so), and some of them are significant as well. This does not happen with OLS. I am really confused about how to interpret this. I have always understood that high standard errors are not a good sign, because they mean that your data are too spread out. But still (some of) the coefficients are significant, which works per…
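A minimal sketch of why interaction dummies like country*friend can blow up logistic standard errors (the counts below are illustrative, not the poster's data): the SE of a log odds ratio from a 2x2 table is sqrt(1/a + 1/b + 1/c + 1/d), so a single thinly populated cell is enough to inflate it.

```python
import math

def log_or_se(a, b, c, d):
    """Standard error of the log odds ratio from a 2x2 table
    (Woolf's formula): sqrt(1/a + 1/b + 1/c + 1/d)."""
    return math.sqrt(1/a + 1/b + 1/c + 1/d)

# A well-filled table: the SE of the log odds ratio is small.
se_dense = log_or_se(50, 50, 50, 50)   # ~0.283

# A sparse table (few pupils in one origin-by-friends cell),
# as easily happens with interaction dummies: the SE balloons.
se_sparse = log_or_se(2, 50, 50, 50)   # ~0.748

print(round(se_dense, 3), round(se_sparse, 3))
```

This is why logit interaction terms can show SEs above 1 even when OLS on the same data looks unremarkable: the logit SE depends on cell counts on the log-odds scale, not on the overall spread of the data.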
How to interpret coefficient standard errors in linear regression?
(http://stats.stackexchange.com/questions/18208/how-to-interpret-coefficient-standard-errors-in-linear-regression)

I'm wondering how to interpret the coefficient standard errors of a regression when using the display function in R. For example, in the following output:

    lm(formula = y ~ x1 + x2, data = sub.pyth)
                coef.est  coef.se
    (Intercept)   1.32      0.39
    x1            0.51      0.05
    x2            0.81      0.02
    n = 40, k = 3
    residual sd = 0.90, R-Squared = 0.97

Does a higher standard error imply greater significance? Also, for the residual standard deviation, a higher value means greater spread, but the R-squared shows a very close fit; isn't this a contradiction?

Accepted answer: Parameter estimates, like a sample mean or an OLS regression coefficient, are sample statistics that we use to draw inferences about the corresponding population parameters. The population parameters are what we really care about, but because we don't have access to the whole population (usually assumed to be infinite), we must use this approach instead. However, there are certain uncomfortable facts that come with this approach.
For example, if we took another sample and calculated the statistic again to estimate the parameter, we would almost certainly find that it differs. Moreover, neither estimate is likely to exactly match the true parameter value that we are trying to estimate.
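To answer the question directly from the displayed fit: the t statistic is coef/se, so a larger standard error lowers t and means less significance, not more. A quick sketch using the numbers shown above:

```python
# t (or z) statistics from the displayed fit: t = coef / se.
# A larger SE shrinks t, so a higher standard error weakens,
# not strengthens, the evidence against a zero coefficient.
coefs = {"(Intercept)": (1.32, 0.39), "x1": (0.51, 0.05), "x2": (0.81, 0.02)}

for name, (est, se) in coefs.items():
    t = est / se
    print(f"{name}: t = {t:.1f}")   # all far above ~2, so all significant
```

Here every ratio is well above the rough rule-of-thumb cutoff of 2, so all three coefficients are significant despite their different SEs, and there is no contradiction with the high R-squared: the residual sd measures spread around the fitted line, while R-squared compares that spread to the total variation in y.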
Why use logistic regression?
(http://www.appstate.edu/~whiteheadjc/service/logit/intro.htm)

There are many important research topics for which the dependent variable is "limited" (discrete, not continuous). Researchers often want to analyze whether some event occurred or not, such as voting, participation in a public program, business success or failure, morbidity, mortality, a hurricane, and so on. Binary logistic regression is a type of regression analysis in which the dependent variable is a dummy variable (coded 0, 1). A data set appropriate for logistic regression might look like this:

    Descriptive Statistics
    Variable            N    Minimum   Maximum    Mean        Std. Deviation
    YES                 122  .00       1.00       .6393       .4822
    BAG                 122  .00       7.00       1.5082      1.8464
    COST                122  9.00      953.00     416.5492    285.4320
    INCOME              122  5000.00   85000.00   38073.7705  18463.1274
    Valid N (listwise)  122

*These data are from a U.S. Department of the Interior survey (conducted by the U.S. Bureau of the Census) with a yes/no response to a question about the willingness to pay higher travel costs for deer hunting trips in North Carolina.

The linear probability model

"Why shouldn't I just use ordinary least squares?" Good question. Consider the linear probability (LP) model:

    Y = a + BX + e

where Y is a dummy dependent variable (= 1 if the event happens, = 0 if it doesn't), a is the coefficient on the constant term, B is the coefficient(s) on the independent variable(s), X is the independent variable(s), and e is the error term. The LP model generally gives you the correct answers in terms of the sign and significance level of the coefficients. The predicted probabilities from the model are usually where we run into trouble.
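A small sketch of that "trouble" with predicted probabilities, using made-up coefficients (a and b below are purely illustrative, not fitted to the hunting data): the LP model can predict probabilities outside [0, 1], while a logistic link cannot.

```python
import math

# Hypothetical intercept and slope for illustration only.
a, b = 0.2, 0.0002

def p_linear(x):
    """Linear probability model prediction: a + b*x (unbounded)."""
    return a + b * x

def p_logistic(x):
    """Logistic model prediction: always strictly between 0 and 1."""
    return 1 / (1 + math.exp(-(a + b * x)))

for x in (10, 500, 5000):
    print(x, round(p_linear(x), 3), round(p_logistic(x), 3))
# At x = 5000 the LP model predicts 1.2, an impossible probability,
# while the logistic prediction stays below 1.
```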
There are three problems with using the LP model:

1. The error terms are heteroskedastic (heteroskedasticity occurs when the variance of the dependent variable differs across values of the independent variables): var(e) = p(1 - p), where p is the probability that EVENT = 1. Since p depends on X, the "classical regression assumption" that the error term does not depend on the Xs is violated.
2. e is not normally distributed, because P takes on only two values, violating another "classical regression assumption."
3. The predicted probabilities can fall outside the [0, 1] interval, which is the trouble with predictions noted above.
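The heteroskedasticity in problem 1 can be seen by evaluating var(e) = p(1 - p) at a few values of p (a simple numerical illustration, not part of the original survey analysis):

```python
def lp_error_variance(p):
    """Variance of the LP model's error term: var(e) = p(1 - p)."""
    return p * (1 - p)

# The variance changes with p, so the errors are heteroskedastic by
# construction: it peaks at p = 0.5 and shrinks toward 0 at the extremes.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(lp_error_variance(p), 2))
```

Since p varies with X, so does the error variance, which is exactly the violation of the constant-variance assumption described above.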
Is it weird to get a very big odds ratio in logistic regression?
(https://www.researchgate.net/post/Is_it_weird_to_get_a_very_big_odds_ratio_in_logistic_regression)

I estimated a logit using the enter method, and one of the odds ratios is 3962.988 with sig. 0.000. Another model, estimated using forward stepwise (likelihood ratio), produced an odds ratio of 274.744 with sig. 0.000. Total N is 180, with 37 missing. The model fits based on the Omnibus and Hosmer & Lemeshow tests, and the -2LL is OK. Is there anything wrong? Please assist.

Kelvyn Jones (University of Bristol): The underlying problem is that you do not have sufficient information to fit the models that you are attempting. You are in effect modelling a cross-tabulation with the outcome in two cells (yes/no) and 3 predictors which are at a minimum 2 cells each, so the full cross-tabulation is 2 by 2 by 2 by 2, that is, 16 cells. [I know that the data do not actually look like this.] Even if the data were completely balanced and there were no collinearity between the X's, with 140 non-missing observations you would have fewer than 10 bits of information in each cell, which is not very much. You are then using an automatic procedure that could (and will!) capitalize on chance results, and the estimates will be unreliable. Frankly, you should have no faith in what you have found, and that is what the implausible values are telling you.

Carol Hargreaves (Feb 9, 2016): Yes, getting a large odds ratio is an indication that you need to check your data input for:

1. Outliers.
2. The amount of missing values, and how the missing values are handled.
3. The metric used for the analysis, which may need to be changed, for example from 'cents' to 'dollars'.
4. The way in which the data were coded, which may be incorrect and may need to be changed.
5. The standard deviation: if it is too large, you may need to segment the data and perform your analysis on the segments.

The above five suggestions are examples of how to manage large odds ratio results and make them more meaningful. I hope this helps! Kind regards, Carol.

Dr. Senthilvel Vasudevan (King Saud bin Abdulaziz University for Health Sciences, Feb 3, 2015): Hi, an odds ratio shouldn't be such a huge value. Kindly check your analysis and find the 95% confidence interval, where the…
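A sketch of why a sample odds ratio like 3962.988 usually points to sparse cells rather than a real effect (the table counts below are hypothetical, not the asker's data): with one near-empty cell in the underlying cross-tabulation, the cross-product ratio explodes.

```python
def odds_ratio(a, b, c, d):
    """Sample odds ratio from a 2x2 table: (a*d) / (b*c)."""
    return (a * d) / (b * c)

# A reasonably balanced table gives a modest odds ratio.
print(odds_ratio(40, 30, 35, 35))   # 1.333...

# A single cell with only 1 observation produces an enormous one,
# even though total N is similar.
print(odds_ratio(89, 1, 10, 80))    # 712.0
```

This is the mechanism behind Kelvyn Jones's point: too few observations per cell means one or two chance observations can drive the estimate, so an implausibly large odds ratio is a warning about the data, not a finding.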