Large Standard Error Logistic Regression
Understanding standard errors in logistic regression

I am having trouble understanding the meaning of the standard errors in my thesis analysis, and whether they indicate that my data (and the estimates) are not good enough. I am performing an analysis in Stata of the immigrant-native gap in school performance (dependent variable = good/bad results), controlling for a variety of regressors. I used both logit and OLS, and I adjusted for clustering at the school level. The regressors giving me trouble are some interaction terms between a dummy for country of origin and a dummy for having foreign friends (I included both base variables in the model as well). In the logit estimation, more than one of the country*friend variables has a SE greater than 1 (up to 1.80 or so), and some of them are significant as well. This does not happen with OLS. I am really confused about how to interpret this. I have always understood that high standard errors are not a good sign, because they mean that your data are too spread out. But still, (some of) the coefficients are significant, which works perfectly for me because it is the result I was looking for. Can I just ignore the SE? Or does it raise a red flag regarding my results? I usually just ignore the SE in regressions (I know, this is not really what one should do), but I can't rec…
(Source of the question above: http://stats.stackexchange.com/questions/89810/understanding-standard-errors-in-logistic-regression )

A related question (from https://www.researchgate.net/post/Is_it_weird_to_get_a_very_big_odds_ratio_in_logistic_regression ): Is it weird to get a very big odds ratio in logistic regression? I estimated a logit using the enter method, and one of the odds ratios is 3962.988 with sig. 0.000. Another model, estimated using forward stepwise (likelihood ratio), produced an odds ratio of 274.744 with sig. 0.000. Total N is 180, with 37 missing. The model fit was assessed with the Omnibus test and the Hosmer & Lemeshow test; the -2LL is OK. Is there anything wrong? Please assist.

Answers

Kelvyn Jones · University of Bristol: The underlying problem is that you do not have sufficient information to fit the models you are attempting. You are in effect modelling a cross-tabulation with the outcome in two cells (yes/no) and three predictors with at a minimum two cells each, so the full cross-tabulation is 2 by 2 by 2 by 2, that is, 16 cells. [I know that the data do not actually look like this.] Even if the data were completely balanced and there were no collinearity between the X's, with roughly 140 non-missing observations you would have fewer than 10 observations in each cell, and that is not very much. You are then using an automatic procedure that could (will!) capitalize on chance results, so the estimates will be unreliable. Frankly, you should have no faith in what you have found, and that is what the implausible values are telling you.
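The cell-count argument in this answer can be checked directly on one's own data. A sketch with simulated stand-in data (the column names and the data frame are hypothetical; substitute your own) that cross-tabulates a binary outcome against three binary predictors and counts observations per cell:

```python
# Cross-tabulate the outcome against the binary predictors and look for
# thin or empty cells, following the answer's 2x2x2x2 = 16-cell logic.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "outcome": rng.binomial(1, 0.5, 143),  # 180 total - 37 missing = 143 rows
    "x1": rng.binomial(1, 0.5, 143),
    "x2": rng.binomial(1, 0.5, 143),
    "x3": rng.binomial(1, 0.5, 143),
})

cells = df.groupby(["x1", "x2", "x3", "outcome"]).size()
print(cells)                                              # up to 16 cells
print((cells < 10).sum(), "cells with fewer than 10 observations")
```

Even with perfectly balanced predictors, 143 observations spread over 16 cells averages under 10 per cell, which is exactly the answer's concern.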
Feb 9, 2016 · Carol Hargreaves: Yes, getting a large odds ratio is an indication that you need to check your data input for:
1. Outliers.
2. The amount of missing values; handle the missing values.
3. The metric used for the analysis, which may need to be changed, for example from 'cents' to 'dollars'.
4. The way in which the data were coded, which may be incorrect and may need to be changed.
5. A standard deviation that is too large; you may need to segment the data and perform your analysis on the segments.
The above five suggestions are examples of how to manage large odds-ratio results and make them more meaningful. I hope this helps! Kind regards, Carol

Feb 3, 2015 · Dr. Senthilvel Vasudevan · King Saud bin Abdulaziz University for Health Sciences: Hi, an odds ratio would not normally come out as a very huge value. Kindly check your analysis and find out…
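Points 1 to 3 of the checklist above can be screened quickly in code. A rough illustration (invented data, an invented `amount_cents` column, and a crude z-score threshold; none of this comes from the thread):

```python
# Quick screening pass: count missing values, flag gross outliers by
# z-score, and rescale the metric (cents -> dollars). Illustrative only;
# a z-score cutoff is a blunt instrument for small samples.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "amount_cents": [100, 95, 110, 105, 98, 102, 97, 103, 99, 9900, np.nan]
})

print(df.isna().sum())                     # point 2: missing values per column

x = df["amount_cents"].dropna()
z = (x - x.mean()) / x.std()
print(x[z.abs() > 2])                      # point 1: crude outlier flag

df["amount_dollars"] = df["amount_cents"] / 100   # point 3: change the metric
```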
From https://en.wikipedia.org/wiki/Logistic_regression (see also the regression notes at http://people.duke.edu/~rnau/regnotes.htm ):

"Logit model" redirects here. It is not to be confused with the logit function. In statistics, logistic regression, or logit regression, or logit model[1] is a regression model where the dependent variable (DV) is categorical. This article covers the case of a binary dependent variable, that is, one that can take only two values, such as pass/fail, win/lose, alive/dead or healthy/sick. Cases with more than two categories are referred to as multinomial logistic regression, or, if the multiple categories are ordered, as ordinal logistic regression.[2] Logistic regression was developed by statistician David Cox in 1958.[2][3] The binary logistic model is used to estimate the probability of a binary response based on one or more predictor (or independent) variables (features). As such, it is not a classification method. It could be called a qualitative response/discrete choice model in the terminology of economics.
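To make "estimate the probability of a binary response" concrete, a minimal fit on invented pass/fail data (assuming scikit-learn is available; the hours and labels are illustrative only):

```python
# Fit a binary logistic regression and read off a predicted probability,
# rather than just a 0/1 class label. Toy data: pass/fail vs. hours studied.
import numpy as np
from sklearn.linear_model import LogisticRegression

hours = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression().fit(hours, passed)
p = model.predict_proba([[2.25]])[0, 1]   # P(pass) at 2.25 hours
print(round(p, 3))                        # near 0.5, by symmetry of the data
```

The model's output is a probability curve over the predictor; thresholding that probability is a separate, optional step, which is why the text stresses that logistic regression is not itself a classification method.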
Logistic regression measures the relationship between the categorical dependent variable and one or more independent variables by estimating probabilities using a logistic function, which is the cumulative logistic distribution. Thus, it treats the same set of problems as probit regression, using similar techniques, with the latter using a cumulative normal distribution curve instead. Equivalently, in the latent variable interpretations of these two methods, the first assumes a standard logistic distribution of errors and the second a standard normal distribution of errors.
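The logit/probit equivalence described above can be seen numerically: the logistic CDF is close to a rescaled normal CDF. The factor 1.6 below is a common rule of thumb for converting between probit and logit scales, not an exact identity:

```python
# Compare the cumulative logistic distribution with a rescaled cumulative
# normal; the two curves differ by only a couple of percentage points.
import numpy as np
from scipy.stats import logistic, norm

x = np.linspace(-4, 4, 9)
print(np.round(logistic.cdf(x), 3))    # logit link: cumulative logistic
print(np.round(norm.cdf(x / 1.6), 3))  # probit link, rescaled by ~1.6

gap = np.max(np.abs(logistic.cdf(x) - norm.cdf(x / 1.6)))
print(gap)                             # maximum discrepancy is small
```

This closeness is why logit and probit fits on the same data usually lead to the same substantive conclusions, with coefficients differing by roughly that scale factor.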
To include or not to include the CONSTANT?

Most multiple regression models include a constant term (i.e., an "intercept"), since this ensures that the model will be unbiased, i.e., that the mean of the residuals will be exactly zero. (The coefficients in a regression model are estimated by least squares, i.e., by minimizing the mean squared error. Now, the mean squared error is equal to the variance of the errors plus the square of their mean: this is a mathematical identity. Changing the value of the constant in the model changes the mean of the errors but does not affect the variance. Hence, if the sum of squared errors is to be minimized, the constant must be chosen such that the mean of the errors is zero.)
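The parenthetical identity above, that minimizing squared error forces the constant to absorb the mean of the errors, can be verified numerically on simulated data:

```python
# With an intercept column, OLS residuals average exactly zero (up to
# floating point); drop the constant and the residual mean is generally
# nonzero. Data are simulated purely for the demonstration.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(5, 1, 50)
y = 3 + 2 * x + rng.normal(0, 1, 50)

X1 = np.column_stack([np.ones_like(x), x])   # design matrix with constant
b1, *_ = np.linalg.lstsq(X1, y, rcond=None)
r1 = y - X1 @ b1
print(abs(r1.mean()))                        # ~0, at machine precision

X0 = x[:, None]                              # regression through the origin
b0, *_ = np.linalg.lstsq(X0, y, rcond=None)
r0 = y - X0 @ b0
print(abs(r0.mean()))                        # generally not zero
```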
In a simple regression model, the constant represents the Y-intercept of the regression line, in unstandardized form. In a multiple regression model, the constant represents the value that would be predicted for the dependent variable if all the independent variables were simultaneously equal to zero, a situation which may not be physically or economically meaningful. If you are not particularly interested in what would happen if all the independent variables were simultaneously zero, then you norma…