Out-of-Bag Error
Out-of-bag (OOB) error, also called the out-of-bag estimate, is a method of measuring the prediction error of random forests, boosted decision trees, and other machine learning models that use bootstrap aggregating (bagging) to sub-sample the data used for training. The OOB error is the mean prediction error on each training sample xᵢ, using only the trees that did not have xᵢ in their bootstrap sample.[1] Subsampling allows one to define an out-of-bag estimate of the prediction performance improvement by evaluating predictions on those observations that were not used in building the next base learner. Out-of-bag estimates help avoid the need for an independent validation dataset, but they often underestimate the actual performance improvement and the optimal number of iterations.[2]

See also

Boosting (meta-algorithm)
Bootstrapping (statistics)
Cross-validation (statistics)
Random forest
Random subspace method (attribute bagging)

References

^ Ga
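The definition above can be sketched in code: train many base learners on bootstrap samples, then score each training point only with the learners that never saw it. The following is a minimal illustration in Python using a 1-nearest-neighbour base learner on toy data (all names and the dataset are illustrative, not from any library); a real random forest would use decision trees as the base learners.

```python
# Sketch of the out-of-bag (OOB) error computation for bagging, using a
# 1-nearest-neighbour base learner on toy 1-D data with two classes.
import random
from collections import Counter

random.seed(0)

# Toy dataset: class boundary near x = 5.
X = [0.5, 1.2, 2.1, 3.3, 4.0, 5.9, 6.8, 7.5, 8.2, 9.1]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
n = len(X)

def fit_1nn(xs, ys):
    """'Training' a 1-NN model just means storing its bootstrap sample."""
    def predict(x):
        i = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
        return ys[i]
    return predict

n_models = 50
models, in_bag = [], []
for _ in range(n_models):
    idx = [random.randrange(n) for _ in range(n)]  # bootstrap sample (with replacement)
    models.append(fit_1nn([X[i] for i in idx], [y[i] for i in idx]))
    in_bag.append(set(idx))

# OOB prediction for sample i: majority vote over only those models whose
# bootstrap sample did NOT contain i (each sample is OOB for ~37% of models).
errors = 0
for i in range(n):
    votes = [m(X[i]) for m, bag in zip(models, in_bag) if i not in bag]
    if votes:
        pred = Counter(votes).most_common(1)[0][0]
        errors += (pred != y[i])

oob_error = errors / n
print(f"OOB error: {oob_error:.2f}")
```

Because each point is evaluated only by models that never trained on it, the OOB error behaves like a built-in cross-validation estimate, which is why it can replace a separate validation set.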
How to interpret OOB and confusion matrix for random forest?

I got an R script from someone to run a random forest model. I modified it and ran it with some employee data. We are trying to predict voluntary separations. Some additional info: this is a classification model where 0 = employee stayed and 1 = employee terminated; we are currently only looking at a dozen predictor variables; the data is "unbalanced" in that the terminated records make up about 7% of the total record set. I ran the model with various mtry and ntree selections but settled on the values below. The OOB error is 6.8%, which I think is good, but the confusion matrix seems to tell a different story for predicting terminations, since the error rate for that class is quite high at 92.79%. Am I right in assuming that I can't rely on and use this model because of the high error rate for predicting terminations? Or is there something else I can do with RF to get a smaller error rate for predicting terminations?
FOREST_model <- randomForest(theFormula, data=trainset, mtry=3, ntree=500,
                             importance=TRUE, do.trace=100)

ntree      OOB      1      2
  100:   6.97%  0.47% 92.79%
  200:   6.87%  0.36% 92.79%
  300:   6.82%  0.33% 92.55%
  400:   6.80%  0.29% 92.79%
  500:   6.80%  0.29% 92.79%

> print(FOREST_model)

Call:
 randomForest(formula = theFormula, data = trainset, mtry = 3,
              ntree = 500, importance = TRUE, do.trace = 100)
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 3

        OOB estimate of error rate: 6.8%
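The apparent contradiction in the output above is just arithmetic: with only ~7% positives, the overall OOB error is dominated by the majority class, so a model that almost always predicts "stayed" can look good overall while missing nearly all terminations. A quick sanity check, using the per-class error rates from the ntree = 500 row of the trace and assuming the ~7% prevalence stated in the question:

```python
# Back-of-the-envelope check: overall error as the prevalence-weighted
# average of the per-class error rates (numbers from the trace above;
# the 7% class share is approximate, per the question).
p_term = 0.07       # share of terminated (class 1) records
err_stay = 0.0029   # class-0 error rate at ntree = 500
err_term = 0.9279   # class-1 error rate at ntree = 500

overall = (1 - p_term) * err_stay + p_term * err_term
print(f"implied overall error: {overall:.3f}")  # ~0.068, matching the 6.8% OOB
```

Note that predicting "stayed" for every record would already achieve roughly 7% error on this data, so the headline 6.8% OOB figure says almost nothing about the model's ability to find terminations; the per-class rows of the confusion matrix are the numbers that matter here.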