Expected True Error Rate
Bioinformatics. 2014 Dec 1; 30(23): 3349–3355. Published online 2014 Aug 13. doi: 10.1093/bioinformatics/btu527. PMCID: PMC4296143.

Cross-validation under separate sampling: strong bias and how to correct it
Ulisses M. Braga-Neto, Amin Zollanvari and Edward R. Dougherty
Department of Electrical and Computer Engineering, Center for Bioinformatics and Genomic Systems Engineering, and Department of Statistics, Texas A&M University, College Station, TX 77843, USA

Abstract

Motivation: It is commonly assumed in pattern recognition that cross-validation error estimation is ‘almost unbiased’ as long as the number of folds is not too small. While this is true for random sampling, it is not true with separate sampling, where the populations are independently sampled, which is a common situation in bioinformatics.

Results: We demonstrate, via analytical and numerical methods, that classical cross-validation can have strong bias under separate sampling, depending on the difference between the sampling ratios and the true population probabilities. We propose a new separate-sampling cross-validation error estimator, and prove that it satisfies an ‘almost unbiased’ theorem similar to that of random-sampling cross-validation. We present two case studies with previously published data, which show that the results can change drastically if the correct form of cross-validation is used.
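The abstract above turns on the gap between the sampling ratios (fixed by the study design under separate sampling) and the true population probabilities. The sketch below is not the authors' code; it illustrates the idea under the assumption that a corrected estimator combines the class-conditional cross-validated error rates using a known (or externally estimated) prior c0 instead of the sampling ratio n0/n. The Gaussian data model, the LDA classifier and the value c0 = 0.9 are illustrative choices only.

```python
# Minimal sketch: classical k-fold cross-validation vs a prior-weighted,
# separate-sampling variant. Assumes the true class-0 prior c0 is known.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)

def sample_class(n, mean):
    # Two-feature Gaussian class-conditional model (illustrative assumption).
    return rng.normal(loc=mean, scale=1.0, size=(n, 2))

# Separate sampling: n0 and n1 are fixed by design (50/50 here),
# even though the true prior of class 0 is c0 = 0.9.
n0, n1, c0 = 60, 60, 0.9
X = np.vstack([sample_class(n0, 0.0), sample_class(n1, 1.0)])
y = np.concatenate([np.zeros(n0), np.ones(n1)])

# Collect held-out predictions from 10-fold cross-validation.
pred = np.empty_like(y)
for tr, te in StratifiedKFold(n_splits=10, shuffle=True, random_state=0).split(X, y):
    clf = LinearDiscriminantAnalysis().fit(X[tr], y[tr])
    pred[te] = clf.predict(X[te])

err0 = np.mean(pred[y == 0] != 0)          # class-conditional CV error rates
err1 = np.mean(pred[y == 1] != 1)

classical_cv = np.mean(pred != y)          # implicitly weights by the sampling ratio n0/n
separate_cv = c0 * err0 + (1 - c0) * err1  # weights by the assumed true prior c0

print(f"classical CV estimate:        {classical_cv:.3f}")
print(f"prior-weighted (separate) CV: {separate_cv:.3f}")
```

When the sampling ratio (0.5 here) differs markedly from the prior (0.9), the two estimates can disagree substantially, which is the bias the article describes.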
For any predictive model, the ability to accurately measure its prediction error is of key importance. Often, however, techniques of measuring error are used that give grossly misleading results. This can lead to the phenomenon of over-fitting, where a model may fit the training data very well but does a poor job of predicting results for new data not used in model training. Here is an overview of methods to accurately measure model prediction error.

Measuring Error

When building prediction models, the primary goal should be to make a model that most accurately predicts the desired target value for new data. The measure of model error that is used should be one that achieves this goal. In practice, however, many modelers instead report a measure of model error that is based not on the error for new data but on the error for the very same data that was used to train the model. The use of this incorrect error measure can lead to the selection of an inferior and inaccurate model. Naturally, any model is highly optimized for the data it was trained on: the expected error the model exhibits on new data will always be higher than the error it exhibits on the training data.

As an example, we could go out and sample 100 people and create a regression model to predict an individual's happiness based on their wealth. We can record the squared error for how well our model does on this training set of a hundred people. If we then sampled a different 100 people from the population and applied our model to this new group, the squared error would almost always be higher in this second case.

It is helpful to illustrate this fact with an equation. We can develop a relationship between how well a model predicts on new data (its true prediction error, the thing we really care about) and how well it predicts on the training data (which is what many modelers in fact measure).

$$ True\ Prediction\ Error = Training\ Error + Training\ Optimism $$

Here, Training Optimism is basically a measure of how much worse the model does on new data than on the data it was trained on.
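As a concrete illustration of the relation above, the following sketch fits a deliberately flexible regression model on one sample and then evaluates it on a fresh sample drawn from the same population; the gap between the two mean squared errors is the training optimism. The wealth/happiness data-generating process, the degree-8 polynomial and the sample size of 100 are assumptions made only for this demonstration.

```python
# Minimal sketch of the training-optimism effect described above.
import numpy as np

rng = np.random.default_rng(1)

def sample_people(n):
    # Hypothetical population: happiness depends noisily on wealth.
    wealth = rng.uniform(0, 10, n)
    happiness = 2.0 + 0.5 * wealth + rng.normal(0, 1.0, n)
    return wealth, happiness

# Fit an intentionally flexible model (degree-8 polynomial) on 100 people.
x_train, y_train = sample_people(100)
coeffs = np.polyfit(x_train, y_train, deg=8)

def mse(x, y):
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

training_error = mse(x_train, y_train)

# Apply the same fitted model to a fresh sample of 100 people.
x_new, y_new = sample_people(100)
new_data_error = mse(x_new, y_new)

print(f"training error:          {training_error:.3f}")
print(f"error on new data:       {new_data_error:.3f}")
print(f"training optimism (gap): {new_data_error - training_error:.3f}")
```

On repeated runs the error on new data is essentially always larger than the training error, which is exactly the optimism term in the equation.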