Error Propagation for Approximate Policy and Value Iteration
Project, INRIA Lille, Lille, France
Csaba Szepesvári, Department of Computing Science, University of Alberta, Edmonton, Canada
Published in: Proceedings of the 23rd International Conference on Neural Information Processing Systems (NIPS'10), pages 568-576. Curran Associates Inc., USA, ©2010.
Error propagation for approximate policy and value iteration
Farahmand, Amir Massoud; Munos, Rémi; Szepesvari, Csaba (2010)
Publisher: HAL CCSD
Languages: English
Types: Conference object
Subjects: [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG]
International audience
http://dl.acm.org/citation.cfm?id=2997253
https://www.openaire.eu/search/publication?articleId=dedup_wf_001::4cd0f39916822dfd76c46d812aafecf2

We address the question of how the approximation error/Bellman residual at each iteration of the Approximate Policy/Value Iteration algorithms influences the quality of the resulting policy. We quantify the performance loss as the Lp norm of the approximation error/Bellman residual at each iteration.
Moreover, we show that the performance loss depends on the expectation of the squared Radon-Nikodym derivative of a certain distribution rather than on its supremum, as previous results had suggested. Our results also indicate that the contribution of the approximation/Bellman error to the performance loss is more prominent in the later iterations of API/AVI, and that the effect of an error term in the earlier iterations decays exponentially fast.
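The claim that early errors fade while late errors dominate can be illustrated with a small numerical sketch. The code below is not the paper's analysis: it uses a randomly generated toy MDP, plain value iteration with a single perturbation injected at one chosen iteration, and the sup norm rather than the Lp norms and distribution-dependent terms discussed in the abstract. All names (make_mdp, bellman_backup, value_iteration, k_err, eps) are hypothetical and chosen only for this illustration. It relies on the standard fact that the Bellman optimality operator is a gamma-contraction in the sup norm, so a perturbation added after iteration k is shrunk by at most gamma^(K-1-k) by the end of K iterations.

```python
import random


def make_mdp(n_states=5, n_actions=2, seed=0):
    """Build a random toy MDP: P[a][s][t] are transition probabilities, R[a][s] rewards."""
    rng = random.Random(seed)
    P, R = [], []
    for _ in range(n_actions):
        rows = []
        for _ in range(n_states):
            weights = [rng.random() + 1e-3 for _ in range(n_states)]
            total = sum(weights)
            rows.append([w / total for w in weights])  # normalize to a distribution
        P.append(rows)
        R.append([rng.random() for _ in range(n_states)])
    return P, R


def bellman_backup(V, P, R, gamma):
    """Apply the Bellman optimality operator once to the value function V."""
    n_actions, n_states = len(R), len(V)
    return [
        max(
            R[a][s] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
            for a in range(n_actions)
        )
        for s in range(n_states)
    ]


def value_iteration(P, R, gamma, K, k_err=None, eps=None):
    """Run K backups; if k_err is given, add the vector eps right after backup k_err."""
    V = [0.0] * len(R[0])
    for k in range(K):
        V = bellman_backup(V, P, R, gamma)
        if k == k_err:
            V = [v + e for v, e in zip(V, eps)]  # injected approximation error
    return V


def sup_norm_diff(u, v):
    return max(abs(a - b) for a, b in zip(u, v))


gamma, K = 0.9, 30
P, R = make_mdp()
noise = random.Random(1)
eps = [noise.uniform(-1.0, 1.0) for _ in range(len(R[0]))]
eps_size = max(abs(e) for e in eps)

clean = value_iteration(P, R, gamma, K)
results = {}
for k_err in (0, 10, 20, 29):
    noisy = value_iteration(P, R, gamma, K, k_err, eps)
    gap = sup_norm_diff(noisy, clean)
    bound = gamma ** (K - 1 - k_err) * eps_size  # contraction bound on the final gap
    results[k_err] = (gap, bound)
    print(f"error at iteration {k_err:2d}: final gap {gap:.6f}, bound {bound:.6f}")
```

The same perturbation leaves a much smaller trace in the final value function when it is injected early than when it is injected at the last iteration, where it survives in full. This sketch tracks only the deviation of the value function; the performance loss of the resulting greedy policy, which is what the paper bounds, is a separate quantity.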