000030482 001__ 30482 000030482 005__ 20180918115443.0 000030482 0247_ $$2urn$$aurn:nbn:ch:rero-006-111196 000030482 0248_ $$aoai:doc.rero.ch:20121023125310-TI$$punisi$$pthesis$$pthesis_urn$$pthesis_unisi$$prero_explore$$zcdu34$$zreport$$zcdu004$$zbook$$zjournal$$zpostprint$$zcdu16$$zpreprint$$zcdu1$$zdissertation 000030482 041__ $$aeng 000030482 080__ $$a004 000030482 100__ $$aSun, Yi$$d1982-11-17 000030482 245__ $$9eng$$aOn the generation of representations for reinforcement learning 000030482 300__ $$a130 p 000030482 502__ $$92012-09-04$$aThèse de doctorat : Università della Svizzera italiana, 2012 ; 2012INFO001 000030482 506__ $$ffree 000030482 520__ $$9eng$$aCreating autonomous agents that learn to act from sequential interactions has long been perceived as one of the ultimate goals of Artificial Intelligence (AI). Reinforcement Learning (RL), a subfield of Machine Learning (ML), addresses important aspects of this objective. This dissertation investigates a particular problem encountered in RL called representation generation. Two related sub-problems are considered, namely basis generation and model learning, concerning which we present three pieces of original research. The first contribution considers a particular basis generation method called online kernel sparsification (OKS). OKS was originally proposed for recursive least squares regression, and shortly thereafter extended to RL. Despite the popularity of the method, important theoretical questions are still to be answered. In particular, it was unclear how the size of the OKS dictionary, or equivalently the number of basis functions constructed, grows in relation to the amount of data available. Characterizing this growth rate is crucial to understanding OKS, both in terms of its computational complexity and, perhaps more importantly, the generalization capability of the resulting linear regressor or value function estimator. 
We investigate this problem using a novel formula expressing the expected determinant of the kernel Gram matrix in terms of the eigenvalues of the covariance operator. Based on this formula, we are able to connect the cardinality of the dictionary with the eigen-decay of the covariance operator. In particular, we prove that under certain technical conditions, the size of the dictionary always grows sub-linearly in the number of data points, and, as a consequence, the kernel linear regressor or value function estimator constructed from the resulting dictionary is consistent. The second contribution turns to a different class of basis generation methods, which make use of reward information. We introduce a new method called V-BEBF. V-BEBF relies on a principle different from that of previous approaches based on Bellman error basis functions (BEBF): approximations to the value function of the Bellman error, rather than to the Bellman error itself as in BEBF, are added as new basis functions. This approach is justified by a simple yet previously unnoticed insight: V-BEBF, if computed exactly, is in fact the error in value estimation, and therefore adding it to the existing set of basis functions immediately allows the value function to be represented accurately. We demonstrate that V-BEBF is a promising alternative to BEBF, especially when the discount factor approaches 1, in which case it is proven that BEBF, even if computed exactly, can be very inefficient. Limited experiments, where both V-BEBFs and BEBFs are approximated using linear combinations of the input features, are also conducted, and the results are in line with the theoretical findings. The last contribution focuses on model learning, in particular learning the transition model of the environment. The problem is investigated under a Bayesian framework, where learning is done by probabilistic inference and learning progress is measured using Shannon information gain. 
In this setting, we show that the problem can be formulated as an RL problem, where the reward is given by the immediate information gain resulting from performing the next action. This shows that the model-learning problem can in principle be solved using algorithms developed for RL. In particular, we show theoretically that if the environment is an MDP, then near-optimal model learning can be achieved following this approach. 000030482 695__ $$9eng$$aArtificial intelligence ; Machine Learning ; Reinforcement learning ; Information theory ; Linear function approximation ; Online kernel sparsification ; Basis construction ; Bellman error basis functions ; Model learning ; Exploration 000030482 700__ $$aSchmidhuber, Jürgen$$eDir. 000030482 8564_ $$f2012INFO001.pdf$$qapplication/pdf$$s1311442$$uhttps://doc.rero.ch/record/30482/files/2012INFO001.pdf$$yorder:1$$zTexte intégral 000030482 918__ $$aFacoltà di scienze informatiche$$bVia Lambertenghi 10A, CH-6904 Lugano 000030482 919__ $$aUniversità della Svizzera italiana$$bLugano$$ddoc.support@rero.ch 000030482 980__ $$aTHESIS$$bUNISI$$fTH_PHD 000030482 990__ $$a20121023125310-TI