000030482 001__ 30482 000030482 005__ 20180918115443.0 000030482 0247_ $$2urn$$aurn:nbn:ch:rero-006-111196 000030482 0248_ $$aoai:doc.rero.ch:20121023125310-TI$$punisi$$pthesis$$pthesis_urn$$pthesis_unisi$$prero_explore$$zcdu34$$zreport$$zcdu004$$zbook$$zjournal$$zpostprint$$zcdu16$$zpreprint$$zcdu1$$zdissertation 000030482 041__ $$aeng 000030482 080__ $$a004 000030482 100__ $$aSun, Yi$$d1982-11-17 000030482 245__ $$9eng$$aOn the generation of representations for reinforcement learning 000030482 300__ $$a130 p 000030482 502__ $$92012-09-04$$aThèse de doctorat : Università della Svizzera italiana, 2012 ; 2012INFO001 000030482 506__ $$ffree 000030482 520__ $$9eng$$aCreating autonomous agents that learn to act from sequential interactions has long been perceived as one of the ultimate goals of Artificial Intelligence (AI). Reinforcement Learning (RL), a subfield of Machine Learning (ML), addresses important aspects of this objective. This dissertation investigates a particular problem encountered in RL called representation generation. Two related sub-problems are considered, namely basis generation and model learning, concerning which we present three pieces of original research. The first contribution considers a particular basis generation method called online kernel sparsification (OKS). OKS was originally proposed for recursive least squares regression, and shortly thereafter extended to RL. Despite the popularity of the method, important theoretical questions are still to be answered. In particular, it was unclear how the size of the OKS dictionary, or equivalently the number of basis functions constructed, grows in relation to the amount of data available. Characterizing this growth rate is crucial to understanding OKS, both in terms of its computational complexity and, perhaps more importantly, the generalization capability of the resulting linear regressor or value function estimator. 
We investigate this problem using a novel formula expressing the expected determinant of the kernel Gram matrix in terms of the eigenvalues of the covariance operator. Based on this formula, we are able to connect the cardinality of the dictionary with the eigen-decay of the covariance operator. In particular, we prove that under certain technical conditions, the size of the dictionary always grows sub-linearly in the number of data points, and, as a consequence, the kernel linear regressor or value function estimator constructed from the resulting dictionary is consistent. The second contribution turns to a different class of basis generation methods, which make use of reward information. We introduce a new method called V-BEBF. V-BEBF relies on a principle different from that of previous approaches based on Bellman error basis functions (BEBF): approximations to the value function of the Bellman error, rather than to the Bellman error itself as in BEBF, are added as new basis functions. This approach is justified by a simple yet previously unnoticed insight: V-BEBF, if computed exactly, is in fact the error in value estimation, and therefore adding it to the existing set of basis functions immediately allows the value function to be represented accurately. We demonstrate that V-BEBF is a promising alternative to BEBF, especially when the discount factor approaches 1, in which case it is proven that BEBF, even if computed exactly, can be very inefficient. Limited experiments, where both V-BEBFs and BEBFs are approximated using linear combinations of the input features, are also conducted, and the results are in line with the theoretical findings. The last contribution focuses on model learning, in particular learning the transition model of the environment. The problem is investigated under a Bayesian framework, where learning is done by probabilistic inference and learning progress is measured using Shannon information gain. 
In this setting, we show that the problem can be formulated as an RL problem, where the reward is given by the immediate information gain resulting from performing the next action. This shows that the model-learning problem can in principle be solved using algorithms developed for RL. In particular, we show theoretically that if the environment is an MDP, then near-optimal model learning can be achieved following this approach. 000030482 695__ $$9eng$$aArtificial intelligence ; Machine Learning ; Reinforcement learning ; Information theory ; Linear function approximation ; Online kernel sparsification ; Basis construction ; Bellman error basis functions ; Model learning ; Exploration 000030482 700__ $$aSchmidhuber, Jürgen$$eDir. 000030482 8564_ $$f2012INFO001.pdf$$qapplication/pdf$$s1311442$$uhttps://doc.rero.ch/record/30482/files/2012INFO001.pdf$$yorder:1$$zTexte intégral 000030482 918__ $$aFacoltà di scienze informatiche$$bVia Lambertenghi 10A, CH-6904 Lugano 000030482 919__ $$aUniversità della Svizzera italiana$$bLugano$$ddoc.support@rero.ch 000030482 980__ $$aTHESIS$$bUNISI$$fTH_PHD 000030482 990__ $$a20121023125310-TI