Faculty of Informatics

New architectures for very deep learning

Srivastava, Rupesh Kumar ; Schmidhuber, Jürgen (Dir.)

Doctoral thesis: Università della Svizzera italiana, 2018; 2018INFO006.

Summary
Artificial Neural Networks are increasingly being used in complex real-world applications because many-layered (i.e., deep) architectures can now be trained on large quantities of data. However, training even deeper, and therefore more powerful, networks has hit a barrier due to fundamental limitations in the design of existing networks. This thesis develops new architectures that, for the first time, allow very deep networks to be optimized efficiently and reliably. Specifically, it addresses two key issues that hamper credit assignment in neural networks: cross-pattern interference and vanishing gradients. Cross-pattern interference leads to oscillations of the network's weights that make training inefficient. The proposed Local Winner-Take-All networks reduce interference among computation units in the same layer through local competition. An in-depth analysis of locally competitive networks provides generalizable insights and reveals unifying properties that improve credit assignment. As network depth increases, vanishing gradients make a network's outputs increasingly insensitive to the weights close to the inputs, causing the failure of gradient-based training. To overcome this limitation, the proposed Highway networks regulate information flow across layers through additional skip connections which are modulated by learned computation units. Their beneficial properties are extended to the sequential domain with Recurrent Highway Networks that gain from increased depth and learn complex sequential transitions without requiring more parameters.
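The local competition described above can be sketched in a few lines: units in a layer are partitioned into small groups, and within each group only the most active unit passes its value forward while the others are zeroed. This is a minimal NumPy illustration of the Local Winner-Take-All idea; the function name, group size, and inputs are illustrative, not taken from the thesis.

```python
import numpy as np

def lwta(activations, group_size=2):
    """Local Winner-Take-All sketch: split a layer's activations into
    groups of `group_size` units; within each group, only the unit with
    the maximum activation passes, the rest are set to zero."""
    a = np.asarray(activations, dtype=float)
    groups = a.reshape(-1, group_size)           # one row per local group
    winners = groups.argmax(axis=1)              # winning unit in each group
    mask = np.zeros_like(groups)
    mask[np.arange(len(groups)), winners] = 1.0  # keep only the winners
    return (groups * mask).reshape(a.shape)

# Groups of 2: the larger unit in each pair survives, its neighbor is zeroed.
out = lwta([0.3, 0.9, -0.2, -0.5], group_size=2)
```

Because only one unit per group fires for a given input, different input patterns tend to drive disjoint subsets of units, which is how local competition reduces cross-pattern interference among units in the same layer.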
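The Highway mechanism can likewise be sketched concretely. A highway layer computes y = T(x) * H(x) + (1 - T(x)) * x, where H is an ordinary nonlinear transformation and T is a learned sigmoid "transform" gate; its complement (1 - T) carries the input through unchanged. The following NumPy sketch uses illustrative weight shapes and a tanh transform; these specifics are assumptions for the example, not the thesis's exact parameterization.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, W_H, b_H, W_T, b_T):
    """One highway layer: y = T(x) * H(x) + (1 - T(x)) * x.
    T is a learned transform gate; (1 - T) acts as a carry gate
    that lets the input skip the transformation."""
    H = np.tanh(W_H @ x + b_H)    # candidate transformation of the input
    T = sigmoid(W_T @ x + b_T)    # transform gate, elementwise in (0, 1)
    return T * H + (1.0 - T) * x  # gated mix of transform and identity

# With a strongly negative gate bias, T ~ 0 and the layer starts out close
# to the identity map, so gradients can flow through many stacked layers.
rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W_H = rng.standard_normal((4, 4))
W_T = rng.standard_normal((4, 4))
y = highway_layer(x, W_H, np.zeros(4), W_T, np.full(4, -20.0))
# y is approximately equal to x here, since the gate is nearly closed
```

Initializing the gate bias to a negative value is what makes very deep highway stacks optimizable: early in training each layer behaves almost like the identity, and the gates gradually open where transformation is useful.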