Layer-wise supervised incremental training of residual networks

Explore techniques for training supervised residual networks in a layer-by-layer fashion, rather than end-to-end.

Start date: October 2016
Category: Fundamental Research
Contact point: Giorgio Patrini -

Problem description

Due to their structure, residual modules [1] might be trained incrementally, starting from a previously learned, shallower network trained with full supervision. At each step, the network would learn one additional residual module, which provides a further non-linear feature representation of the input and feeds into the previous module, the classifier. A useful reading for building intuition about this effect is [2], which gives an ensemble-like interpretation of residual networks. Re-using the previously trained layers should save computational time. Moreover, it is possible to show that at each step we are learning in a strictly larger model space: the network learned in the previous step is recovered exactly by zeroing out the weights of the newly added residual units.
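The nesting argument can be checked numerically. Below is a minimal sketch in plain NumPy (shapes and the block definition are illustrative assumptions, not part of the proposal) showing that appending a residual block whose weights are zero leaves the network's function unchanged, so the previous model is a point inside the enlarged model space:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W):
    # identity shortcut plus a learned non-linear residual (hypothetical block)
    return x + relu(x @ W)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))          # toy batch of 8-dim inputs
W1 = rng.normal(size=(8, 8)) * 0.1   # weights of the "previously trained" block

h = residual_block(x, W1)            # features of the shallower network

# Append a new residual block with zeroed weights: the output is unchanged,
# so the previous network is the optimal model in the larger space whenever
# it was optimal in the smaller one.
W2 = np.zeros((8, 8))
h_new = residual_block(h, W2)
assert np.allclose(h, h_new)
```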

Some approaches to incremental learning have been investigated recently [3, 4]. They share some intuition with this proposal, although they address the more general problem of transfer learning and are not tailored specifically to residual networks.
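As a concrete toy illustration of one incremental step, the sketch below freezes a previously trained readout and fits only the newly appended residual block by gradient descent. All shapes, the regression target, and the learning rate are illustrative assumptions; a real implementation would use an autodiff framework and a classification loss:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8))                  # toy input batch
y = np.tanh(x).sum(axis=1, keepdims=True)     # toy regression target

W_out = rng.normal(size=(8, 1)) * 0.1         # frozen "previous" readout
W_res = rng.normal(size=(8, 8)) * 0.01        # new residual block (small init)

def forward(x, W_res):
    pre = x @ W_res
    h = x + np.maximum(pre, 0.0)              # residual block, identity shortcut
    return h @ W_out, pre

def loss(pred):
    return 0.5 * np.mean((pred - y) ** 2)

pred0, _ = forward(x, W_res)
initial_loss = loss(pred0)

lr = 0.05
for _ in range(300):
    pred, pre = forward(x, W_res)
    err = (pred - y) / len(x)                 # dL/dpred
    grad_h = err @ W_out.T                    # backprop through the frozen readout
    grad_W = x.T @ (grad_h * (pre > 0))       # gradient only for the new block
    W_res -= lr * grad_W                      # W_out is never updated

final_pred, _ = forward(x, W_res)
final_loss = loss(final_pred)
assert final_loss < initial_loss              # the new block improves the fit
```

Only the new block's parameters are touched, which is where the hoped-for computational saving comes from: earlier layers act as a fixed feature extractor at each incremental step.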

Why this problem matters

Efficient layer-wise training of deep networks could significantly speed up the training of large models. It is one of the long-standing “dreams” of deep learning, but has proven elusive so far. If such a method performed competitively with end-to-end trained models while providing computational benefits, it would quickly be adopted across the entire field.


Datasets

  • ImageNet - large-scale classification.
  • OpenImages - large-scale classification.
  • MS COCO - smaller scale classification, detection, segmentation.
  • CIFAR10 - small scale classification.