Triesch Lab - Videos

Project Presentations

Here we showcase some research projects of the past and the present. They might be of special interest to newcomers who want to join the group as a Bachelor's or Master's student and learn about what we are (and were) interested in at the lab.

Using Temporal Structure of the Visual Experience to Learn Representations

Biological vision systems are unparalleled in their ability to learn visual representations without supervision. In machine learning, self-supervised learning (SSL) has led to major advances in forming object representations in an unsupervised fashion. Such systems learn representations invariant to augmentation operations over images, like cropping or flipping. In contrast, biological vision systems exploit the temporal structure of the visual experience during natural interactions with objects. This gives access to “augmentations” not commonly used in SSL, like watching the same object from multiple viewpoints or against different backgrounds. For our ICLR paper "Time to augment self-supervised visual representation learning" we systematically investigated the potential benefits of such time-based augmentations during natural interactions for learning object categories. Our results show that incorporating time-based augmentations achieves large performance gains over state-of-the-art image augmentations. Specifically, our analyses reveal that: 1) 3-D object manipulations drastically improve the learning of object categories; 2) viewing objects against changing backgrounds is important for learning to discard background-related information from the latent representation.
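To illustrate the idea of time-based augmentations, the following minimal sketch builds positive pairs from temporally adjacent views of the same object instead of from crops or flips of a single image. It is not the method from the paper: the function name `sample_temporal_pairs`, the `max_gap` parameter, and the string placeholders standing in for image frames are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def sample_temporal_pairs(
    episodes: Sequence[Sequence[Frame]],
    max_gap: int = 1,
    n_pairs: int = 4,
    seed: int = 0,
) -> List[Tuple[Frame, Frame]]:
    """Sample positive pairs of frames that lie close together in time.

    Each episode is assumed (hypothetically) to be an ordered sequence of
    views of one object during an interaction. Two frames at most `max_gap`
    steps apart in the same episode form a positive pair, playing the role
    that two augmented copies of one image play in standard SSL.
    """
    if max_gap < 1:
        raise ValueError("max_gap must be at least 1")
    # Only episodes with at least two frames can supply a pair.
    usable = [ep for ep in episodes if len(ep) >= 2]
    if not usable:
        raise ValueError("need at least one episode with two or more frames")

    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        episode = rng.choice(usable)
        start = rng.randrange(len(episode) - 1)
        # Never step past the end of the episode.
        gap = rng.randint(1, min(max_gap, len(episode) - 1 - start))
        pairs.append((episode[start], episode[start + gap]))
    return pairs


# Placeholder "frames": strings naming an object and a viewpoint index.
episodes = [["cup_v0", "cup_v1", "cup_v2"], ["ball_v0", "ball_v1"]]
for anchor, positive in sample_temporal_pairs(episodes, max_gap=2):
    print(anchor, "<->", positive)
```

In a real training loop, each pair would be fed to a contrastive or similar SSL objective in place of two augmented versions of the same image.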

MIMo - The Multimodal Infant Model

A central challenge in the early cognitive development of humans is making sense of the rich multimodal experiences originating from interactions with the physical world. AI systems that learn in an autonomous and open-ended fashion based on multimodal sensory input face a similar challenge. To study such development and learning in silico, we have created MIMo, a multimodal infant model. MIMo's body is modeled after an 18-month-old child and features binocular vision, a vestibular system, proprioception, and touch perception through a full-body virtual skin. MIMo is an open-source research platform based on the MuJoCo physics engine for constructing computational models of human cognitive development as well as studying open-ended autonomous learning in AI.

Learning Abstract Representations through Lossy Compression of Multi-Modal Signals

A key competence for open-ended learning is the formation of increasingly abstract representations useful for driving complex behavior. Abstract representations ignore specific details and facilitate generalization. Here we consider the learning of abstract representations in a multi-modal setting with two or more input modalities. We treat this as a lossy compression problem and show that generic lossy compression of multi-modal sensory input naturally extracts abstract representations that tend to strip away modality-specific details and preferentially retain information that is shared across the different modalities. Furthermore, we propose an architecture to learn abstract representations by identifying and retaining only the information that is shared across multiple modalities while discarding any modality-specific information.

Spike timing-based unsupervised learning of orientation, disparity, and motion representations in a spiking neural network

Neuromorphic vision sensors present unique advantages over their frame-based counterparts. However, unsupervised learning of efficient visual representations from their asynchronous output is still a challenge, requiring a rethinking of traditional image and video processing methods. Here we present a network of leaky integrate-and-fire neurons that learns representations similar to those of simple and complex cells in the primary visual cortex of mammals from the input of two event-based vision sensors. Through the combination of spike timing-dependent plasticity and homeostatic mechanisms, the network learns visual feature detectors for orientation, disparity, and motion in a fully unsupervised fashion. We validate our approach on a mobile robotic platform.
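For readers new to these building blocks, here is a minimal textbook-style sketch of a single leaky integrate-and-fire neuron and a pair-based spike timing-dependent plasticity rule. It is not the network described above: the function names, all parameter values, and the discrete-time update are assumptions chosen for illustration, and the homeostatic mechanisms are left out.

```python
import math
from typing import List, Sequence


def simulate_lif(
    input_spikes: Sequence[int],
    weight: float = 0.6,
    tau: float = 20.0,
    threshold: float = 1.0,
    dt: float = 1.0,
) -> List[int]:
    """Return the time steps at which a leaky integrate-and-fire neuron fires.

    The membrane potential decays exponentially with time constant `tau`,
    jumps by `weight` for every input spike, and is reset to zero after
    crossing `threshold`. All values are illustrative assumptions.
    """
    decay = math.exp(-dt / tau)
    potential = 0.0
    output_times = []
    for step, spike in enumerate(input_spikes):
        potential = potential * decay + weight * spike
        if potential >= threshold:
            output_times.append(step)
            potential = 0.0  # reset after firing
    return output_times


def stdp_delta(
    t_pre: float,
    t_post: float,
    a_plus: float = 0.01,
    a_minus: float = 0.012,
    tau: float = 20.0,
) -> float:
    """Pair-based STDP weight change for one pre/post spike pair.

    Pre-before-post strengthens the synapse, post-before-pre weakens it,
    and the effect shrinks exponentially with the timing difference.
    """
    diff = t_post - t_pre
    if diff > 0:
        return a_plus * math.exp(-diff / tau)
    if diff < 0:
        return -a_minus * math.exp(diff / tau)
    return 0.0


# Two input spikes in quick succession push the neuron over threshold.
print(simulate_lif([1, 1, 0, 0, 0]))  # prints [1]
print(stdp_delta(t_pre=10.0, t_post=15.0) > 0)  # prints True
```

A single input spike with these values stays below threshold, so the neuron only fires when inputs arrive close together in time, which is what makes spike timing informative.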


The GOAL-Robots project aims to develop a new paradigm to build open-ended learning robots called "Goal-based Open-ended Autonomous Learning" (GOAL). GOAL rests upon two key insights. First, to exhibit an autonomous open-ended learning process, robots should be able to self-generate goals, and hence tasks to practice. Second, new learning algorithms can leverage self-generated goals to dramatically accelerate skill learning. The new paradigm will allow robots to acquire a large repertoire of flexible skills in conditions unforeseeable at design time with little human intervention, and then to exploit these skills to efficiently solve new user-defined tasks with little or no additional learning. This innovation will be essential in the design of future service robots addressing pressing societal needs. For more information see the projects page or go to

Project Overview

As an overview of our accomplishments from the second year of the funding period, we have compiled a 5-minute video. The video targets a general audience ranging from interested lay people to colleagues in robotics. It summarizes the motivation behind the GOAL-Robots project, illustrates our research platforms, and explains our latest progress towards building robots capable of open-ended learning through defining their own learning goals and practicing the skills necessary for accomplishing these goals.

A Skinned Agent

Throughout the project we explore environments of different complexity and examine how vision, proprioception and interaction with the environment can lead to the acquisition of novel skills. Ultimately, the agent is supposed to set its own goals in terms of interesting or unexpected behaviour. 

Learning Vergence and Smooth Pursuit

Together with our long-term collaborators from the Hong Kong University of Science and Technology (HKUST), we developed methods for the autonomous learning of a repertoire of active vision skills. The work is formulated in the active efficient coding (AEC) framework, a generalization of classic efficient coding ideas to active perception. AEC postulates that active perception systems should not only adapt their representations to the statistics of sensory signals, but should also use their behavior, in particular movements of their sense organs, to promote efficient encoding of sensory signals. Along these lines, we have put together active binocular vision systems for the iCub robot that learn a repertoire of visual skills that can form the basis for interacting with objects: fixating “interesting” locations in the world based on measures of surprise or learning progress (Zhu et al., 2017), precisely coordinating both eyes for stereoscopic vision, and tracking objects moving in three dimensions (Lelais et al., 2019).