#55 Self-Supervised Vision Models (Dr. Ishan Misra – FAIR).

Dr. Ishan Misra is a Research Scientist at Facebook AI Research, where he works on Computer Vision and Machine Learning. His main research interest is reducing the need for human supervision, and indeed human knowledge, in visual learning systems. He finished his PhD at the Robotics Institute at Carnegie Mellon and has done stints at Microsoft Research, INRIA and Yale. His bachelor's degree is in computer science, where he achieved the highest GPA in his cohort.

Ishan is fast becoming a prolific scientist, already with more than 3000 citations under his belt and co-authoring with Yann LeCun, the godfather of deep learning. Today, though, we will be focusing on an exciting cluster of recent papers on unsupervised representation learning for computer vision released from FAIR. These are: DINO: Emerging Properties in Self-Supervised Vision Transformers; Barlow Twins: Self-Supervised Learning via Redundancy Reduction; and PAWS: Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples. All of these papers are hot off the press, having been officially released in the last month or so. Many of you will remember PIRL: Self-Supervised Learning of Pretext-Invariant Representations, of which Ishan was the primary author in 2019.

Pod: https://anchor.fm/machinelearningstreettalk/episodes/55-Self-Supervised-Vision-Models-Dr–Ishan-Misra—FAIR-e1355js

Panel: Dr. Yannic Kilcher, Sayak Paul (https://sayak.dev/), Dr. Tim Scarfe

Self supervised learning [00:00:00]
Lineage of SSL methods [00:04:08]
Better representations [00:06:24]
Data Augmentation [00:07:15]
Mode Collapse [00:08:43]
Ishan Intro [00:09:30]
DINO [00:12:40]
PAWS [00:14:19]
Barlow Twins [00:15:09]
Dark matter of intelligence article [00:15:36]
Main show kick off [00:16:51]
Why Ishan is doing work in self-supervised learning [00:19:49]
We don’t know what tasks we want to do [00:21:57]
Should we try to get rid of human knowledge? [00:23:58]
Augmentations are knowledge via the back door [00:26:56]
Conceptual abstraction in vision [00:35:17]
Common sense is the dark matter of intelligence [00:38:14]
Are abstract categories (natural kinds) universal? [00:40:42]
Why do these vision algorithms actually work? [00:42:58]
Universality of representations, “semantics of similarity” [00:46:16]
Images on the internet are not uniformly random [00:49:41]
Quality of representations semi vs pure self-supervised [00:54:19]
Scaling laws for self-supervised learning and quality control [00:57:42]
Amazon turk thought experiment [01:00:42]
Architecture developments in SSL [01:03:01]
Architecture improvements – contrastive / SimCLR [01:05:33]
Architecture improvements – projector heads idea [01:07:08]
Architecture improvements – objective functions [01:09:15]
Mode collapse strategies (contrastive, clustering, prototypes, self-distillation) [01:09:48]
DINO [01:15:43]
How SSL is different in vision over language [01:18:20]
Dark matter paper and latent predictive models [01:22:05]
Energy Based Models [01:25:56]
Any big lessons learned? [01:28:24]
AVID paper (Video) [01:30:17]
DepthContrast paper (point clouds) [01:33:36]


Shuffle and Learn – https://arxiv.org/abs/1603.08561
DepthContrast – https://arxiv.org/abs/2101.02691
DINO – https://arxiv.org/abs/2104.14294
Barlow Twins – https://arxiv.org/abs/2103.03230
SwAV – https://arxiv.org/abs/2006.09882
PIRL – https://arxiv.org/abs/1912.01991
AVID – https://arxiv.org/abs/2004.12943 (best paper candidate at CVPR’21 (just announced over the weekend) – http://cvpr2021.thecvf.com/node/290)

Alexei (Alyosha) Efros

Exemplar networks

The bitter lesson – Rich Sutton

Machine Teaching: A New Paradigm for Building Machine Learning Systems


Music credit: https://soundcloud.com/unseenmusic/sets/ambient-electronic-1
Visual clips credit: https://www.youtube.com/watch?v=7-4GpL41DIE
(Note: MLST is 100% non-commercial and non-monetised)
