061: Interpolation, Extrapolation and Linearisation (Prof. Yann LeCun, Dr. Randall Balestriero)

We are now sponsored by Weights and Biases! Please visit our sponsor link: http://wandb.me/MLST

Yann LeCun thinks that it’s specious to say neural network models are interpolating because in high dimensions, everything is extrapolation. Recently Dr. Randall Bellestrerio, Dr. Jerome Pesente and prof. Yann LeCun released their paper learning in high dimensions always amounts to extrapolation. This discussion has completely changed how we think about neural networks and their behaviour.

[00:00:00] Pre-intro
[00:11:58] Intro Part 1: On linearisation in NNs
[00:28:17] Intro Part 2: On interpolation in NNs
[00:47:45] Intro Part 3: On the curse
[00:57:41] LeCun intro
[00:58:18] Why is it important to distinguish between interpolation and extrapolation?
[01:03:18] Can DL models reason?
[01:06:23] The ability to change your mind
[01:07:59] Interpolation – LeCun steelman argument against NNs
[01:14:11] Should extrapolation be over all dimensions
[01:18:54] On the morphing of MNIST digits, is that interpolation?
[01:20:11] Self-supervised learning
[01:26:06] View on data augmentation
[01:27:42] TangentProp paper with Patrice Simard
[01:29:19] LeCun has no doubt that NNs will be able to perform discrete reasoning
[01:38:44] Discrete vs continous problems?
[01:50:13] Randall introduction
[01:50:13] are the interpolation people barking up the wrong tree?
[01:53:48] Could you steel man the interpolation argument?
[01:56:40] The definition of interpolation
[01:58:33] What if extrapolation was being outside the sample range on every dimension?
[02:01:18] On spurious dimensions and correlations dont an extrapolation make
[02:04:13] Making clock faces interpolative and why DL works at all?
[02:06:59] We discount all the human engineering which has gone into machine learning
[02:08:01] Given the curse, NNs still seem to work remarkably well
[02:10:09] Interpolation doesn’t have to be linear though
[02:12:21] Does this invalidate the manifold hypothesis?
[02:14:41] Are NNs basically compositions of piecewise linear functions?
[02:17:54] How does the predictive architecture affect the structure of the latent?
[02:23:54] Spline theory of deep learning, and the view of NNs as piecewise linear decompositions
[02:29:30] Neural Decision Trees
[02:30:59] Continous vs discrete (Keith’s favourite question!)
[02:36:20] MNIST is in some sense, a harder problem than Imagenet!
[02:45:26] Randall debrief
[02:49:18] LeCun debrief

Pod version: https://anchor.fm/machinelearningstreettalk/episodes/061-Interpolation–Extrapolation-and-Linearisation-Prof–Yann-LeCun–Dr–Randall-Balestriero-e1cgdr0

Our special thanks to;
– Francois Chollet (buy his book! https://www.manning.com/books/deep-learning-with-python-second-edition)
– Alexander Mattick (Zickzack)
– Rob Lange
– Stella Biderman

References:
Learning in High Dimension Always Amounts to Extrapolation [Randall Balestriero, Jerome Pesenti, Yann LeCun]
https://arxiv.org/abs/2110.09485

A Spline Theory of Deep Learning [Dr. Balestriero, baraniuk]
https://proceedings.mlr.press/v80/balestriero18b.html

Neural Decision Trees [Dr. Balestriero]
https://arxiv.org/pdf/1702.07360.pdf

Interpolation of Sparse High-Dimensional Data [Dr. Thomas Lux]
https://tchlux.github.io/papers/tchlux-2020-NUMA.pdf

If you are an old fart and offended by the background music, here is the intro (first 60 mins) with no background music. https://drive.google.com/file/d/16bc7XJjKJzw4YdvL5rYdRZZB19dSzR70/view?usp=sharing

YouTube Source for this AI Video

AI video(s) you might be interested in …