#50 Christian Szegedy – Formal Reasoning, Program Synthesis

Dr. Christian Szegedy from Google Research is a deep learning heavyweight. He co-discovered adversarial examples, built one of the first deep-learning object detection algorithms, created the Inception architecture, and co-invented batch normalization (BatchNorm). He argues that betting on AI now is as sound a bet as computers and software were in 1990. Yet he also thinks we have been programming computers in essentially the same way since the 1950s, and that this amounts to a huge stagnation. Mathematics is the process of taking a fuzzy thought and formalising it, so could we automate that? Could we create a system that acts like a superhuman mathematician, yet one you can talk to in natural language? This is what Christian calls autoformalisation. He sees automating much of what we do in mathematics as the first step towards software synthesis and human-level AGI, with mathematical ability serving as the litmus test for general reasoning. Christian has a fascinating take on Transformers too.
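
To make autoformalisation concrete, here is a minimal illustrative sketch (our own, not from the episode) of the kind of translation such a system would perform: the informal statement "the sum of two even numbers is even" rendered as a machine-checkable theorem, written here in Lean 4 (the episode itself discusses proof assistants such as HOL Light and Coq, linked below). The theorem name and the encoding of "even" are our own choices for the example.

-- Informal: "the sum of two even numbers is even."
-- One possible formal rendering in Lean 4; `omega` (linear arithmetic) closes the goal.
theorem even_add_even {m n : Nat}
    (hm : ∃ a, m = 2 * a) (hn : ∃ b, n = 2 * b) :
    ∃ c, m + n = 2 * c :=
  match hm, hn with
  | ⟨a, ha⟩, ⟨b, hb⟩ =>
    -- witness c := a + b; then m + n = 2 * a + 2 * b = 2 * (a + b)
    ⟨a + b, by omega⟩

An autoformalisation system, in Christian's framing, would produce statements (and eventually proofs) like this directly from natural-language mathematics.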

With Yannic Lightspeed Kilcher and Dr. Mathew Salvaris

Whimsical Canvas with Tim’s Notes:
https://whimsical.com/mar-26th-christian-szegedy-CpgGhnEYDBrDMFoATU6XYC

Pod version: https://anchor.fm/machinelearningstreettalk/episodes/50-Christian-Szegedy—Formal-Reasoning–Program-Synthesis-eu643d

Tim Introduction [00:00:00]
Show Kick-off [00:09:12]
Why did Christian pivot from vision to reasoning? [00:12:07]
Autoformalisation [00:12:47]
Kepler conjecture [00:17:30]
What are the biggest hurdles you have overcome? [00:20:11]
How does something as fuzzy as DL come into mathematical formalism? [00:23:05]
How does AGI connect to autoformalisation? [00:30:32]
Multiagent systems used in autoformalisation? Create an artificial scientific community of AI agents! [00:36:42]
Walid Saba — the information is not in the data [00:41:58]
Is generalization possible with DL? What would François say? [00:45:02]
What is going on in a neural network? (Don’t Miss!) [00:47:59]
Inception network [00:52:42]
Transformers negate the need for architecture search? [00:55:58]
What do you do when you get stuck in your research? [00:58:08]
Why do you think SGD is not the path forward? [00:59:59]
Is GPT-3 on the way to AGI? [01:02:01]
Is GPT-3 a hashtable or a learnable program canvas? [01:05:01]
What worries Christian about the research landscape? [01:07:14]
The style that research is conducted [01:11:10]
Layerwise self supervised training [01:13:59]
Community Questions: The problem of reality in AI ethics [01:15:33]
Community Questions: Internal covariate shift and BatchNorm [01:20:03]
Community Questions: What is so special about attention? [01:23:08]
Jürgen Schmidhuber [01:24:18]
Community Question: Data efficiency and is it possible to “learn” inductive biases? [01:27:13]
François's ARC challenge: is inductive learning still relevant? [01:31:13]

A Promising Path Towards Autoformalization and General Artificial Intelligence [Szegedy]
https://link.springer.com/chapter/10.1007/978-3-030-53518-6_1

Learning to Reason in Large Theories without Imitation [Bansal/Szegedy]
https://arxiv.org/pdf/1905.10501.pdf

Mathematical Reasoning via Self-Supervised Skip-Tree Training [Rabe ... Szegedy]
https://openreview.net/pdf?id=YmqAnY0CMEy

LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning [Wu ... Szegedy]
https://arxiv.org/abs/2101.06223v1

Deep Learning for Symbolic Mathematics [Lample]
https://arxiv.org/pdf/1912.01412.pdf

It’s Not What Machines Can Learn, It’s What We Cannot Teach [Yehuda]
https://arxiv.org/pdf/2002.09398.pdf

Investigating the Limitations of Transformers with Simple Arithmetic Tasks [Nogueira]
https://arxiv.org/pdf/2102.13019.pdf

Provable Bounds for Learning Some Deep Representations [Arora]
https://arxiv.org/pdf/1310.6343.pdf

Neural nets learn to program neural nets with fast weights [Schmidhuber]
https://people.idsia.ch/~juergen/fast-weight-programmer-1991-transformer.html

How does Batch Normalization Help Optimization? [Ilyas]
https://gradientscience.org/batchnorm/

How to Train Your ResNet 7: Batch Norm

Training a ResNet to 94% Accuracy on CIFAR-10 in 26 Seconds on a Single GPU [Kuhn]
https://efficientdl.com/how-to-train-a-resnet-efficiently/#7-batch-norm-does-reducs-internal-covariate-shift

https://en.wikipedia.org/wiki/HOL_Light
https://en.wikipedia.org/wiki/Coq
https://en.wikipedia.org/wiki/Kepler_conjecture
https://en.wikipedia.org/wiki/Feit%E2%80%93Thompson_theorem

We used a few clips from the ScaleAI interview with Christian – https://www.youtube.com/watch?v=p_UXra-_ORQ
