#77 – Vitaliy Chiley (Cerebras)

Vitaliy Chiley is a Machine Learning Research Engineer at the next-generation computing hardware company Cerebras Systems. We spoke about how deep learning workloads, including sparse workloads, can run faster on Cerebras hardware.

Pod: https://anchor.fm/machinelearningstreettalk/episodes/77---Vitaliy-Chiley-Cerebras-e1k1hvu

[00:00:00] Housekeeping
[00:01:08] Preamble
[00:01:50] Vitaliy Chiley Introduction
[00:03:11] Cerebras architecture
[00:08:12] Memory management and FLOP utilisation
[00:18:01] Centralised vs decentralised compute architecture
[00:21:12] Sparsity
[00:22:35] Does Sparse NN imply Heterogeneous compute?
[00:28:09] Cost of distributed memory stores?
[00:29:48] Activation vs weight sparsity
[00:36:40] What constitutes a dead weight to be pruned?
[00:36:40] Is it still a saving if we have to choose between weight and activation sparsity?
[00:39:50] Cerebras is a cool place to work
[00:42:53] What is sparsity? Why do we need to start dense?
[00:45:24] Evolutionary algorithms on Cerebras?
[00:46:44] How can we start sparse? Google RIGL
[00:50:32] Inductive priors, why do we need them if we can start sparse?
[00:54:50] Why anthropomorphise inductive priors?
[01:01:01] Could Cerebras run a cyclic computational graph?
[01:02:04] Are NNs locality sensitive hashing tables?

References:
Rigging the Lottery: Making All Tickets Winners [RIGL]
https://arxiv.org/pdf/1911.11134.pdf
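
The RigL recipe discussed in the episode boils down to: periodically drop the smallest-magnitude active weights and regrow the same number of connections where the dense gradient is largest, annealing the drop fraction over training with a cosine schedule. Below is a minimal NumPy sketch of one such connectivity update; the function name, argument shapes, and default drop fraction are illustrative assumptions, not code from the paper.

```python
import numpy as np

def rigl_update(weights, grads, mask, drop_frac=0.3):
    """One RigL-style prune-and-regrow step (illustrative sketch).

    weights, grads and mask are arrays of the same shape; mask is 0/1.
    """
    n_update = int(drop_frac * mask.sum())

    # Drop: zero out the n_update active weights with the smallest magnitude.
    active = np.flatnonzero(mask)
    drop = active[np.argsort(np.abs(weights.flat[active]))[:n_update]]
    mask.flat[drop] = 0
    weights.flat[drop] = 0.0

    # Grow: activate the n_update inactive positions with the largest
    # gradient magnitude; regrown weights start at zero, as in the paper.
    inactive = np.flatnonzero(mask == 0)
    grow = inactive[np.argsort(-np.abs(grads.flat[inactive]))[:n_update]]
    mask.flat[grow] = 1

    return weights, mask
```

In the paper this update runs only every few hundred optimizer steps, so the network trains sparse end to end while the connectivity pattern slowly adapts.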

[D] DanNet, the CUDA CNN of Dan Ciresan in Jürgen Schmidhuber's team, won 4 image recognition challenges prior to AlexNet

A Spline Theory of Deep Learning [Balestriero]
https://proceedings.mlr.press/v80/balestriero18b.html
