Below we list articles and papers related to the mathematical theory of deep learning. If you know of a resource (or are its author) that we should include, please contact us via the form on our contact page.

Compressed Sensing using Generative Models
Bora, Jalal, Price, Dimakis
International Conference on Machine Learning (ICML) 2017
  • autoencoders
  • compressed sensing
  • recovery guarantee
  • adversarial networks
  • random matrix theory
  • sparsity
Distributed Sequence Memory of Multidimensional Inputs in Recurrent Networks
Charles, Yin, Rozell
Journal of Machine Learning Research 2017
  • recovery guarantee
  • recurrent neural networks
  • random matrix theory
  • short-term memory
  • echo state networks
  • sparsity
Robust Large Margin Deep Neural Networks
Giryes, Sapiro, Bronstein
IEEE Transactions on Signal Processing 2017
  • classification
  • residual networks
  • generalization
  • classification margin
  • weight normalization
  • feed forward networks
Tradeoffs between Convergence Speed and Reconstruction Accuracy in Inverse Problems
Giryes, Eldar, Bronstein, Sapiro
(SPARS 2017 proceedings) 2017
  • compressed sensing
  • inverse problems
  • iterative algorithms
  • sparse recovery
  • LISTA
  • l1 minimization
Stable Recovery of the Factors From a Deep Matrix Product and Application to Convolutional Network
Malgouyres, Landsberg
(SPARS 2017 proceedings) 2017
  • compressed sensing
  • convolutional neural networks
  • matrix factorization
  • structured recovery
  • null space property
Generalization and Equilibrium in Generative Adversarial Nets (GANs)
Arora, Ge, Liang, Ma, Zhang
Proceedings of Machine Learning Research (PMLR) 2017
  • adversarial networks
  • generative model
  • game theory
Convolutional Rectifier Networks as Generalized Tensor Decompositions
Cohen, Shashua
Proceedings of Machine Learning Research (PMLR) 2016
  • deep vs shallow
  • convolutional neural networks
  • expressive power
  • tensor decompositions
  • rectifier networks
  • arithmetic circuits
  • max pooling
Deep Haar scattering networks
Cheng, Chen, Mallat
Information and Inference: A Journal of the IMA 2016
  • Haar wavelets
  • classification
  • scattering networks
  • convolutional neural networks
Understanding Deep Convolutional Networks
Mallat
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 2016
  • wavelets
  • scattering networks
  • translation invariance
  • deformation stability
  • convolutional neural networks
Deep residual learning for image recognition
He, Zhang, Ren, Sun
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016
  • shortcuts
  • residual networks
Provable approximation properties for deep neural networks
Shaham, Cloninger, Coifman
Applied and Computational Harmonic Analysis 2016
  • wavelets
  • approximation theory
Deep Convolutional Neural Networks on Cartoon Functions
Grohs, Wiatowski, Bölcskei
Proceedings of the IEEE International Symposium on Information Theory (ISIT) 2016
  • cartoon functions
  • deformation stability
  • convolutional neural networks
A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction
Wiatowski, Bölcskei
IEEE Transactions on Information Theory (submitted) 2016
  • scattering networks
  • translation invariance
  • deformation stability
  • frame theory
  • feature extraction
  • convolutional neural networks
Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?
Giryes, Sapiro, Bronstein
IEEE Transactions on Signal Processing 2016
  • input space partition
  • classification
  • random weights
  • Gaussian mean width
  • geometric embeddings
Learning Functions: When is Deep Better Than Shallow
Mhaskar, Liao, Poggio
(CBMM Memo) 2016
  • deep vs shallow
  • universal approximation
  • compositional functions
  • approximation theory
  • convolutional neural networks
  • VC dimension
Deep Convolutional Neural Networks Based on Semi-Discrete Frames
Wiatowski, Bölcskei
Proceedings of the IEEE International Symposium on Information Theory (ISIT) 2015
  • scattering networks
  • translation invariance
  • deformation stability
  • frame theory
  • feature extraction
  • convolutional neural networks
Norm-Based Capacity Control in Neural Networks
Neyshabur, Tomioka, Srebro
JMLR: Workshop and Conference Proceedings 2015
  • regularization
  • capacity control
On the number of linear regions of deep neural networks
Montufar, Pascanu, Cho, Bengio
Advances in Neural Information Processing Systems 2014
  • input space partition
  • deep vs shallow
Provable Bounds for Learning Some Deep Representations
Arora, Bhaskara, Ge, Ma
International Conference on Machine Learning (ICML) 2014
  • generative model
  • network learning
Approximation theory of the MLP model in neural networks
Pinkus
Acta Numerica 1999
  • universal approximation
Ridgelets: Theory and Applications
Candès
(Stanford University Dept. of Statistics: Technical report) 1998
  • ridgelets
  • ridgelets as 1-layer neural networks
  • approximation results
Approximation and estimation bounds for artificial neural networks
Barron
Machine Learning 1994
  • universal approximation
Neural networks for localized approximation
Chui, Li, Mhaskar
Mathematics of Computation 1994
  • localized approximation
  • approximation of bump functions
Universal approximation bounds for superpositions of a sigmoidal function
Barron
IEEE Transactions on Information Theory 1993
  • universal approximation
  • estimation
Multilayer feedforward networks with a nonpolynomial activation function can approximate any function
Leshno, Lin, Pinkus, Schocken
Neural Networks 1993
  • universal approximation
  • feed forward networks
Networks for Approximation and Learning
Poggio, Girosi
Proceedings of the IEEE 1990
  • approximation theory
  • splines
  • regularization theory
Approximation by superpositions of a sigmoidal function
Cybenko
Mathematics of Control, Signals and Systems 1989
  • universal approximation
Multilayer feedforward networks are universal approximators
Hornik, Stinchcombe, White
Neural Networks 1989
  • universal approximation
  • Stone-Weierstrass theorem
Stable Architectures for Deep Neural Networks
Haber, Ruthotto
(submitted) 2017
  • gradient descent
  • well-posedness
  • dynamic inverse problems
  • PDE-constrained optimization
  • parameter estimation
  • image classification
On the ability of neural nets to express distributions
Lee, Ge, Ma, Risteski, Arora
(submitted) 2017
  • Barron's theorem
  • Barron function
  • compositional functions
  • generative model
Optimal Approximation with Sparsely Connected Deep Neural Networks
Bölcskei, Grohs, Kutyniok, Petersen
(submitted) 2017
  • universal approximation
  • shearlets
  • cartoon functions
  • information theory
Energy Propagation in Deep Convolutional Neural Networks
Wiatowski, Grohs, Bölcskei
IEEE Transactions on Information Theory (submitted) 2017
  • scattering networks
  • convolutional neural networks
  • energy propagation
Multi-Layer Convolutional Sparse Modeling: Pursuit and Dictionary Learning
Sulam, Papyan, Romano, Elad
2017
  • convolutional neural networks
  • convolutional sparse coding
  • multi-layer pursuit
  • sparse dictionary coding
Gradient Descent Learns Linear Dynamical Systems
Hardt, Ma, Recht
(Journal of Machine Learning Research) 2016
  • gradient descent
  • linear dynamical systems
  • sample complexity
Convolutional Neural Networks Analyzed via Convolutional Sparse Coding
Papyan, Romano, Elad
Journal of Machine Learning Research (submitted) 2016
  • convolutional neural networks
  • convolutional sparse coding
  • multi-layer pursuit
  • sparse dictionary coding
Why are Deep Nets Reversible: A Simple Theory, With Implications for Training
Arora, Liang, Ma
ICLR 2016
  • generative model
  • feed forward networks
Understanding deep learning requires rethinking generalization
Zhang, Bengio, Hardt, Recht, Vinyals
2016
  • image classification
  • regularization
  • convolutional neural networks
  • generalization
  • stochastic gradient descent
  • training
Understanding Trainable Sparse Coding via Matrix Factorization
Moreau, Bruna
2016
  • LISTA
  • matrix factorization
  • iterative thresholding
  • sparse coding matrix
Identity Matters in Deep Learning
Hardt, Ma
2016
  • residual networks
  • feed forward networks
  • identity parametrization