## Poster Session

PosterTitlesAndAbstracts.pdf

#### Dominik Alfke

##### Semi-Supervised Classification on Non-Sparse Graphs Using Low- Rank Graph Convolutional Networks
Graph Convolutional Networks (GCNs) have proven to be successful tools for semi-supervised learning on graph-based datasets. For sparse graphs, linear and polynomial filter functions have yielded impressive results. For large non-sparse graphs, however, network training and evaluation becomes prohibitively expensive. This includes hypergraph applications. By introducing low-rank filters, we gain significant runtime acceleration and simultaneously improved accuracy.

#### Christoph Brauer

##### Asymptotic analysis of unrolling approaches for bi-level optimization
In image and signal processing and beyond, quantities of interest are often reconstructed from noisy measurements by means of suitable convex optimization problems. While model based approaches usually assume that the ingredients of the problem are a priori known, data driven approaches are motivated by situations where the objective function or constraints of the problem are partially unknown and shall be learned from data. This gives rise to bi-level optimization problems in which the original convex problem appears as a lower-level problem in the constraints and a higher-level loss function is minimized in order to fit parameters to available training data. Applying gradient based algorithms to bi-level optimization problems poses the difficulty to differentiate through the argmin of the lower-level problem. In this contribution, we study the approach to unroll a fixed number of steps of the Chambolle-Pock algorithm applied to the lower-level problem in order to accomplish this kind of differentiation approximately. We investigate the asymptotic behavior of the resulting gradients and derive an new approach to learn the parameters of the lower-level problem.

#### James Cheshire

##### On the Thresholding Bandit Problem
We are concerned with a specific pure exploration stochastic bandit problem. The learner samples arms sequentially up to a fixed time horizon and then aims to find the set of arms whose means are above a given threshold. We consider two cases, structured and unstructured. In the structured case one has the additional information that the means form a monotonically increasing sequence. During this poster we first present the problem, then propose an approach in the unstructured case. Finally we consider the difficulties that arise when one wishes to extend to the structured case.

#### Pavel Dvurechensky

##### On the complexity of optimal transport problems
Optimal transport approach has become very popular is data science and machine learning. The applications include, for example, image retrieval, classification and training GANs. Nevertheless, the efficiency in practice comes with the price of coslty computations. In this work we consider theoretical aspects of computational complexity of approximating optimal transport distance and barycenter. Namely, the main question is how many arithmetic operations is needed to approximate these objects with given accuracy. We provide algorithmic upper bounds for the complexity.

#### Martin Genzel

##### A New Perspective on L1-Analysis Recovery
This poster deals with the problem of robust signal estimation from undersampled noisy sub-Gaussian measurements under the assumption of a cosparse model. Based on generalized sparsity parameters, a non-uniform recovery guarantee for the L1-analysis basis pursuit is established, enabling for accurate predictions of its sample complexity. The corresponding bounds on the number of required samples do explicitly depend on the Gram matrix of the analysis operator and therefore particularly take account of its mutual coherence structure. These findings defy conventional wisdom in previous compressed sensing research which suggests that the sparsity of the analysis coefficients is the crucial performance indicator to be studied. In fact, this common conception is not valid in many situations of practical interest, for instance, when using a redundant (multilevel) frame as sparsifying transform. By contrast, it is demonstrated that the proposed theoretical sampling-rate bounds can reliably predict the reconstruction capability of various types of analysis operators, such as redundant Haar wavelets systems, total variation, or random frames. Apart from that, the presented results do naturally extend to stable recovery, which allows for handling signal vectors whose coefficient sequence is only compressible but not exactly sparse.
This is joint work with Gitta Kutyniok (TU Berlin and University of Tromsø) and Maximilian März (TU Berlin).

#### Alexandros Grosdos

##### Algebraic statistics of local Dirac mixture moments
Statistical moments have recently gained attention from an algebraic point of view. When they are polynomials in the model parameters, one can define the moment ideal which is interesting both for statistical inference and for its algebraic properties. This poster focuses on the local Dirac case. We provide generators for the ideal that encodes all relations among moments for the first order Diracs. Symbolic algebra is used to efficiently solve the problem of parameter identifiability for a mixture with few terms. For larger mixtures we turn to Prony and numerical algebra methods. Our results are showcased with examples from signal processing and statistics.
This poster is based on joint work with Markus Wageringel.

#### Pavel Gurevich and Hannes Stuke

##### Dynamical systems approach to outlier robust machine learning
We consider a typical problem of machine learning - the reconstruction of probability distributions of observed spatially distributed data. We introduce the so-called gradient conjugate prior update and study the induced dynamical system. We will explain the dynamics of the parameters and show how one can use insights from the dynamical behavior to recover the ground truth distribution in a way that is robust against outliers. The developed approach is directly applicable to artificial neural networks.
Gurevich P., Stuke H. Gradient conjugate priors and multi-layer neural networks. J. Artificial Intelligence. Accepted. Preprint: arXiv:1802.02643 [math.ST].
Gurevich P., Stuke H. Robustness Against Outliers For Deep Neural Networks By Gradient Conjugate Priors. Preprint: arXiv:1905.08464 [stat.ML].

#### Martin Holler

##### A convex variational model for learning convolutional image atoms from incomplete data
We present and analyze a variational model for learning image descriptors from corrupted and/or incomplete data, both in function space and numerically.
Building on lifting and relaxation strategies, the proposed approach is convex and allows for simultaneous image reconstruction and descriptor learning in a general, inverse problems context.
Further, motivated by an improved numerical performance, we also discuss a semi-convex variant of the proposed approach.
For both settings, fundamental analytical properties allowing in particular to ensure well-posedness and stability results for inverse problems are derived in a continuous setting.
Exploiting convexity, we further show numerical results where globally optimal minimizers of the proposed energy are computed for imaging applications with incomplete, noisy and blurry data.
This is joint work with A. Chambolle and T. Pock.

#### Melanie Kircheis

##### Direct inversion of the NFFT
The NFFT, short hand for nonequispaced fast Fourier transform, is a fast algorithm to evaluate a trigonometric polynomial $f(x) = \sum_{k=-\frac M2}^{\frac M2-1} \hat f_k \, \mathrm e^{2\pi \mathrm i kx}$ at nonequispaced points $$x_j \in \left[-\frac 12,\frac 12\right), \,j=1,\dots,N$$. In case we are given equispaced points and $$M=N$$, this evaluation can be realized by means of the FFT, a fast algorithm that is invertible. Hence, we are interested in an inversion also for nonequispaced data, i.e., the Fourier coefficients $$\hat f_k$$ shall be computed for given function values $$f(x_j)$$. For this purpose, we use the matrix representation of the NFFT to derive a direct algorithm. Thereby, we are able to compute an inverse NFFT by dint of a modified adjoint NFFT. Finally, we show that this approach can also be explained by means of frame approximation.

#### Kirandeep Kour

##### Optimal Feature Extraction using Tensor Train Decomposition for Kernel Methods
There are several applications in science and engineering, where enormous amounts of multi-relational data have been produced. They usually have a complex intrinsic structure. Generally, these data depend on various parameters; hence, they can be interpreted as multidimensional data. Although computational power has been increased drastically over the last decades, a direct treatment, involving such multidimensional data, is still almost impossible due to the curse of dimensionality. This means, the required memory storage for multidimensional data increases exponentially with respect to dimensionality and also the associated computational cost. Tensors (multi-way arrays) can be considered as an essential tool to mitigate the aforementioned issue. They often provide a natural and compact representation for such massive multidimensional data. There has been a significant advancement in the use of tensor decompositions for feature extraction. The decompositions allow us to select necessary features from a large dimensional feature space.
We make use of the tensor train decomposition for building an algorithm for non-separable multidimensional data, which are, in general, not separable by a linear boundary. Therefore, we exploit Kernelized Support Vector Machines, as a base to our approach. To preserve the input data structure, we directly work with tensors as an input. For the classification in a multidimensional case, we propose a method, the so-called Support Tensor Train Machine, by utilizing the tensor train format, thus not restricting ourselves to rank one tensors. We show the robustness and efficiency of the proposed method by means of numerical experiments.

#### Joseph Lam-Weil

##### Local minimax rates for closeness testing of discrete distributions
We consider the closeness testing problem in the Poisson vector model - which is known to be asymptotically equivalent to the model of multinomial distributions. The goal is to distinguish whether two data samples are drawn from the same unspecified distribution, or whether their respective distributions are separated in L1-norm. In this paper, we focus on adapting the rate to the shape of the underlying distributions, i.e. we consider a local minimax setting. We provide, to the best of our knowledge, the first local minimax rate for the separation distance up to logarithmic factors, together with a test that achieves it. In view of the rate, closeness testing turns out to be substantially harder than the related one-sample testing problem over a wide range of cases.

#### Ivan Lecei

##### Anomaly Detection in time series under stochastic external excitations
The use of machine-learning techniques for the Condition Monitoring of mechanical systems (such as wind turbines) for the identification and anticipation of anomalous/erroneous behaviors is a challenging task. On the one hand effects like aging, erosion, or imbalances are influenced by stochastic uncontrolled time dependent excitations due to the wind. They modify the observed signal in such a way that a comparison between different signals with standard distance metrics is not possible. On the other hand for the successful application of machine-learning methods one typically needs a large number of data sets in order to learn a specific model. When trying to learn the indicators for faults in mechanical systems, i.e. anomalies, the presence of a large sample may not be the case, especially when inducing these anomalies artificially may be very costly, as it is for example the case in the wind energy sector.
We address those challenges by firstly using simulations for the creation of synthetic data, which represent the real behavior of the wind turbine as well as possible, resulting in a multidimensional time series including a number of sensors of interest, such as the flap-wise and edge-wise accelerations of the blades. Secondly, based on the simulated sensor data, the underlying physics of the dynamic system can be reconstructed, from which the so called invariants of the system can be identified. They are assumed to undergo transformations that do not change their topology under the effect of time dependent stochastic excitations. Transformations that do modify the topology are associated with anomalies or certain events. An adequate metric for the comparison between different realizations makes use of topological features. Thirdly, we use multivariate statistical methods, such as copulas, where we interpret the change of the dependence between different sensors as a change in the observed system itself. Copulas are invariant under special classes of transformations, making them very suited to address the above mentioned data analysis challenges for mechanical systems.

#### Jan Macdonald and Stephan Waeldchen

##### A Rate-Distortion Framework for Explaining Neural Network Decisions
We propose a rate-distortion framework for explaining neural network decisions. We formulate the task of determining the most relevant signal components for a classifier prediction as an optimisation problem. For the case of binary signals and Boolean classifier functions we show that it is hard to solve and to approximate. Finally, we present a heuristic solution strategy for deep ReLU neural network classifiers. We present numerical experiments and compare our method to other established methods.

#### Johannes Maly

##### Robust Recovery of Low-Rank Matrices with Non-Orthogonal Sparse Decomposition from Incomplete Measurements
We consider the problem of recovering an unknown effectively $$(s_1,s_2)$$-sparse low-rank-$$R$$ matrix $$X$$ with possibly non-orthogonal rank-$$1$$ decomposition from incomplete and inaccurate linear measurements of the form $$y = \mathcal A (X) + \eta$$, where $$\eta$$ is noise. We derive an optimization formulation for matrix recovery under the considered model and propose a novel algorithm to solve it. The algorithm is a fast first order method, and straightforward to implement. We prove global convergence for any linear measurement model to stationary points and local convergence to global minimizers. By adapting the concept of restricted isometry property from compressed sensing to our novel model class, we prove error bounds between global minimizers and ground truth, up to noise level, from a number of subgaussian measurements scaling as $$R(s_1+s_2)$$, up to log-factors in the dimension, and relative-to-diameter distortion. Simulation results demonstrate both the accuracy and efficacy of the algorithm.

#### Robert Malte Polzin

##### Coupling a multiscale stochastic precipitation model to large scale atmospheric flow dynamics
Objective is the efficient modelling of large scales in atmosphere science. We develop a small scale stochastic model for convective activity and describe convective feedback on the large scale atmospheric flow using data-based learning and reduced models for efficient scale coupling. The aim is a hybrid model with a stochastic com-ponent for a conceptual description of convection embedded in a deterministic atmospheric flow model. From the meteorological perspective, we intend to use variants of the Dynamic State Index (DSI) on multiple scales and for multiple processes as new control variables for convective structures. To analyse atmospheric processes on different scales, we need to consider the process as an embedded system as the restriction of an unknown larger dynamical system. Therefore, we extend the theory and algorithms for coherent sets for embedded domains and incomplete trajectory data and make towards unified transfer-operator approach for coherent sets and patterns. The stateof the art considering the work on transport-oriented methods and data-based analytics will be illustrated. In view of upward coupling the future work on a model for cloud characteristics with the help of the already obtained theory for coherent set analysis will be described. Perspectively, we try to combine machine learning with coherent sets in dynamical systems.

#### Nicole Mücke

##### Bridging the gap between Optimization and Learning Theory through Gradient Concentration g
In Optimization, it is well known that the rate of convergence for stochastic gradient descent is guided by structural assumptions on the function to be minimized. For example, smoothness allows a rate of $$1/sqrt(t)$$ while additional strong convexity leads to an improvement to $$1/t$$.
We extend the classical analysis from Optimization to Learning Theory. In particular, we show that fast rates of convergence for SGD can be achieved assuming smoothness and (strong) convexity. The main tool is a concentration argument for gradients of the risk, allowing to derive an upper bound in terms of the localized empirical Rademacher complexity of the associated function class, improving over known results and leading to the same good error bounds as in Optimization. A special feature of this approach is an adaptive stepsize choice.
Moreover, we analyse the interplay between batch-size, multiple passes over the data and stepsize choice. Our theoretical findings are accompanied by numerical studies.

#### Robert Nasdala

##### Transformed rank-1 lattices for high-dimensional approximation
This poster describes an extension of Fourier approximation methods for multivariate functions defined on the torus $$\mathbb{T}^{d}$$ to functions in a weighted Hilbert space $$L_{2}(\mathbb{R}^{d}, \omega)$$ via a multivariate change of variables $$\psi:\left(-\frac{1}{2},\frac{1}{2}\right)^{d}\to\mathbb{R}^{d}$$. We establish sufficient conditions on $$\psi$$ and $$\omega$$ such that the composition of a function in such a weighted Hilbert space with $$\psi$$ yields a function in the Sobolev space $$H_{\mathrm{mix}}^{m}(\mathbb{T}^{d})$$ of functions on the torus with mixed smoothness of natural order $$m \in \mathbb{N}_{0}$$. In this approach we adapt algorithms for the evaluation and reconstruction of multivariate trigonometric polynomials on the torus $$\mathbb{T}^{d}$$ based on single and multiple reconstructing {rank-$$1$$} lattices. Since in applications it may be difficult to estimate a related function space, we make use of dimension incremental construction methods for sparse frequency sets. Various numerical tests confirm obtained theoretical results for the transformed methods.

#### Daniel Potts and Michael Schmischke

##### Learning high-dimensional additive models on the torus
We consider high-dimensional $$1$$-periodic functions $$f \colon \mathbb{T}^d \rightarrow \mathbb{R}$$, where $$d \in \mathbb{N}$$ is the spatial dimension and $$\mathbb{T} \cong \mathbb{R}\setminus\mathbb{Z}$$ is the torus. Any such function which is square-integrable, i.e., $$f \in \mathrm{L}_2(\mathbb{T}^d)$$, can be uniquely represented using the multivariate classical ANOVA (analysis of variance) decomposition $$f = \sum_{\boldsymbol{u}\subseteq\mathcal{D}} f_{\boldsymbol{u}}$$. Here, $$\mathcal{D} = \{ 1,2,\dots,d \}$$ is the set of coordinate indices and $$f_{\boldsymbol{u}} \colon \mathbb{T}^{|\boldsymbol{u}|} \rightarrow \mathbb{R}$$ for $$\boldsymbol{u} \subset \mathcal{D}$$ is an ANOVA term.
We give an overview of an approximation method for a function $$f$$ in two scenarios:
• We have black-box-access to the function $$f$$.
• Only scattered data is available, i.e., a finite node set $$X \subseteq \mathbb{T}^d$$ and evaluations $$\boldsymbol{y} = ( f(\boldsymbol{x}) )_{x \in X}$$.
In both scenarios, we use the truncated ANOVA decomposition $\mathrm{T}_{d_s} f = \sum_{\substack{\boldsymbol{u}\subseteq\mathcal{D} \\ |\boldsymbol{u}| \leq d_s }} f_{\boldsymbol{u}}$ with respect to a superposition dimension $$d_s \in \{1,2,\dots,d-1\}$$ as approximation model and aim to rank the importance of the ANOVA terms $$f_{\boldsymbol{u}}$$, $$|\boldsymbol{u}| \leq d_s$$, in order to construct a frequency index set $$I \subseteq \mathbb{Z}^d$$ for approximation of $$f$$ by the corresponding Fourier partial sum $f(\boldsymbol{x}) \approx \mathrm{T}_{d_s} f (\boldsymbol{x})\approx \sum_{\boldsymbol{k}\in I} \hat{f}_{\boldsymbol{k}} \mathrm{e}^{2\pi\mathrm{i}\boldsymbol{k}\cdot\boldsymbol{x}}.$ We present methods for recovering the Fourier coefficients $$\hat{f}_{\boldsymbol{k}}$$ of $$f$$ in both scenarios. In the black-box case, we rely on rank-1 lattice as sampling schemes and employ the lattice FFT. For scattered data, we iteratively solve a least-squares problem while using the nonequispaced FFT for efficient multiplication with the arising Fourier matrices.

#### Mones Raslan

##### Efficient Approximation of Solutions of Parametric PDEs by ReLU Neural Networks
We analyze the suitability of ReLU neural networks for the approximation of solutions of parametric partial differential equations. Without any concrete knowledge about its shape, we exploit the intrinsic low-dimensionality of the solution manifold established by the theory of reduced bases. To be more precise, for a large variety of parametric PDEs it is possible to construct neural networks that yield approximations of the parameter-dependent maps which avoid a curse of dimension and essentially only depend on the size of the reduced basis. The obtained rates are significantly better than those provided by classical approximation results.

#### Michael Rauchensteiner

##### Robust and Resource Efficient Identification of Two Hidden Layer Neural Networks
We address the structure identification and the uniform approximation of two fully nonlinear layer neural networks from a small number of query samples. We approach the problem by sampling actively finite difference approximations to Hessians of the network. Gathering several approximate Hessians allows reliably to approximate the matrix subspace spanned by symmetric tensors formed by weights of the first layer together with the entangled symmetric tensors, formed by suitable combinations of the weights of the first and second layer. The identification of the 1-rank symmetric tensors is then performed by the solution of a robust nonlinear program. We provide guarantees of stable recovery under a posteriori verifiable conditions. We further address the correct attribution of approximate weights to the first or second layer. By using a suitably adapted gradient descent iteration, it is possible then to estimate, up to intrinsic symmetries, the shifts of the activations functions of the first layer. Our method of identification of the weights of the network is fully constructive, with quantifiable sample complexity, and therefore contributes to dwindle the black-box nature of the network training phase. We corroborate our theoretical results by extensive numerical experiments.

#### Elisabeth Ullmann

##### On multilevel best linear unbiased estimators
We discuss a novel multilevel best linear unbiased estimator (BLUE) in [1]. This is a general variance reduction technique for the estimation of the expectation of a scalar-valued quantity of interest associated with a family of multifidelity models. The key idea is to reformulate the estimation as a linear regression problem. We then show that the associated estimators are variance minimal within the class of linear unbiased estimators. By solving a sample allocation problem we further construct a variance minimal, linear, and unbiased estimator for a given computational budget. We compare our proposed estimator to other multilevel estimators such as multilevel Monte Carlo, multifidelity Monte Carlo, and approximate control variates. In addition, we provide a sharp lower bound for the variance of any linear unbiased multilevel estimator, and show that our estimator approaches this bound in the infinite low-fidelity data limit.
[1] Daniel Schaden, Elisabeth Ullmann: On multilevel best linear unbiased estimators, Preprint No.      IGDK-2019-11, May 2019.

#### Toni Volkmer

##### NFFT meets Krylov methods: Fast matrix-vector prod. for the graph Laplacian of fully connected netw.
The graph Laplacian is a standard tool in data science, machine learning, and image processing. The corresponding matrix inherits the complex structure of the underlying network and is densely populated in certain applications. A typical task is the computation of a number of its eigenvalues and eigenvectors. Standard methods become infeasible as the number of nodes in the graph is too large. We propose to use Krylov subspace methods in combination with a fast summation approach based on the nonequispaced fast Fourier transform (NFFT) to perform dense matrix-vector products with the graph Laplacian in a fast way. The enormous flexibility of the NFFT algorithm allows us to embed the accelerated matrix-vector multiplication into Lanczos-based eigenvalues routines and iterative linear system solvers. We illustrate the feasibility and advantages of our approach on several numerical examples. In particular, we compare our approach with the Nyström method.
This is joint work with Dominik Alfke, Daniel Potts, and Martin Stoll.