# Accepted Papers and Abstracts

**All accepted papers and abstracts are available via OpenReview.**

# Cubature Formulas on Graphs by Means of Sampling and Interpolation

## Isaac Pesenson

Two types of cubature formulas on combinatorial graphs are developed. Cubature formulas of the first type are obtained through interpolation by variational splines. This set of formulas is exact on spaces of variational splines on graphs. Since bandlimited functions can be obtained as limits of variational splines we obtain cubature formulas which are approximately exact on spaces of bandlimited functions. Accuracy of this type of cubature formulas is given in terms of geometry of the set of nodes of splines and in terms of smoothness of functions which is measured by means of the combinatorial Laplace operator. Cubature formulas of the second type are obtained through some sampling theorems for bandlimited functions and relay on existence of certain frames in appropriate subspaces of bandlimited functions. These type formulas are exact on a relevant subspaces of bandlimited functions. The results of the paper have potential applications to problems that arise in data mining.

# Beurling-Selberg Extremization for Dual-Blind Deconvolution Recovery in Joint Radar-Communications

## Jonathan Arley, Monsalve Salazar, Edwin Vargas, Kumar Vijay Mishra, Brian M. Sadler, Henry Arguello

Recent interest in integrated sensing and communications has led to the design of novel signal processing techniques to recover information from an overlaid radar-communications signal. Here, we focus on a spectral coexistence scenario, wherein the channels and transmit signals of both radar and communications systems are unknown to the common receiver. In this dual-blind deconvolution (DBD) problem, the receiver admits a multi-carrier wireless communications signal that is overlaid with the radar signal reflected off multiple targets. The communications and radar channels are represented by continuous-valued range-times or delays corresponding to multiple transmission paths and targets, respectively. Prior works addressed recovery of unknown channels and signals in this ill-posed DBD problem through atomic norm minimization but contingent on individual minimum separation conditions for radar and communications channels. In this paper, we provide an optimal joint separation condition using extremal functions from the Beurling-Selberg interpolation theory. Thereafter, we formulate DBD as a low-rank modified Hankel matrix retrieval and solve it via nuclear norm minimization. We estimate the unknown target and communications parameters from the recovered low-rank matrix using the multiple signal classification (MUSIC) method. We show that the joint separation condition also guarantees that the underlying Vandermonde matrix for MUSIC is well-conditioned. Numerical experiments validate our theoretical findings.

# Sampling Informative Positives Pairs in Contrastive Learning

## Melanie Weber, Philip Bachman

Contrastive Learning is a paradigm for learning representation functions that recover useful similarity structure in data based on samples of positive (similar) and negative (dissimilar) instances. Typically, positive instances are sampled by randomly perturbing an anchor point using some form of data augmentation, while negative instances are sampled independently from the same distribution as the anchor points. The goal is to learn a representation function that places each anchor near its positive instances and far from its negative instances. However, not all randomly sampled positive instances are equally effective in learning a representation function that captures useful structure in the data. We consider a setting where class structure in the observed data derives from analogous structure in an unobserved latent space. We propose active sampling approaches for positive instances and investigate their role in effectively learning representation functions which recover the class structure in the underlying latent space.

# Sampling via Generating Functions

## Stephen D. Casey

We develop connections between some of the most powerful theories in analysis, tying the Shannon sampling formula to the Poisson summation formula, Cauchy’s integral and residue formulae, Jacobi interpolation, and Levin’s sine-type functions. The techniques use tools from complex analysis, and in particular, the Cauchy theory and the theory of entire functions, to realize sampling sets $\Lambda$ as zero sets of well-chosen entire functions (sampling set *generating functions*). We then reconstruct the signal from the set of samples using the Cauchy-Jacobi machinery. These methods give us powerful tools for creating a variety of general sampling formulae, e.g., allowing us to derive Shannon sampling and Papoulis generalized sampling via Cauchy theory and sampling in radial domains.

# Sharp convex exact penalty formulations in statistical signal recovery: conditioning, strict complementarity, and algorithms

## Lijun Ding, Alex Wang

We study a sample complexity vs. conditioning tradeoff in modern signal recovery problems where convex optimization problems are built from sampled observations. We begin by introducing a set of condition numbers related to sharpness for a general class of convex optimization problems covering sparse recovery, low-rank matrix sensing, and (abstract) phase retrieval. These condition numbers can be used to control how noisy observations or numerical errors in the solution process affect the accuracy of the recovered signal. We develop a first-order method for solving such problems with a nearly dimension-independent linear convergence rate (even in the presence of small or sparse noise). The condition numbers we introduce show up explicitly in the rate of linear convergence in our new algorithm. In each of these settings, we show that the condition numbers approach constants only a small factor above the statistical threshold.

# On the Extension and Sampling Theorems for the Coupled Fractional Fourier Transform

## Ahmed I Zayed

In 1980 a fractional version of the Fourier transform was introduced by V. Namias, but did not received much attention until the early 1990s when it was found to have numerous applications in optics and time-frequency representation. The fractional Fourier transform, denoted by $F_\eta ,$ depends on a parameter $0\leq \theta \leq \pi/2,$ so that when $\theta=0,$ $F_0$ is the identity transformation and when $\theta=\pi/2,$ $F_{\pi/2}$ is the standard Fourier transform. The transform has been extended to higher dimensions by taking tensor products of one-dimensional transforms. In 2018 the author of this article introduced a novel generalization of the fractional Fourier transform to two dimensions, which is called the coupled fractional Fourier transform and is denoted by $F_{\alpha, \theta}.$ This transform depends on two independent angles $\alpha$ and $\theta,$ with $ 0\leq \alpha, \eta \leq \pi/2,$ so that $F_{0,0}$ is the identity transformation and $F_{\pi/2, \pi/2},$ is the two-dimensional Fourier transform. For other values of $\alpha$ and $\theta ,$ we obtain other interesting configurations of the transform. One immediate application of this transform is in time-frequency representation because of its close relationship to the Wigner distribution function. In this talk we will discuss sampling theorems and extensions of this transform to spaces of generalized functions.

# Scattering Networks on Simplicial Complexes Using Multiscale Basis Dictionaries

## Naoki Saito, Stefan C Schonsheck, Eugene Shvarts

We discuss our new scattering networks for signals on simplicial complexes. Our construction is based on multiscale basis dictionaries on simplicial complexes, i.e., the $\kappa$-GHWT and $\kappa$-HGLET, which we recently developed for simplices of dimension $\kappa \in \mathbb{N}$ in a given simplicial complex by generalizing the node-based Generalized Haar-Walsh Transform (GHWT) and Hierarchical Graph Laplacian Eigen Transform (HGLET). The $\kappa$-GHWT and the $\kappa$-HGLET both form redundant sets (i.e., dictionaries) of multiscale basis vectors and the corresponding expansion coefficients of a given signal. Then, our new scattering network cascades the moments (up to fourth order) of the modulus of the dictionary coefficients within the basis dictionaries followed by the local averaging process. Consequently, the resulting features are robust to perturbations of input signals and invariant w.r.t.\ node permutations, i.e., they are effective for classifying signals recorded on $\kappa$-simplices of a given simplicial complex. We will demonstrate such effectiveness in document type classification using the Science News database. More precisely, using the word2vec method, we generate the nodes corresponding to a selected set of words (terms) in this database and form a $k$-nearest neighbor graph, which is viewed as a simplicial complex. Such a graph clearly contains $\kappa$-simplices, e.g., triangles ($\kappa=2$), tetrahedra ($\kappa=3$), etc., which correspond to combinations of $\kappa+1$ words. Each document leads to a signal for $\kappa$-simplices of this graph. That is, the value of a node is the frequency of occurrence of the word corresponding to that node ($\kappa=0$) while the value of a $\kappa$-simplex is the sum of the values of the $\kappa+1$ nodes that form that $\kappa$-simplex. In this way, a signal on $\kappa$-simplices reflects the frequency of mutual occurrences of $\kappa+1$ words, which may better characterize the nature of the corresponding document. We also discuss the similarities and differences between our proposed method and the previously-proposed methods such as the Deep Haar Scattering Networks of Cheng et al., Geometric Scattering Networks of Gao et al., and Hodgelets of Roddenberry et al.

# The Littlewood problem and non-harmonic Fourier series

## Philippe Jaming, Karim Kellay, Chadi Saba

The aim of this note is to prove a lower bound of the $L^1$-norm of non-harmonic trigonometric polynomials of the form $$ C\sum_{j=1}^N\dfrac{|a_j|}{j}\leq \frac{1}{T}\int_{-T/2}^{T/2}igg|\sum_{j=1}^Na_je^{2i\pi\lambda_jt}\igg|\mbox{d}t, $$ where $T\geq 72$, the constant $C$ is universal, $\lambda_j$ are real numbers with $\lambda_{j+1}-\lambda_j\geq 1$ and $a_j$ are complex numbers. This extends to the non-harmonic case the Littlewood conjecture as solved by McGehee, Pigno, Smith (MPS) and Konyagin (Kon) and was previously obtained by Nazarov (Naz) for $T>1$ but with a non-explicit constant $C=C(T)$ that depends on $T$.

# Machine Learning on Large-Scale Graphs: Graphon NNs and Learning by Transference

## Luana Ruiz

Graph neural networks (GNNs) are successful at learning representations from most types of network data but suffer from limitations in large graphs, which do not have the Euclidean structure that time and image signals have in the limit. Yet, large graphs can often be identified as being similar to each other in the sense that they share structural properties. Indeed, graphs can be grouped in families converging to a common graph limit – the graphon. A graphon is a bounded symmetric kernel which can be interpreted as both a random graph model and a limit object of a convergent sequence of graphs. Graphs sampled from a graphon almost surely share structural properties in the limit, which implies that graphons describe families of similar graphs. We can thus expect that processing data supported on graphs associated with the same graphon should yield similar results. In this work, I formalize this intuition by showing that the error made when transferring a GNN across two graphs in a graphon family is small when the graphs are sufficiently large, a property called transferability. I then extend this result to show that in scenarios where the training graph is still too large to achieve a small transferability error, the transferability property can be leveraged in the model training through an algorithm that learns the GNN on a sequence of growing graphs. Under mild assumptions, the GNN learned by this algorithm converges to the solution of the empirical risk minimization problem on the graphon in the limit. This enables large-scale graph machine learning via transference: training GNNs on (sequences of) moderate-scale graphs and executing them on large-scale graphs.

# Translation invariant diagonal frame decomposition for the Radon transform

## Simon Göppel, Markus Haltmeier, Jürgen Frikel

In this article, we address the challenge of solving the ill-posed reconstruction problem in computed tomography using a translation invariant diagonal frame decomposition (TI-DFD). First, we review the concept of a TI-DFD for general linear operators and the corresponding filter-based regularization concept. We then introduce the TI-DFD for the Radon transform on $L^2(\R^2)$ and provide an exemplary construction using the TI wavelet transform. Presented numerical results clearly demonstrate the benefits of our approach over non-translation invariant counterparts.

# Quasi-analytic directional wavelet packets: applications to image processing

## Amir Averbuch, Valery Zheludev

Recently, a versatile library of quasi-analytic complex-valued wavelet packets (WPs) which originate from splines of arbitrary orders, was designed (azn_pswq1). The real parts of the quasi-analytic WPs (qWPs) are the regular spline-based orthonormal WPs. The imaginary parts, which are slightly modified Hilbert transforms of the real parts, are the so-called complementary orthonormal WPs, which, unlike the symmetric regular WPs, are antisymmetric. Both regular and complementary WPs are well localized in time domain and their DFT spectra provide a variety of refined splits of the frequency domain. The waveforms can have arbitrary number of vanishing moments. Tensor products of 1D quasi-analytic WPs (qWPs) provide a diversity of 2D waveforms oriented in multiple directions. The designed computational scheme enables us to get fast and easy implementation of the qWP transforms. The shapes of real and imaginary parts of the qWPs can be regarded as directional cosine waves with different frequencies modulated by localized low-frequency signals. For example, the set of the fourth-level WPs comprises waveforms which are oriented in 314 different directions and are oscillating with 256 different frequencies. Various combinations of qWPs form multiple frames in the 2D signal space. The combination of the exceptional properties of the designed qWPs, such as unlimited directionality and oscillating structure of the waveforms, vanishing moments and refined frequency resolution, make them a powerful tool for image processing applications. The algorithms based on the qWPs proved to be competitive with the best existing methods in solving such classical image processing problems as denoising, inpainting and deblurring. The qWP algorithms are especially efficient for capturing edges and fine texture and oscillating patterns even in severely degraded images. Due to the above properties and next to unlimited diversity of testing waveforms, the qWPs have strong capabilities for extraction characteristic features from signals and images, which are utilized in the image classification algorithms in conjunction with Support Vector Machines and Convolutional Neural Networks.

# Modulation Spaces and the Curse of Dimensionality

## Rahul Parhi, Michael Unser

We investigate the $L^2$-error of approximating functions in the modulation spaces $M^s_{1,1}(\mathbb{R}^d)$, $s \geq 0$, by linear combinations of Wilson bases elements. We analyze a nonlinear method for approximating functions in $M^s_{1,1}(\mathbb{R}^d)$ with $N$-terms from a Wilson basis. Its $L^2$-approximation error decays at a rate of $N^{- rac{1}{2} - rac{s}{2d}}$. We show that this rate is optimal by proving a matching lower bound. Remarkably, these rates do not grow with the input dimension $d$. Finally, we show that the best linear $L^2$-approximation error cannot decay faster than $N^{- rac{s}{2d}}$. This shows that linear methods, contrary to the nonlinear ones, necessarily suffer the curse of dimensionality in these spaces.

# New Properties of Clifford Prolate Spheroidal Wave Functions

## Hamed Baghal Ghaffari, Jeffrey Andrew Hogan, Joseph D Lakey

We review a recent construction of Clifford prolate spheroidal wave functions (CPSWFs) – a multidimensional multichannel generalization of one-dimensional PSWFs. We introduce new properties of the corresponding eigenvalues such as decay and spectral accumulation.

# A Convergence Rate for Manifold Neural Networks

## Joyce Chew, Deanna Needell, Michael Perlmutter

High-dimensional data arises in numerous applications, and the rapidly developing field of geometric deep learning seeks to develop neural network architectures to analyze such data in non-Euclidean domains, such as graphs and manifolds. Recent work has proposed a method for constructing manifold neural networks using the spectral decomposition of the Laplace-Beltrami operator. Moreover, in this work, the authors provide a numerical scheme for implementing such neural networks when the manifold is unknown and one only has access to finitely many sample points. They show that this scheme, which relies upon building a data-driven graph, converges to the continuum limit as the number of sample points tends to infinity. Here, we build upon this result by establishing a rate of convergence that depends on the intrinsic dimension of the manifold but is independent of the ambient dimension. We also discuss how the rate of convergence depends on the depth of the network and the number of filters used in each layer.

# ALPCAH: Sample-wise Heteroscedastic PCA with Tail Singular Value Regularization

## Javier Antonio Salazar Cavazos, Jeffrey A Fessler, Laura Balzano

Principal component analysis (PCA) is a key tool in the field of data dimensionality reduction that is useful for various data science problems. However, many applications involve heterogeneous data that varies in quality due to noise characteristics associated with different sources of the data. Methods that deal with this mixed dataset are known as heteroscedastic methods. Current methods like HePPCAT make Gaussian assumptions of the basis coefficients that may not hold in practice. Other methods such as Weighted PCA (WPCA) assume the noise variances are known, which may be difficult to know in practice. This paper develops a PCA method that can estimate the sample-wise noise variances and use this information in the model to improve the estimate of the subspace basis associated with the low-rank structure of the data. This is done without distributional assumptions of the low-rank component and without assuming the noise variances are known. Simulations show the effectiveness of accounting for such heteroscedasticity in the data, the benefits of using such a method with all of the data versus retaining only good data, and comparisons are made against other PCA methods established in the literature like PCA, Robust PCA (RPCA), and HePPCAT. Code available at https://github.com/javiersc1/ALPCAH

# Symmetric Perceptron with Random Labels

## Eren C. Kizildag, Tanay Wakhare

The symmetric binary perceptron (SBP) is a random constraint satisfaction problem (CSP) and a single-layer neural network; it exhibits intriguing features, most notably a sharp phase transition regarding the existence of satisfying solutions. In this paper, we propose two novel generalizations of the SBP by incorporating random labels. Our proposals admit a natural machine learning interpretation: any satisfying solution to the random CSP is a minimizer of a certain empirical risk. We establish that the expected number of solutions for both models undergoes a sharp phase transition and calculate the location of this transition, which corresponds to the annealed capacity in statistical physics. We then establish a universality result: the location of this transition does not depend on the underlying distribution. We conjecture that both models in fact exhibit an even stronger phase transition akin to the SBP and give rigorous evidence towards this conjecture through the second moment method.

# New and Overlooked Questions Arising from Analysis of the Geometry of Molecular Conformations in Cryo-EM

## Roy R Lederman

Cryo-Electron Microscopy (cryo-EM) is an imaging technology revolutionizing structural biology. One of the great promises of cryo-EM is to study mixtures of conformations of molecules. We will discuss recent approaches to analyzing the manifold of conformations of flexible macromolecules. We will discuss some arguably surprising technical difficulties likely to impact other applications.

# Sampling and Frame Expansions for UWB Signals

## Stephen D. Casey, Peter Balazs

This article outlines sampling and frame expansions designed specifically for ultra-wideband (UWB) signals. We present two different types of expansions: ON windows with modified Gegenbauers and generalized frame expansions. They represent a contrast in computability and flexibility of frames versus the rigid structure and precision in the sampling expansion. Both approaches rely on the multiplication of some super-windowing functions and given orthonormal sequences. We generalize this approach to the redundant case, investigating windowed families of frame sequences.

# Concentration of low spectrum information in certain replacement graphs

## Joseph D Lakey, Jeffrey A Hogan

We study the problem of spatial concentration of vertex functions in the low spectrum of certain replacement graphs obtained by replacing each vertex on a cycle with a hypercube $B_N$. It is shown that for most values of the bandlimit, one can identify a set of vectors that is complete in the corresponding Paley–Wiener space on the graph and are concentrated on a single copy of $B_N$ in the sense that at least half of the $\ell^2$-norm of each such vector is associated with a single copy of $B_N$. Extent to which the techniques should extend to more general replacement products is discussed.

# Fourier Interpolation with Magnitude Only

## Qijia Jiang

In this paper we prove that all even Schwartz functions $f\colon \mathbb{R} ightarrow \mathbb{R}$ are uniquely determined by their values at $\ert \hat{f}(\pm\sqrt{n})ert ,f(\pm\sqrt{n/2})}$ for $n\geq 0$ indexing the non-negative integers.

# Scalable symmetric Tucker tensor decomposition

## Ruhui Jin

In this talk, we will study the best low-rank Tucker decomposition of symmetric tensors. The main motivation is in decomposing higher-order multivariate moments, which has crucial applications in data science. We will show the scalable adaptations of the projected gradient descent (PGD) and the higher-order eigenvalue decomposition (HOEVD) methods to decompose sample moment tensors. With the help of implicit and streaming techniques, we can evade the overhead cost of building and storing the moment tensor. Such reductions make computing the Tucker decomposition realizable for large data instances in high dimensions. We will also demonstrate the efficiency of the algorithms and the applicability to real-world datasets. For convergence guarantee, the update sequence derived by the PGD solver achieves first and second-order criticality.

# Diffusion-Based Sampling for Deep Active Learning

## Dan Kushnir, luca Venturi

The remarkable performance of deep neural networks depends on the availability of massive labeled training data. To alleviate the load of data annotation with labels, deep active learning aims to sample a minimal set of training points to be labelled which yields maximal model accuracy. We propose an efficient sampling criterion to sample data for annotation, which automatically shifts from an exploration type of sampling to a class-decision-boundary refinement. Our criterion relies on a process of diffusing the existing label information over a graph constructed from the hidden representation of the data. This graph representation captures the intrinsic geometry of the approximated labeling function. We analyze our sampling criterion and its exploration - refinement transition in light of the eigen-spectrum of the diffusion operator. Additionally, we provide a comprehensive sample complexity analysis that captures the two phases of exploration and refinement. The diffusion-based sampling criterion is shown to be advantageous over state-of-the-art criteria for deep active learning on synthetic and real benchmark data.

# A non-backtracking method for long matrix completion

## Yizhe Zhu

We consider the problem of rectangular matrix completion in the regime where the matrix $M$ of size $n \times m$ is long, i.e., the aspect ratio $m/n$ diverges to infinity. Such matrices are of particular interest in the study of tensor completion, where they arise from the unfolding of a low-rank tensor. In the case where the sampling probability is $\frac{d}{\sqrt{mn}}$, we propose a new algorithm for recovering the singular values and left singular vectors of the original matrix based on a variant of the standard non-backtracking operator of a suitably defined bipartite graph. We show that when $d$ is above a Kesten-Stigum-type sampling threshold, our algorithm recovers a correlated version of the singular value decomposition of $M$ with quantifiable error bounds.

# Denoising on Sphere via Large Spherical $t$-designs and Spherical Framelets

## Xiaosheng Zhuang, Yuchen Xiao

In this paper, we investigate the spherical $t$-designs with large value of $t$ for function approximation, construction of spherical framelets, and the important task of spherical signal processing. Based on the spherical framelet systems and the fast framelet transform algorithms, we propose an effective denoising scheme for spherical signal denoising that utilizes the nice properties of spherical $t$-designs with large $t$ value. We provide numerical results of signal/image denoising on several data sets.

# Learning to Reconstruct Signals From Binary Measurements

## Julián Tachella, Laurent Jacques

Recent advances in unsupervised learning have highlighted the possibility of learning to reconstruct signals from noisy and incomplete linear measurements alone. These methods play a key role in medical and scientific imaging and sensing, where ground truth data is often scarce or difficult to obtain. However, in practice measurements are not only noisy and incomplete but also quantized. Here we explore the extreme case of learning from binary observations and provide necessary and sufficient conditions on the number of measurements required for identifying a set of signals from incomplete binary data. Our results are complementary to existing bounds on signal recovery from binary measurements. Furthermore, we introduce a novel self-supervised learning approach that only requires binary data for training. We demonstrate in a series of experiments with real datasets that our approach is on par with supervised learning and outperforms sparse reconstruction methods with a fixed wavelet basis by a large margin.

# Supervised Manifold Learning via Random Forest Geometry-Preserving Proximities

## Jake Slater Rhodes

Manifold learning approaches seek the intrinsic, low-dimensional data structure within a high-dimensional space. Mainstream manifold learning algorithms, such as Isomap, UMAP, $t$-SNE, Diffusion Map, and Laplacian Eigenmaps do not use data labels and are thus considered unsupervised. Existing supervised extensions of these methods are limited to classification problems and fall short of uncovering meaningful embeddings due to their construction using order non-preserving, class-conditional distances. In this paper, we show the weaknesses of class-conditional manifold learning quantitatively and visually and propose an alternate choice of kernel for supervised dimensionality reduction using a data-geometry-preserving variant of random forest proximities as an initialization for manifold learning methods. We show that local structure preservation using these proximities is near universal across manifold learning approaches and global structure is properly maintained using diffusion-based algorithms.

# Algorithms and Theory for Quantizing Neural Networks

## Rayan Saab

Deep neural networks (DNNs) have emerged as a popular and effective tool in a wide range of applications, including computer vision and natural language processing. However, their high memory, computational, and power requirements have prompted the development of model compression techniques that reduce these costs. Among these techniques is neural network quantization, which has gained significant attention as a means of reducing the memory and computational requirements of DNNs without compromising their accuracy. Nevertheless, most existing approaches to quantization of DNNs are ad-hoc and lack rigorous performance guarantees. We present data-driven post-training quantization algorithms that can be applied directly to already trained networks. In this context, we discuss a stochastic quantization technique and provide rigorous theoretical guarantees on its performance, even in the setting of multi-layer networks, showing that it can achieve high accuracy in the over-parametrized regime.

# Pseudo-inversion of integration-based time encoding using POCS

## Nguyen T. Thao, Dominik Rzepka, Marek Miskowicz

While event-based sampling allows the use of sampling circuits of higher precision and lower power consumption, it faces the difficult problem of signal reconstruction from generalized nonuniform samples. An ideal solution to this problem is to perform the pseudo-inversion of the linear operator that maps the input signals into the sequences of samples. We show in this article that this is possible with all time-encoding schemes based on input integration, using the method of projection onto convex sets (POCS). This includes multi-channel time encoding.

# Outer Kernel Theorem for Co-orbit Spaces of Localised Frames

## Dimitri Bytchenkoff, Michael Speckbacher, Peter Balazs

We prove a so-called outer kernel theorem for bounded linear operators between co-orbit spaces generated from a localised frame $\Psi$. In particular, we show that there is a bijective correspondence between the bounded linear operators mapping the co-orbit space of test functions $H^1_w(\Psi)}$ to the co-orbit space of distributions $H^\infty_{1/w}(\Psi)}$ and their kernels in $H^{infty, \otimes}_{1/w}(\Psi)}$. The proof of the theorem relies on general properties of localised frames, tensor products and Galerkin’s method for matrix represention of operators.

# Disentangled Latent Representations of Images with Atomic Autoencoders

## Alasdair Newson, Yann Traonmilin

We present the atomic autoencoder architecture, which decomposes an image as the sum of elementary parts that are parametrized by simple separate blocks of latent codes. We show that this simple architecture is induced by the definition of a general atomic low-dimensional model of the considered data. We also highlight the fact that the atomic autoencoder achieves disentangled low-dimensional representations under minimal hypotheses. Experiments show that their implementation with deep neural networks is successful at learning disentangled representations on two different examples: images constructed with simple parametric curves and images of filtered off-the-grid spikes.

# Nonlinear Approximation with Subsampled Rank-1 Lattices

## Felix Bartel, Fabian Taubert

In this paper we approximate high-dimensional functions $f\colon\mathbb T^d o\mathbb C$ by sparse trigonometric polynomials based on function evaluations. Recently it was shown that a dimension-incremental sparse Fourier transform (SFT) approach does not require the signal to be exactly sparse and is applicable in this setting. We combine this approach with subsampling techniques for rank-1 lattices. This way our approach benefits from the underlying structure in the sampling points making fast Fourier algorithms applicable whilst achieving the good sampling complexity of random points (logarithmic oversampling). In our analysis we show detection guarantees of the frequencies corresponding to the Fourier coefficients of largest magnitude. In numerical experiments we make a comparison to full rank-1 lattices and uniformly random points to confirm our findings.

# Graph Laplacian Learning with Exponential Family Noise

## Changhao Shi, Gal Mishne

Graph signal processing (GSP) has generalized classical Fourier analysis to signals lying on irregular structures such as networks. However, a common challenge in applying graph signal processing is that the underlying graph of a system is unknown. Well-established methods optimize a graph representation, usually the graph adjacency matrix or the graph Laplacian, so that the total variation of given signals will be minimal on the learned graph. These methods have been developed for continuous graph signals, however inferring the graph structure for other types of data, such as discrete counts or binary signal, remains underexplored. In this talk, I will address the problem of learning the graph Laplacian from noisy data and generalize a GSP framework for learning a graph from smooth graph signals to exponential family noise distributions, allowing for the modeling of various data types. We then propose Graph Learning with Exponential family Noise (GLEN), an alternating algorithm that estimates the underlying graph Laplacian as well as the unobserved smooth representation from the noisy signals. Furthermore, we extend GLEN to the time-vertex setting to handle data with temporal correlations, e.g., time-series on a network. We demonstrate in synthetic experiments with different graph models that GLEN outperforms competing Laplacian estimation methods under noise model mismatch. Furthermore, we apply GLEN to different types of real-world data (e.g., binary questionnaires, neural activity) to further demonstrate the efficacy of our methods.

# Sampling theorems with derivatives in shift-invariant spaces generated by exponential B-splines

## Karlheinz Gröchenig, Irina Shafkulovska

We derive sufficient conditions for sampling with derivatives in shift-invariant spaces generated by an exponential B-spline. The sufficient conditions are expressed by a new notion of measuring the gap between consecutive points. As a consequence, we can construct sampling sets arbitrarily close to necessary conditions.

# Fast Dual-Graph Regularized Background Foreground Separation

## Longxiu Huang, Jing Qin

Foreground-background separation is a crucial task in various applications such as computer vision, robotics, and surveillance. Robust Principal Component Analysis (RPCA) is a popular method for this task, which considers the static background as the low-rank component and the moving objects in the foreground as the sparse component. To enhance the performance of RPCA, graph regularization is typically used to incorporate the sophisticated geometry of the background and temporal correlation. However, handling the graph Laplacians can be challenging due to the substantial number of data points. In this study, we propose a novel dual-graph regularized foreground-background separation model based on Sobolev smoothness. Our model is solved using a fast numerical algorithm based on the matrix CUR decomposition. Experimental results on real datasets demonstrate that our proposed algorithm achieves state-of-the-art computational efficiency.

# Pseudo Clifford Bandpass Prolates

## Jeffrey Andrew Hogan, Joseph D Lakey

We introduce operators which generalise the classical modulation and translation operators, now acting on functions defined on ${\mathbb R}^m$ and taking values in the associated Clifford algebra ${\mathbb C}_m$. The modulation operators are used to map orthonormal bases for Paley-Wiener spaces associated with balls in ${\mathbb R}^m$ to orthonormal sets in Paley-Wiener spaces $PW_A$ associated with annuli $A$ in ${\mathbb R}^m$. The complementary spaces are characterised and an orthonormal basis for them is given. These bases are used to construct an orthonormal basis for $PW_A$ composed of pseudo bandpass prolates.

# Efficient recovery of non-periodic multivariate functions from few scattered samples

## Nicolas Nagel, Felix Bartel, Tino Ullrich, Kai Lüttgen

It has been observed by several authors that well-known periodization strategies like tent or Chebychev transforms lead to remarkable results for the recovery of multivariate functions from few samples. So far, theoretical guarantees are missing. The goal of this paper is twofold. On the one hand, we give such guarantees and briefly describe the difficulties of the involved proof. On the other hand, we combine these periodization strategies with recent novel constructive methods for the efficient subsampling of finite frames in $\mathbb{C}^m$. As a result we are able to reconstruct non-periodic multivariate functions from very few samples. The used sampling nodes are the result of a two-step procedure. Firstly, a random draw with respect to the Chebychev measure provides an initial node set. A further sparsification technique selects a significantly smaller subset of these nodes with equal approximation properties. This set of sampling nodes scales linearly in the dimension of the subspace on which we project and works universally for the whole class of functions. The method is based on principles developed by Batson, Spielman, and Srivastava and can be numerically implemented. Samples on these nodes are then used in a (plain) least-squares sampling recovery step on a suitable hyperbolic cross subspace of functions resulting in a near-optimal behavior of the sampling error. Numerical experiments indicate the applicability of our results.

# Rotated time-frequency lattices are sets of stable sampling for continuous wavelet systems

## Nicki Holighaus, Günther Koliander

We provide an example for the generating matrix $A$ of a two-dimensional lattice $\Gamma = A\mathbb{Z}^2$, such that the following holds: For any sufficiently smooth and localized mother wavelet $\psi$, there is a constant $eta(A,\psi)>0$, such that $eta\Gamma\cap (\mathbb{R} imes \mathbb{R}^+)$ is a stable set of sampling for the wavelet system generated by $\psi$, for all $0<eta\leq eta(A,\psi)$. The result and choice of generating matrix are loosely inspired by the studies of low discrepancy sequences and uniform distribution modulo $1$. In particular, we estimate the number of lattice points contained in any axis parallel rectangle of fixed area. This estimate is combined with a recent sampling result for continuous wavelet systems, obtained via the oscillation method of general coorbit theory.

# Adversarial training through the lens of optimal transport

## Nicolas Garcia Trillos

Modern machine learning methods, in particular deep learning approaches, have enjoyed unparalleled success in a variety of challenging application fields like image recognition, medical image reconstruction, and natural language processing. While a vast majority of previous research in machine learning mainly focused on constructing and understanding models with high predictive power, consensus has emerged that other properties like stability and robustness of models are of equal importance and in many applications are essential. This has motivated researchers to investigate the problem of adversarial training —or how to make models robust to adversarial attacks— but despite the development of several computational strategies for adversarial training and some theoretical development in the broader distributionally robust optimization literature, there are still several theoretical questions about it that remain relatively unexplored. In this talk, I will take an analytical perspective on the adversarial robustness problem and explore two questions: 1) Can we use analytical tools to find lower bounds for adversarial robustness problems?, and 2) How do we use modern tools from analysis and geometry to solve adversarial robustness problems? In this talk I will showcase how ideas from optimal transport theory can provide answers to these questions. This talks is based on joint works with Camilo Andrés García Trillos, Matt Jacobs, and Jakwang Kim.

# Frames by Iterations and Invariant Subspaces

## Alejandra Aguilera, Carlos Cabrelli, Diana Carbajal, Victoria Paternostro

This paper presents a characterization of systems of iterations that generate frames of abstract separable Hilbert spaces. The characterization is achieved through a correspondence with a canonical system of iterations that form Parseval frames of certain subspaces of the space of vector-valued functions $L^{2}(\mathbb{T},\mathcal{K})$, where $\mathcal{K}$ is a Hardy space with multiplicity. These subspaces possess the property of being invariant under two shift operators with multiplicity. Furthermore, we provide a clear description of the subspaces generated by these canonical systems of iterations.

# Sampling theorems in spaces of variable bandwidth generated via Wilson basis

## Beatrice Andreolli, Karlheinz Gröchenig

We propose a new sampling theorem, the complete reconstruction of a function from its samples, for the space of variable bandwidth constructed using Wilson expansion. The theorem is based on the maximal gap between consecutive points and it relates the lower sampling rate to the bandwidths that have an influence on the reconstruction on each particular interval of the signal.

# Greedy-type sparse recovery from heavy-tailed measurements

## Tim Fuchs, Felix Krahmer, Richard Kueng

Recovering a $s$-sparse signal vector $x\in\mathbb{C}^n$ from a comparably small number of measurements $y:=(Ax)\in\mathbb{C}^m$ is the underlying challenge of compressed sensing. By now, a variety of efficient greedy algorithms has been established and strong recovery guarantees have been proven for random measurement matrices $A\in\mathbb{C}^{m imes n}$. However, they require a strong concentration of $A^* Ax$ around its mean $x$ (in particular, the Restricted Isometry Property), which is generally not fulfilled for heavy-tailed matrices. In order to overcome this issue and even cover applications where only limited knowledge about the distribution of the measurements matrix is known, we suggest substituting $A^* Ax$ by a median-of-means estimator. In the following, we present an adapted greedy algorithm, based on median-of-means, and prove that it can recover any $s$-sparse unit vector $x\in\mathbb{C}^n$ up to a $l_2$-error $|x-\hat{x}|_2<\epsilon$ with high probability, while only requiring a bound on the fourth moment of the entries of $A$. The sample complexity is of the order $\mathcal{O}(s\log (n\log( rac{1}{\epsilon}))\log( rac{1}{\epsilon}))$.

# Sampling strategies for compressive imaging under statistical noise

## Frederik Hoppe, Claudio Mayrink Verdun, Felix Krahmer, Holger Rauhut, Marion Menzel

Most of the compressive sensing literature in signal processing assumes that the noise present in the measurement has an adversarial nature, i.e., it is bounded in a certain norm. At the same time, the randomization introduced in the sampling scheme usually assumes an i.i.d. model where rows are sampled with replacement. In this case, if a sample is measured a second time, it does not add additional information. For many applications, where the statistical noise model is a more accurate one, this is not true anymore since a second noisy sample comes with an independent realization of the noise, so there is a fundamental difference between sampling with and without replacement. Therefore, a more careful analysis must be performed. In this short note, we illustrate how one can mathematically transition between these two noise models. This transition gives rise to a weighted LASSO reconstruction method for sampling without replacement, which numerically improves the solution of high-dimensional compressive imaging problems.

# Iterative reconstruction of bandlimited signals from nonuniform samples by sliding periodization of nonuniformity

## Nguyen T. Thao, Dominik Rzepka, Marek Miskowicz

The present article proposes to reconstruct a bandlimited signal from nonuniform samples by a sliding periodization of the nonuniformity and successive approximations. When the nonuniformity consists of bounded deviations of the sampling instants from a uniform grid, the reconstruction can be made arbitrarily accurate either by increasing the period of the periodizations or by increasing the number of successive approximations.

# Optimal density compensation factors for the reconstruction of the Fourier transform of bandlimited functions

## Melanie Kircheis, Daniel Potts

An inverse nonequispaced fast Fourier transform (iNFFT) is a fast algorithm to compute the Fourier coefficients of a trigonometric polynomial from nonequispaced sampling data. However, various applications such as magnetic resonance imaging (MRI) are concerned with the analogous problem for bandlimited functions, i.e., the reconstruction of point evaluations of the Fourier transform from given measurements of the bandlimited function. In this paper, we review an approach yielding exact reconstruction for trigonometric polynomials up to a certain degree, and extend this technique to the setting of bandlimited functions. Here we especially focus on methods computing a diagonal matrix of weights needed for sampling density compensation.

# ALPCAH: Sample-wise Heteroscedastic PCA with Tail Singular Value Regularization

## Javier Antonio Salazar Cavazos, Jeffrey A Fessler, Laura Balzano

Principal component analysis (PCA) is a key tool in the field of data dimensionality reduction that is useful for various data science problems. However, many applications involve heterogeneous data that varies in quality due to noise characteristics associated with different sources of the data. Methods that deal with this mixed dataset are known as heteroscedastic methods. Current methods like HePPCAT make Gaussian assumptions of the basis coefficients that may not hold in practice. Other methods such as Weighted PCA (WPCA) assume the noise variances are known, which may be difficult to know in practice. This paper develops a PCA method that can estimate the sample-wise noise variances and use this information in the model to improve the estimate of the subspace basis associated with the low-rank structure of the data. This is done without distributional assumptions of the low-rank component and without assuming the noise variances are known. Simulations show the effectiveness of accounting for such heteroscedasticity in the data, the benefits of using such a method with all of the data versus retaining only good data, and comparisons are made against other PCA methods established in the literature like PCA, Robust PCA (RPCA), and HePPCAT. Code available at https://github.com/javiersc1/ALPCAH

# Efficient uniform approximation using Random Vector Functional Link networks

## Olov Schavemaker, Palina Salanevich

A Random Vector Functional Link (RVFL) network is a depth-2 neural network with random inner weights and biases. As only the outer weights of such architectures need to be learned, the learning process boils down to a linear optimization task, allowing one to sidestep the pitfalls of nonconvex optimization problems. In this paper, we prove that an RVFL with ReLU activation functions can approximate Lipschitz continuous functions provided its hidden layer is exponentially wide in the input dimension. Although it has been established before that such approximation can be achieved in $L_2$ sense, we prove it for $L_\infty$ approximation error and Gaussian inner weights. To the best of our knowledge, our result is the first of this kind. We give a nonasymptotic lower bound for the number of hidden layer nodes, depending on, among other things, the Lipschitz constant of the target function, the desired accuracy, and the input dimension. Our method of proof is rooted in probability theory and harmonic analysis.

# First-Order Algorithms for Optimization over Graph Laplacians

## Tyler Maunu

When solving an optimization problem over the set of graph Laplacian matrices, one must deal with the large number of constraints as well as the large objective variable size. In this paper we explore first order methods for optimization over graph Laplacian matrices. These methods include two popular methods for constrained optimization: the mirror descent algorithm and the Frank-Wolfe (conditional gradient) algorithm. We derive efficiently implementable formulations of these algorithms over graph Laplacians, and use existing theory to show their iteration complexity in various regimes. Experiments demonstrate the efficiency of these methods over alternatives like interior point methods.

# On Graph Uncertainty Principle and Eigenvector Delocalization

## Elizaveta Rebrova, Palina Salanevich

Uncertainty principles present an important theoretical tool in signal processing, as they provide limits on the time-frequency concentration of a signal. In many real-world applications the signal domain has a complicated irregular structure that can be described by a graph. In this paper, we focus on the global uncertainty principle on graphs and propose new connections between the uncertainty bound for graph signals and graph eigenvectors delocalization. We also derive uncertainty bounds for random d-regular graphs and provide numerically efficient upper and lower approximations for the uncertainty bound on an arbitrary graph.

# Exploiting data geometry in (matrix-valued) optimization

## Melanie Weber

Many applications involve non-Euclidean data and exploiting such geometric structure in optimization can deliver algorithms that are computationally superior to standard nonlinear programming approaches. This observation has resulted in an increasing interest in Riemannian methods in the optimization and machine learning communities. In this talk, we consider the problem of optimizing Riemannian “difference of convex” functions. Several classical optimization problems involve objective functions of this form, including matrix-valued tasks, such as barycenter problems, determinantal point processes, as well as operator scaling. The latter is of central importance in many areas of Mathematics and Computer Science through connections to wide range of classical tools, including maximum likelihood estimators in Gaussian models, Tyler’s M-estimator of scatter matrices and Brascamp-Lieb constants, among others. We discuss a class of convex-concave procedures for optimizing Riemannian “difference of convex” functions, where the geometric structure of the problem gives rise to an efficiently solvable fixed-point iteration. We present a detailed convergence analysis for the proposed algorithms and provide numerical evidence for their performance. This talk is based on joint work with Suvrit Sra.

# Sparse wavelet-based solutions for the M/EEG inverse problem

## Samy Mokhtari, Bruno Torresani, Christian-George BENAR, Jean-Michel BADIER

This paper is concerned with variational and Bayesian approaches to neuro-electromagnetic inverse problems (EEG and MEG). The strong indeterminacy of these problems is tackled by introducing sparsity inducing regularization/priors in a transformed domain, namely a spatial wavelet domain. Sparsity in the wavelet domain allows to reach ”data compression” in the cortical sources domain. Spatial wavelets defined on the mesh graph of the triangulated cortical surface are used, in combination with sparse regression techniques, namely LASSO regression or sparse Bayesian learning, to provide localized and compressed estimates for brain activity from sensor data. Numerical results on simulated and real MEG data are provided, which outline the performances of the proposed approach in terms of localization.

# A Continuous Transform for Localized Ridgelets

## Joseph Shenouda, Rahul Parhi, Robert D Nowak

We develop a new continuous wavelet-like transform for localized ridgelets. Contrary to the classical ridgelets (which are not local), this new dictionary exhibits desirable decay so that all the atoms lie in $L^2(\mathbb{R}^d)$. Furthermore, each localized ridgelet atom is itself a superposition of continuously many classical ridgelets. Our construction hinges on a careful wavelet analysis in the Radon domain, different than the usual Radon-domain wavelet analysis found in the study of classical ridgelets. This is crucial in ensuring the locality of our new, localized ridgelet atoms. We prove a continuous transform and inversion formula for this new dictionary. Finally, due to the locality of these atoms, we conjecture that this new dictionary is better conditioned than the system of non-local ridge functions used ubiquitously in modern neural networks.

# Spectral gap-based deterministic tensor completion

## Kameron Decker Harris, Oscar Lopez, Angus Read, Yizhe Zhu

Tensor completion is a core machine learning algorithm used in recommender systems and other domains with missing data. While the matrix case is well-understood, theoretical results for tensor problems are limited, particularly when the sampling patterns are deterministic. Here we bound the generalization error of the solutions of two tensor completion methods, Poisson loss and atomic norm minimization, providing tighter bounds in terms of the target tensor rank. If the ground-truth tensor is order $t$ with CP-rank $r$, the dependence on $r$ is improved from $r^{2(t-1)(t^2-t-1)}$ in arXiv:1910.10692 to $r^{2(t-1)(3t-5)}$. The error in our bounds is deterministically controlled by the spectral gap of the sampling sparsity pattern. We also prove several new properties for the atomic tensor norm, reducing the rank dependence from $r^{3t-3}$ in arXiv:1711.04965 to $r^{3t-5}$ under random sampling schemes. A limitation is that atomic norm minimization, while theoretically interesting, leads to inefficient algorithms. However, numerical experiments illustrate the dependence of the reconstruction error on the spectral gap for the practical max-quasinorm, ridge penalty, and Poisson loss minimization algorithms. This view through the spectral gap is a promising window for further study of tensor algorithms.

# Characterizing the Spectrum of the NTK via a Power Series Expansion

## Guido Montufar

Under mild conditions on the network initialization we derive a power series expansion for the Neural Tangent Kernel (NTK) of arbitrarily deep feedforward networks in the infinite width limit. We provide expressions for the coefficients of this power series which depend on both the Hermite coefficients of the activation function as well as the depth of the network. We observe faster decay of the Hermite coefficients leads to faster decay in the NTK coefficients and explore the role of depth. Using this series, first we relate the effective rank of the NTK to the effective rank of the input-data Gram. Second, for data drawn uniformly on the sphere we study the eigenvalues of the NTK, analyzing the impact of the choice of activation function. Finally, for generic data and activation functions with sufficiently fast Hermite coefficient decay, we derive an asymptotic upper bound on the spectrum of the NTK. This is work with Michael Murray, Hui Jin, Benjamin Bowman.

# On Manifold Learning in Plato’s Cave: Remarks on Manifold Learning and Physical Phenomena

## Roy R Lederman, Bogdan Toader

Many techniques in machine learning attempt explicitly or implicitly to infer a low-dimensional manifold structure of an underlying physical phenomenon from measurements without an explicit model of the phenomenon or the measurement apparatus. This paper presents a cautionary tale regarding the discrepancy between the geometry of measurements and the geometry of the underlying phenomenon in a benign setting. The deformation in the metric illustrated in this paper is mathematically straightforward and unavoidable in the general case, and it is only one of several similar effects. While this is not always problematic, we provide an example of an arguably standard and harmless data processing procedure where this effect leads to an incorrect answer to a seemingly simple question. Although we focus on manifold learning, these issues apply broadly to dimensionality reduction and unsupervised learning.

# Largest Angle Path Distance for Multi-Manifold Clustering

## Haoyu Chen, Anna Little, Akil Narayan

We propose a novel, angle-based path metric for the multi-manifold clustering problem. This metric, which we call the largest-angle path distance (LAPD), is computed as a bottleneck path distance in a graph constructed on $d$-simplices of data points. When data is sampled from a collection of $d$-dimensional manifolds which may intersect, the method can cluster the manifolds with high accuracy and automatically detect how many manifolds are present. By leveraging fast approximation schemes for bottleneck distance, this method exhibits quasi-linear computational complexity in the number of data points. In addition to being highly scalable, the method outperforms existing algorithms in numerous numerical experiments on intersecting manifolds, and exhibits robustness with respect to noise and curvature in the data.

# Graph Fourier MMD for Signals on Graphs

## Sam Leone, Alexander Tong, Guillaume Huguet, Guy Wolf, Smita Krishnaswamy, Aarthi Venkat

While numerous methods have been proposed for computing distances between probability distributions in Euclidean space, relatively little attention has been given to computing such distances for distributions on graphs. However, there has been a marked increase in data that either lies on graph (such as protein interaction networks) or can be modeled as a graph single cell data, particularly in the biomedical sciences. Thus, it becomes important to find ways to compare signals defined on such graphs. Here, we propose Graph Fourier MMD (GFMMD), a novel distance between distributions and signals on graphs. GFMMD is defined via an optimal witness function that is both smooth on the graph and maximizes difference in expectation between the pair of distributions on the graph. We find an analytical solution to this optimization problem as well as an embedding of distributions that results from this method. We also prove several properties of this method including scale invariance and applicability to disconnected graphs. We showcase it on graph benchmark datasets as well on single cell RNA-sequencing data analysis. In the latter, we use the GFMMD-based gene embeddings to find meaningful gene clusters. We also propose a novel type of score for gene selection called gene localization score which helps select genes for cellular state space characterization.

# Dynamics of SGD with Stochastic Polyak Stepsizes: Truly Adaptive Variants and Convergence to Exact Solution

## Nicolas Loizou

In this talk, we study the dynamics and the convergence properties of SGD equipped with new variants of the stochastic Polyak stepsize (SPS) and provide solutions to the two main drawbacks of the original SPS. That is, (i) it requires a priori knowledge of the optimal mini-batch losses, which are not available when the interpolation condition is not satisfied (e.g., regularized objectives), and (ii) it guarantees convergence only to a neighborhood of the solution. We first show that a simple modification of the original SPS that uses lower bounds instead of the optimal function values can directly solve issue (i). On the other hand, solving issue (ii) turns out to be more challenging and leads us to valuable insights into the method’s behavior. We show that if interpolation is not satisfied, the correlation between SPS and stochastic gradients introduces a bias, which effectively distorts the expectation of the gradient signal near minimizers, leading to non-convergence - even if the stepsize is scaled down during training. To fix this issue, we propose DecSPS, a novel modification of SPS, which guarantees convergence to the exact minimizer - without a priori knowledge of the problem parameters. For strongly-convex optimization problems, DecSPS is the first stochastic adaptive optimization method that converges to the exact solution without restrictive assumptions like bounded iterates/gradients.

# Modeling Changes in Molecular Dynamics Time Series as Wasserstein Barycentric Interpolations

## Jovan Damjanovic, Yu-Shan Lin, James M. Murphy

Molecular dynamics (MD) simulations are a powerful computational tool for elucidation of molecular behavior. These simulations generate an abundance of high-dimensional time series data and parsing these data into a human-interpretable format is nontrivial. Clustering trajectory segments obtained via change point detection has been shown to lower memory complexity and yield improved partitioning resolution of the time series compared to the state of the art. However, accurate change point placement is often inhibited by the presence of gradual changes between long-lived metastable states. The trajectory regions corresponding to these gradual changes are not well-modeled by a single distribution, and therefore are frequently over-segmented. In this work, we model such regions using weighted Wasserstein barycentric interpolations between adjacent metastable states, allowing for gradual changes to be resolved correctly. The improved detection performance of our proposed method is demonstrated on a range of toy and real MD simulation data, showing significant potential for faithfully modeling and compressing complex MD simulations.

# Quantization of bandlimited functions using random samples

## Rohan Joy, Felix Krahmer, Alessandro Lupoli, Radha Ramakrishnan

We investigate the compatibility of distributed noise-shaping quantization with random samples of bandlimited functions. Let $f$ be a real-valued $\pi$-bandlimited function. Suppose $R>1$ is a real number, and assume that ${x_i}*{i=1}^m$ is a sequence of i.i.d random variables uniformly distributed on $[- ilde{R}, ilde{R}]$, where $ ilde{R}>R$ is appropriately chosen. We show that on using a distributed noise-shaping quantizer to quantize the values of $f$ at ${x_i}*{i=1}^m$, a function $f^{\sharp}$ can be reconstructed from these quantized values such that $|f-f^{\sharp}|_{L^2[-R, R]}$ decays with high probability as $m$ and $ ilde{R}$ increase.

# Low-rank and Smooth Tensor Recovery on Cartesian Product Graphs

## Mert Indibi, Selin Aviyente

Data science research has found great success with algorithms that leverage the structure of the topological space that the high-dimensional data lies on. In particular, low-rank tensor models which represent low-dimensional latent factors in a succinct and parsimonious way have become indispensable tools. These low-rank models have been utilized in a variety of applications including tensor completion from corrupted or missing entries. In the standard tensor completion problem, the different modes of the tensor are assumed to be completely independent of each other. However, in many real-world problems such as those involving spatio-temporal data, there exist relationships between the different modes. This information can be encoded in terms of graphs which can bring additional structure to the tensor completion problem. In this paper, we introduce methods for structured tensor completion where both the low-rank and smoothness of tensor are incorporated into the optimization problem. In particular, we model tensor data as graph signals on Cartesian product graphs and use the Dirichlet energy to quantify the smoothness of tensor data with respect to the graph. We evaluate the performance of this tensor recovery approach for different types of data, i.e. low-rank, smooth and low-rank plus smooth, and compare with existing methods.

# Analysis of Stochastic Gradient Descent for Learning Linear Neural Networks

## Gabin Maxime Nguegnang, Bubacarr Bah, Holger Rauhut, Ulrich Terstiege

In this work we analyze stochastic gradient descent (SGD) for learning deep linear neural networks. We use an analytical approach that combines SGD iterates and gradient flow trajectories base on stochastic approximation theory. Then establish the almost sure boundedness of SGD iterates and its convergence guarantee for learning deep linear neural networks. Most studies on the analysis of SGD for nonconvex problem have entirely focused on convergence property which only indicate that the second moment of the loss function gradient tend to zero. Our study demonstrates the convergence of SGD to a critical point of the square loss almost surely for learning deep linear neural networks.

# Digital Halftoning via Mixed-Order Weighted Sigma-Delta Modulation

## Anna Veselovska, Felix Krahmer

In this paper, we propose 1-bit weighted Sigma-Delta quantization schemes of mixed order as a technique for digital halftoning. These schemes combine weighted Sigma-Delta schemes of different orders for two-dimensional signals so one can profit both from the better stability properties of low order schemes and better accuracy properties of higher order schemes. We demonstrate that the resulting mixed-order Sigma-Delta schemes in combination with a padding strategy yield improved representation quality in digital halftoning as measured in the Feature Similarity Index. These empirical results are complemented by mathematical error bounds for the model of two-dimensional bandlimited signals as motivated by a mathematical model of human visual perception.

# Manifold Alignment with Label Information

## Andres Felipe Duque Correa, Myriam Lizotte, Guy Wolf, Kevin R. Moon

Multi-domain data is becoming increasingly common and presents both challenges and opportunities in the data science community. The integration of distinct data-views can be used for exploratory data analysis, and benefit downstream analysis including machine learning related tasks. With this in mind, we present a novel manifold alignment method called MALI (Manifold alignment with label information) that learns a correspondence between two distinct domains. MALI belongs to a middle ground between the more commonly addressed semi-supervised manifold alignment, where some known correspondences between the two domains are assumed to be known beforehand, and the purely unsupervised case, where no information linking both domains is available. To do this, MALI learns the manifold structure in both domains via a diffusion process and then leverages discrete class labels to guide the alignment. MALI recovers a pairing and a common representation that reveals related samples in both domains. We show that MALI outperforms the current state-of-the-art manifold alignment methods across multiple datasets.

# Injectivity of Multi-Window Gabor Phase Retrieval

## Palina Salanevich

In many signal processing problems arising in practical applications, we wish to reconstruct an unknown signal from its phaseless measurements with respect to a frame. This inverse problem is known as the phase retrieval problem. For each particular application, the set of relevant measurement frames is determined by the problem at hand, which motivates the study of phase retrieval for structured, application-relevant frames. In this paper, we focus on one class of such frames that appear naturally in diffraction imaging, ptychography, and audio processing, namely, multi-window Gabor frames. We study the question of injectivity of the phase retrieval problem with these measurement frames in the finite-dimensional setup and propose an explicit construction of an infinite family of phase retrievable multi-window Gabor frames. We show that phase retrievability for the constructed frames can be achieved with a much smaller number of phaseless measurements compared to the previous results for this type of measurement frames. Additionally, we show that the sufficient for reconstruction number of phaseless measurements depends on the dimension of the signal space, and not on the ambient dimension of the problem.

# Model-Driven Quantization for Time Encoding Machines

## Dorian Florescu

In conventional analog-to-digital (ADC) conversion, the sampling and quantization steps take place on the time and amplitude axes, respectively. In the case of time encoding machines (TEMs), which convert analog signals into a sequence of time events, sampling and quantization interfere with one another since they both operate on the time axis. Here we introduce a new quantization method for TEMs called QTEM that, due to its model-driven nature, limits the interference of sampling and quantization. We show that existing recovery guarantees don’t apply to QTEM. We provide new guarantees for recovering the input of the QTEM and demonstrate numerically its advantage over conventional TEM quantization.

# Nonparametric Estimation of Local Bandwidth from Level Crossings Sampling

## Miroslaw Pawlak

We study the problem of a local bandwidth recovery for nonstationary stochastic signals when the measured information is given in terms of level crossings. We propose a kernel estimate of the local bandwidth from samples generated from level crossings of stochastic signals being the time-warped version of stationary Gaussian processes. The positivity and bandlimited nature of the local intensity is captured by the properly selected class of bandlimited and positive kernel functions. The asymptotic properties of the estimator are derived.

# Relationships between the Phase Retrieval Problem and Permutation Invariant Embeddings

## Radu Balan, Efstratios Tsoukanis

This paper discusses the connection between the phase retrieval problem and permutation invariant embeddings. We show that the real phase retrieval problem for $\R^d/O(1)$ is equivalent to Euclidean embeddings of the quotient space $\R^{2 imes d}/S_2$ performed by the sorting encoder introduced in an earlier work. In addition, this relationship provides us with inversion algorithms of the orbits induced by the group of permutation matrices.

# On the Optimal Recovery of Graph Signals

## Simon Foucart, Chunyang Liao, Nate Veldt

Learning a smooth graph signal from partially observed data is a well-studied task in graph-based machine learning. We consider this task from the perspective of optimal recovery, a mathematical framework for learning a function from observational data that adopts a worst-case perspective tied to model assumptions on the function to be learned. Earlier work in the optimal recovery literature has shown that minimizing a regularized objective produces optimal solutions for a general class of problems, but did not fully identify the regularization parameter. Our main contribution provides a way to compute regularization parameters that are optimal or near-optimal (depending on the setting), specifically for graph signal processing problems. Our results offer a new interpretation for classical optimization techniques in graph-based learning and also come with new insights for hyperparameter selection. We illustrate the potential of our methods in numerical experiments on several semi-synthetic graph signal processing datasets.

# Spectral Universality of Regularized Linear Regression with Nearly Deterministic Sensing Matrices

## Rishabh Dudeja, Subhabrata Sen, Yue Lu

Spectral universality refers to the empirical observation that asymptotic properties of a high-dimensional stochastic system driven by a structured random matrix are often determined only by the spectrum (or singular values) of the underlying matrix - the singular vectors are irrelevant provided they are sufficiently ``generic’’. Consequently, the properties of the underlying system can be accurately predicted by analyzing the system under the mathematically convenient assumption that the singular vectors as uniformly random (or Haar distributed) orthogonal matrices. This general phenomenon has been observed in numerous contexts, including statistical physics, communication systems, signal processing, statistics, and randomized numerical linear algebra. We study this universality phenomenon in the context of high-dimensional linear regression, where the goal is to estimate an unknown signal vector from noisy linear measurements specified using a sensing matrix. We prove a spectral universality principle for the performance of convex regularized least squares (RLS) estimators for this problem. Our contributions are two-fold: (1) We introduce a notion of a universality class for sensing matrices, defined through nearly deterministic conditions that fix the spectrum of the matrix and formalize the heuristic notion of generic singular vectors; (2) We show that for all sensing matrices in the same universality class, the dynamics of the proximal gradient algorithm for the regression problem, and the performance of RLS estimators themselves (under additional strong convexity conditions) are asymptotically identical. In addition to including i.i.d. Gaussian and rotational invariant matrices as special cases, our universality class also contains highly structured, strongly dependent, and even nearly deterministic matrices. Examples include randomly signed incoherent tight frames and randomly subsampled Hadamard transforms. Due to this universality result, the performance of RLS estimators on many structured sensing matrices with limited randomness can be characterized using the rotationally invariant sensing model with uniformly random (or Haar distributed) singular vectors as an equivalent yet mathematically tractable surrogate.

# On the Impact of Sample Size in Reconstructing Graph Signals

## Baskaran Sripathmanathan, Xiaowen Dong, Michael M. Bronstein

Reconstructing a signal on a graph from observations on a subset of the vertices is a fundamental problem in the field of graph signal processing. It is often assumed that adding additional observations to an observation set will reduce the expected reconstruction error. We show that under the setting of noisy observation and least-squares reconstruction this is not always the case, characterising the behaviour both theoretically and experimentally.

# Time Encoding of Sparse Signals with Flexible Filters

## Dorian Florescu, Ayush Bhandari

In this work, we consider the problem of recovering a sparse signal consisting of a sum of filtered spikes from the output of a time encoding machine (TEM). This problem was addressed before with recovery methods designed for filters with specific shapes, mostly relying on Prony’s method for recovery. Here we propose a new recovery method for sparse inputs from TEM samples. Compared to existing approaches, the new method relaxes significantly the assumption on the filters. The method is associated by a theoretically guaranteed algorithm. We provide numerical examples to evaluate the new method, including an example with filters that are not compatible with previous methods.

# Tensor Sandwich: Tensor Completion for Low CP-Rank Tensors via Adaptive Random Sampling

## Cullen Haselby, Santhosh Karnik, Mark Iwen

We propose an adaptive and provably accurate tensor completion approach based on combining matrix completion techniques for a small number of slices with a modified noise robust version of Jennrich’s algorithm. In the simplest case, this leads to a sampling strategy that more densely samples two outer slices (the bread), and then more sparsely samples additional inner slices (the bbq-braised tofu) for the final completion. Under mild assumptions on the factor matrices, this algorithm with high probability completes an $n imes n imes n$ tensor with CP-rank $r$ and uses at most $\mathcal{O}(nr\log^2 r)$ adaptively chosen samples. Empirical experiments further verify that the proposed approach works well in practice, including as a low-rank approximation method in the presence of additive noise.

# Randomized Trace Estimation

## Tyler Chen

Suppose $A$ is a square matrix and $x$ is a random vector with mean zero and identity covariance. Then, $x^TAx$ forms an unbiased estimator for the trace of $A$. Since $x^TAx$ can be computed using only matrix-vector products with $A$, this simple observation allows us to estimate the trace of matrices for which an explicit representation is unknown or intractable to write down explicitly. Estimators of this form have proven to be a powerful algorithmic tool since their emergence in the late 1980s and early 1990s, and remain an important tool for domain problems in fields such as statistics, uncertainty quantification, machine learning, and computational quantum physics, etc. Recent years have seen a rapid growth in the development of randomized algorithms for trace estimation. Current state of the art algorithms are often based on the estimator described above, but offer provable theoretical improvements which are easily observed in practice. This talk provides an overview of recent developments in stochastic trace estimation techniques, with a particular focus on methods for estimating the trace of matrix functions such as the matrix exponential or matrix inverse.

# Second-Order Beurling Approximations and Super-Resolution from Bandlimited Functions

## Maxime Ferreira Da Costa

The Beurling–Selberg extremal approximation problems are classics in functional analysis and have found applications in numerous areas of mathematics. Of particular interest, optimal solutions to those problems can be exploited to provide sharp bounds on the condition number of Vandermonde matrices with nodes on the unit circle, which is of great interest to many inverse problems, including super-resolution. However, those solutions have non-derivable Fourier transforms, which impedes their use in a stability analysis of the super-resolution problem. We propose novel second-order extensions to Beurling–Selberg problems, where the approximation residual to functions of bounded variation (BV) is constrained to faster decay rates in the asymptotic, ensuring the smoothness of their Fourier transforms. We harness the properties of those higher-order approximants by establishing a link between the norms of the residuals and the minimal eigenvalue of the Fisher information matrix (FIM) of the super-resolution problem. This enables the derivation of a simple and computable minimal resolvable distance for the super-resolution problem, depending only on the properties of the point-spread function, above which stability can be guaranteed.

# Uncovering the limits of uniqueness in sampled Gabor phase retrieval: A dense set of counterexamples in $L^2(\mathbb{R})$

## Rima Alaifari, Francesca Bartolucci, Matthias Wellershoff

Sampled Gabor phase retrieval — the problem of recovering a square-integrable signal from the magnitude of its Gabor transform sampled on a lattice — is a fundamental problem in signal processing, with important applications in areas such as imaging and audio processing. Recently, a classification of square-integrable signals which are not phase retrievable from Gabor measurements on parallel lines has been presented. This classification was used to exhibit a family of counterexamples to uniqueness in sampled Gabor phase retrieval. Here, we show that the set of counterexamples to uniqueness in sampled Gabor phase retrieval is dense in $L^2(\mathbb{R})$, but is not equal to the whole of $L^2(\mathbb{R})$ in general. Overall, our work contributes to a better understanding of the fundamental limits of sampled Gabor phase retrieval.

# Quantization of Bandlimited Graph Signals

## Felix Krahmer, He Lyu, Rayan Saab, Anna Veselovska, Rongrong Wang

—Graph models and graph-based signals are becoming increasingly important in machine learning, natural sciences, and modern signal processing. In this paper, we address the problem of quantizing bandlimited graph signals. We introduce two classes of noise-shaping algorithms for graph signals that differ in their sampling methodologies. We demonstrate that these algorithms can be efficiently used to construct quantized representatives of bandlimited graph-based signals with bounded amplitude. Moreover, for one of the algorithms, we provide theoretical guarantees on the relative error between the quantized representative and the true signal.

# Data Imputation with an Autoencoder and MAGIC

## Devin Eddington, Andres Felipe Duque Correa, Guy Wolf, Kevin R. Moon

Missing data is a common problem in many applications. Imputing missing values is a challenging task, as the imputations need to be accurate and robust to avoid introducing bias in downstream analysis. In this paper, we propose an ensemble method that combines the strengths of a manifold learning-based imputation method called MAGIC and an autoencoder deep learning model. We call our method Deep MAGIC. Deep MAGIC is trained on a linear combination of the mean squared error of the original data and the mean squared error of the MAGIC-imputed data. Experimental results on three benchmark datasets show that Deep MAGIC outperforms several state-of-the-art imputation methods, demonstrating its effectiveness and robustness in handling large amounts of missing data.

# Unlimited Sampling via One-Bit Quantization

## Arian Eamaz, Kumar Vijay Mishra, Farhang Yeganegi, Mojtaba Soltanalian

Shannon’s sampling theorem plays a central role in the discrete-time processing of bandlimited signals. However, the infinite precision assumed by Shannon’s theorem is impractical because of the ADC clipping effect that limits the signal’s dynamic range. Moreover, the power consumption of an analog-to-digital converter (ADC) increases linearly with the sampling frequency and may be prohibitively high for a wide bandwidth signal. Recently, unlimited and one-bit sampling frameworks have been proposed to address these shortcomings. The former is a high-resolution technique that employs self-reset ADCs to achieve an unlimited dynamic range. The latter achieves relatively low cost and reduced power consumption at an elevated sampling rate. In this paper, we examine jointly exploiting the appealing attributes of both techniques. We propose unlimited one-bit (UNO) sampling, which entails a judicious design of one-bit sampling thresholds. This enables storing the distance between the input signal value and the threshold. We then utilize this information to accurately reconstruct the signal from its one-bit samples via a randomized Kaczmarz algorithm (RKA) which is considered to be a strong linear feasibility solver that selects a random linear equation in each iteration. The numerical results illustrate the effectiveness of RKA-based UNO over the state-of-the-art.

# Near-optimality of $\Sigma\Delta$ quantization for $L^2$-approximation with polynomials in Bernstein form

## Sinan Gunturk, Weilin Li

In this paper, we provide lower bounds on the $L^2$-error of approximation of arbitrary functions $f: [0,1] o \mathbb{R}$ by polynomials of degree at most $n$, with the constraint that the coefficients of these polynomials in the Bernstein basis of order $n$ are bounded by $n^lpha$ for some $lpha \geq 0$. For Lipschitz functions, our lower bound matches, up to a factor of $\sqrt{\log n}$, our previously obtained upper bound for the error of approximation by one-bit polynomials in Bernstein form via $\Sigma\Delta$ quantization where the functions are bounded by $1$ and the coefficients of the approximating polynomials are constrained to be in ${\pm 1}$.

# One-Bit Matrix Completion With Time-Varying Sampling Thresholds

## Arian Eamaz, Farhang Yeganegi, Mojtaba Soltanalian

We explore the impact of coarse quantization on matrix completion in the extreme scenario of generalized one-bit sampling, where the matrix entries are compared with time-varying threshold levels. In particular, instead of observing a subset of high-resolution entries of a low-rank matrix, we have access to a small number of one-bit samples, generated as a result of these comparisons. To recover the low-rank matrix from its highly-quantized known entries, we first formulate the one-bit matrix completion problem with time-varying thresholds as a nuclear norm minimization problem, with one-bit sampled information manifested as linear inequality feasibility constraints. We then modify the popular singular value thresholding (SVT) algorithm to accommodate these inequality constraints, resulting in the creation of the One-Bit SVT (OB-SVT). Our findings demonstrate that incorporating multiple time-varying sampling threshold sequences in one-bit matrix completion can significantly improve the performance of the matrix completion algorithm. We perform numerical evaluations comparing our proposed algorithm with the maximum likelihood estimation method previously employed for one-bit matrix completion, and demonstrate that our approach can achieve a better recovery performance.

# Regularized Nonnegative CP Tensor Decomposition for Detecting Topics at Different Time Scales

## Lara Kassab, Hanbaek Lyu, Elizaveta Rebrova, Deanna Needell, Alona Kryshchenko, Jiahong Yuan, Denali Molitor

Temporal text data (such as news articles or social media feeds) often consists of a mixture of long-lasting trends and popular but short-lasting topics of interest. A truly successful topic modeling strategy should be able to detect both types of topics and clearly locate them in time. In this paper, we show that tensor-based methods are able to discover topics of variable persistence automatically. Moreover, we present techniques to control the length of discovered topics, using data prepossessing or regularization using Hoyer projections applied to the temporal components of the data tensor. We demonstrate our methods on semi-synthetic and real-world news headlines data.

# Stability Analysis of Resolving Pulses of Unknown Shape from Compressive Fourier Measurements

## Meghna Kalra, Kiryung Lee

We present the problem of resolving pulses of a common shape from noisy Fourier measurements. Specifically, the paper focuses on the challenge of estimating the locations when the pulse shape is unknown. We leverage compressed sensing techniques to implement a larger virtual aperture through random sampling, thereby avoiding the necessity to acquire measurements inversely proportional to minimum separation. Although a larger aperture is beneficial, it introduces a penalty when dealing with unknown non-trivial pulse shapes. We provide theoretical and numerical results to quantify the error and show the efficacy of this approach in accurately resolving pulse locations with acceptable error in the presence of noise and modeling error.

# Uncertainty Quantification in Unsupervised PINNs: A Tractable Baseline Model

## Olga Graf, Pablo Flores, Pavlos Protopapas, Karim Pichara

Physics-Informed Neural Networks (PINNs) are gaining popularity as a method for solving differential equations. While being more feasible in some contexts than classical numerical techniques, PINNs still lack credibility. A remedy for that can be found in Uncertainty Quantification (UQ), which is just beginning to emerge in the context of PINNs. A comprehensive methodology is still to be developed, especially in the unsupervised learning setting. In this work, we propose a simple, analytically tractable model for UQ in unsupervised PINNs that incorporates the discrepancy between the PINN solution and the true unknown solution. We exploit recent results on error bounds for PINNs on linear dynamical systems and demonstrate the predictive uncertainty on a class of linear ODEs.

# Revisiting Block-Diagonal SDP Relaxations for the Clique Number of the Paley Graphs

## Vladimir A Kobzar, Krishnan Mody

This work addresses the block-diagonal semidefinite program (SDP) relaxations for the clique number of the Paley graphs. The size of the maximal clique (clique number) of a graph is a classic NP-complete problem; a Paley graph is a deterministic graph where two vertices are connected if their difference is a quadratic residue modulo certain prime powers. Improving the upper bound for the Paley graph clique number for odd prime powers is an open problem in combinatorics. Moreover, since quadratic residues exhibit pseudorandom properties, Paley graphs are related to the construction of deterministic restricted isometries, an open problem in compressed sensing and sparse recovery. Recent work provides numerical evidence that the current upper bounds can be improved by the sum-of-squares (SOS) relaxations. In particular, the bounds given by the SOS relaxations of degree 4 (SOS-4) have been empirically observed to be growing at an order smaller than square root of the prime. However, computations of SOS-4 appear to be intractable with respect to large graphs. Gvozdenovic et al. introduced a more computationally efficient block-diagonal hierarchy of SDPs that refines the SOS hierarchy. They computed the values of these SDPs of degrees 2 and 3 (L2 and L3 respectively) for the Paley graph clique numbers associated with primes p less or equal to 809. These values bound from above the values of the corresponding SOS-4 and SOS-6 relaxations respectively. We revisit these computations and compute the values of the L2 relaxations for larger p’s. Our results provide additional numerical evidence that the L2 relaxations, and therefore also the SOS-4 relaxations, are asymptotically growing at an order smaller than the square root of p.

# Hamilton-Jacobi equations on graphs with applications to semi-supervised learning and data depth

## Jeff Calder, Mahmood Ettehad

Shortest path graph distances are widely used in data science and machine learning, since they can approximate the underlying geodesic distance on the data manifold. However, the shortest path distance is highly sensitive to the addition of corrupted edges in the graph, either through noise or an adversarial perturbation. In this talk we present a family of Hamilton-Jacobi equations on graphs that we call the p-eikonal equation. We show that the p-eikonal equation with p=1 is a provably robust distance-type function on a graph, and the limiting case in which p goes to infinity recovers shortest path distances. While the p-eikonal equation does not correspond to a shortest-path graph distance, we nonetheless show that the continuum limit of the p-eikonal equation on a random geometric graph recovers a geodesic density weighted distance in the continuum. We show the results of applications to data depth and semi-supervised learning.