2024 Canadian Workshop of Information Theory
Tutorials

Tutorial 1
Nicholas Harvey, The University of British Columbia, Canada
Provable Dimensionality Reduction and its Applications
Abstract: Dimensionality reduction is an important concept in high-dimensional data analysis. This is the general idea that there is a low-dimensional representation that captures some salient properties of a high-dimensional data set.
In many applied contexts, dimensionality reduction is only a heuristic notion, but nevertheless a notion that can be useful for certain data sets. However, for data in Euclidean space, there is a provable notion of dimensionality reduction, which is the topic of this tutorial. The core idea, originally due to Johnson and Lindenstrauss, is that any high-dimensional data set of n points has a representation in only O(log n) dimensions that approximately preserves the pairwise distances of the high-dimensional data set. Moreover, that low-dimensional representation can be computed by very efficient algorithms.
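The guarantee above can be illustrated numerically. The following is a minimal sketch (not part of the tutorial) using a plain Gaussian random projection, one standard construction of a Johnson-Lindenstrauss map; the specific dimensions are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 20, 1000, 200          # n points in d dimensions, target dimension k
X = rng.standard_normal((n, d))

# A random Gaussian matrix scaled by 1/sqrt(k) is one standard JL map.
G = rng.standard_normal((k, d)) / np.sqrt(k)
Y = X @ G.T                      # low-dimensional representation, shape (n, k)

def pairwise_dists(A):
    # All pairwise Euclidean distances between the rows of A.
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

D_high = pairwise_dists(X)
D_low = pairwise_dists(Y)
off_diag = ~np.eye(n, dtype=bool)
ratios = D_low[off_diag] / D_high[off_diag]
# Worst multiplicative distortion over all pairs; close to 1 means
# the projection approximately preserved every pairwise distance.
max_distortion = max(ratios.max(), 1.0 / ratios.min())
```

Running this, the worst-case distortion over all pairs stays within a modest factor of 1 even though the dimension dropped from 1000 to 200.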
Johnson-Lindenstrauss dimensionality reduction has become an indispensable tool in algorithm design. It has a vast number of applications in efficient algorithms for processing large data sets, for example in near-neighbour data structures, in streaming algorithms, and in randomized numerical linear algebra.
The first half of the tutorial will give a formal definition of Johnson-Lindenstrauss dimensionality reduction and give a sketch proof of why it works. We will then discuss a highly efficient algorithm, known as the Fast Johnson-Lindenstrauss Transform, for constructing the low-dimensional representation. The second half of the tutorial will turn to the applications. We will explain the core ideas of the near-neighbour data structure, and we will explain how dimensionality reduction leads to a fast randomized algorithm for least-squares regression.
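To illustrate the least-squares application, here is a minimal sketch-and-solve example (my own illustration, not the tutorial's algorithm; a practical implementation would use the faster FJLT rather than a dense Gaussian sketch, and the problem sizes here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 10                 # tall least-squares problem: n >> d
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Exact solution of min_x ||Ax - b||.
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)

# Sketch: compress the n rows down to k rows, then solve the small problem.
k = 200
S = rng.standard_normal((k, n)) / np.sqrt(k)   # Gaussian sketching matrix
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

# The sketched solution's residual is within a small factor of optimal.
rel_resid = np.linalg.norm(A @ x_sketch - b) / np.linalg.norm(A @ x_exact - b)
```

The small system has only k = 200 rows instead of n = 2000, yet its solution achieves a residual within a small constant factor of the optimum.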


Tutorial 2
Jun Chen and Ashish Khisti, McMaster University and University of Toronto, Canada
Lossy Data Compression using Deep Learning
Abstract: Deep generative models, when applied to (lossy) image compression tasks, can reconstruct realistic-looking outputs even at very low bitrates, where traditional compression methods suffer from significant artifacts. This has led to significant interest in both the information-theoretic aspects and the practical architectures of deep-learning-based image compression.
In this tutorial we will provide a survey of existing deep-learning-based compression systems that have appeared in the literature over the past few years. We will introduce the concept of perception loss that appears naturally when training deep-learning-based models. We will then introduce the rate-distortion-perception (RDP) function that underlies the performance of learned image compression systems and explain its connection to the classical rate-distortion function in information theory. We will discuss several properties of the RDP function and study the quadratic Gaussian source model in detail. Motivated by the Gaussian case, we will introduce a notion of universal encoded representations, where the same compressed representation can be used to simultaneously achieve different operating points on the distortion-perception tradeoff. We will demonstrate through both theoretical analysis and experimental results involving deep learning models that near-optimal encoded representations can be constructed for a broad class of source models.
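For reference, the RDP function is commonly formalized in the literature (e.g., following Blau and Michaeli) as a constrained mutual-information minimization; the tutorial's exact formulation may differ in details such as the choice of divergence:

```latex
R(D, P) \;=\; \min_{p_{\hat{X} \mid X}} \; I(X; \hat{X})
\quad \text{s.t.} \quad
\mathbb{E}\!\left[\Delta(X, \hat{X})\right] \le D,
\qquad
d\!\left(p_X, \, p_{\hat{X}}\right) \le P
```

Here $\Delta$ is a distortion measure and $d$ is a divergence between the source and reconstruction distributions; relaxing the perception constraint ($P = \infty$) recovers the classical rate-distortion function.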
Building upon the insights from the RDP function, we will consider a closely related setup — cross-domain lossy compression — that applies to a broader class of applications involving image restoration tasks (e.g., denoising, super-resolution, etc.) over compressed data representations. We will establish coding theorems that rely on recent one-shot methods in information theory, establish connections to the problem of optimal transport, and develop architectural insights that are validated using deep learning models.
Unlike conventional lossy source coding, for which it suffices to consider deterministic algorithms, cross-domain lossy compression generally has to resort to stochastic mappings and can also benefit from the availability of common randomness. We will elucidate the role of randomness and how it depends on the formulation of the perception constraint.
Finally, we will consider an information-theoretic objective, termed information-constrained optimal transport, that arises from cross-domain lossy compression. Its functional properties, asymptotics, and bounds will be discussed. The quadratic vector Gaussian case will be leveraged as an illustrative example to show that this line of research has the potential to generate many theoretical results with practical implications.


Short tutorial
Mark Schmidt, The University of British Columbia, Canada
Understanding and Improving the Numerical Optimization Underlying Modern Machine Learning
Abstract: This talk will discuss stochastic gradient descent (SGD) and Adam, the main algorithms underlying modern machine learning. We will first discuss overparameterization in modern models and its effects on SGD: how it allows SGD to converge with a constant step size, enables the development of faster SGD methods, and allows us to update the learning rate during the training procedure. We will then discuss the nebulous Adam algorithm, arguing that standard explanations for its success at training language models appear to be limited, and offering a new explanation for why Adam is so effective in some applications.
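As a toy numerical illustration of the interpolation phenomenon mentioned above (my own sketch, not from the talk): on an overparameterized least-squares problem that the model can fit exactly, SGD with a fixed constant step size drives the training loss to essentially zero, with no step-size decay. All problem sizes and the step size are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 200                  # overparameterized: d > n, so SGD can interpolate
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

w = np.zeros(d)
step = 0.0025                   # constant step size, about 1/(2 * max_i ||a_i||^2)
for _ in range(20000):
    i = rng.integers(n)                     # sample one data point uniformly
    grad = (A[i] @ w - b[i]) * A[i]         # gradient of 0.5 * (a_i^T w - b_i)^2
    w -= step * grad                        # SGD step with constant step size

train_loss = 0.5 * np.mean((A @ w - b) ** 2)
```

Because an interpolating solution exists, the per-sample gradient noise vanishes at the optimum, so the constant step size does not prevent convergence; this is the regime in which the faster methods and step-size schemes mentioned in the abstract operate.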

