2024 Canadian Workshop of Information Theory
Tutorials

Tutorial 1
Nicholas Harvey, The University of British Columbia, Canada
Provable Dimensionality Reduction and its Applications
Abstract: Dimensionality reduction is an important concept in high-dimensional data analysis. This is the general idea that there is a low-dimensional representation that captures some salient properties of a high-dimensional data set.
In many applied contexts, dimensionality reduction is only a heuristic notion, but nevertheless a notion that can be useful for certain data sets. However, for data in Euclidean space, there is a provable notion of dimensionality reduction, which is the topic of this tutorial. The core idea, originally due to Johnson and Lindenstrauss, is that any high-dimensional data set of n points has a representation in only O(log n) dimensions that approximately preserves the pairwise distances of the high-dimensional data set. Moreover, that low-dimensional representation can be computed by very efficient algorithms.
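The guarantee above can be illustrated numerically. The following is a minimal sketch (not part of the tutorial) using a plain Gaussian random projection, one standard construction of a Johnson-Lindenstrauss map; the specific dimensions are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 20, 1000, 200          # n points in d dimensions, target dimension k
X = rng.standard_normal((n, d))

# A random Gaussian matrix scaled by 1/sqrt(k) is one standard JL map.
G = rng.standard_normal((k, d)) / np.sqrt(k)
Y = X @ G.T                      # low-dimensional representation, shape (n, k)

def pairwise_dists(A):
    # All pairwise Euclidean distances between the rows of A.
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

D_high = pairwise_dists(X)
D_low = pairwise_dists(Y)
off_diag = ~np.eye(n, dtype=bool)
ratios = D_low[off_diag] / D_high[off_diag]
# Worst multiplicative distortion over all pairs; close to 1 means
# the projection approximately preserved every pairwise distance.
max_distortion = max(ratios.max(), 1.0 / ratios.min())
```

Running this, the worst-case distortion over all pairs stays within a modest factor of 1 even though the dimension dropped from 1000 to 200.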
Johnson-Lindenstrauss dimensionality reduction has become an indispensable tool in algorithm design. It has a vast number of applications in efficient algorithms for processing large data sets, for example in near-neighbour data structures, in streaming algorithms, and in randomized numerical linear algebra.
The first half of the tutorial will give a formal definition of Johnson-Lindenstrauss dimensionality reduction and give a sketch proof of why it works. We will then discuss a highly efficient algorithm, known as the Fast Johnson-Lindenstrauss Transform, for constructing the low-dimensional representation. The second half of the tutorial will turn to the applications. We will explain the core ideas of the near-neighbour data structure, and we will explain how dimensionality reduction leads to a fast randomized algorithm for least-squares regression.
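To illustrate the least-squares application, here is a minimal sketch-and-solve example (my own illustration, not the tutorial's algorithm; a practical implementation would use the faster FJLT rather than a dense Gaussian sketch, and the problem sizes here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 10                 # tall least-squares problem: n >> d
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Exact solution of min_x ||Ax - b||.
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)

# Sketch: compress the n rows down to k rows, then solve the small problem.
k = 200
S = rng.standard_normal((k, n)) / np.sqrt(k)   # Gaussian sketching matrix
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

# The sketched solution's residual is within a small factor of optimal.
rel_resid = np.linalg.norm(A @ x_sketch - b) / np.linalg.norm(A @ x_exact - b)
```

The small system has only k = 200 rows instead of n = 2000, yet its solution achieves a residual within a small constant factor of the optimum.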


Tutorial 2
Jun Chen and Ashish Khisti, McMaster University and University of Toronto, Canada
Lossy Data Compression using Deep Learning
Abstract: Deep generative models, when applied to (lossy) image compression tasks, can reconstruct realistic-looking outputs even at very low bitrates, where traditional compression methods suffer from significant artifacts. This has led to significant interest in both the information-theoretic aspects and the practical architectures of deep-learning-based image compression.
In this tutorial we will provide a survey of existing deep-learning-based compression systems that have appeared in the literature over the past few years. We will introduce the concept of perception loss that appears naturally when training deep-learning-based models. We will then introduce the rate-distortion-perception (RDP) function that underlies the performance of learned image compression systems and explain its connection to the classical rate-distortion function in information theory. We will discuss several properties of the RDP function and study the quadratic Gaussian source model in detail. Motivated by the Gaussian case, we will introduce a notion of universal encoded representations, where the same compressed representation can be used to simultaneously achieve different operating points on the distortion-perception tradeoff. We will demonstrate through both theoretical analysis and experimental results involving deep learning models that near-optimal encoded representations can be constructed for a broad class of source models.
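For reference, the RDP function is commonly formalized in the literature (e.g., following Blau and Michaeli) as a constrained mutual-information minimization; the tutorial's exact formulation may differ in details such as the choice of divergence:

```latex
R(D, P) \;=\; \min_{p_{\hat{X} \mid X}} \; I(X; \hat{X})
\quad \text{s.t.} \quad
\mathbb{E}\!\left[\Delta(X, \hat{X})\right] \le D,
\qquad
d\!\left(p_X, \, p_{\hat{X}}\right) \le P
```

Here $\Delta$ is a distortion measure and $d$ is a divergence between the source and reconstruction distributions; relaxing the perception constraint ($P = \infty$) recovers the classical rate-distortion function.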
Building upon the insights from the RDP function, we will consider a closely related setup — cross-domain lossy compression — that applies to a broader class of applications involving image restoration tasks (e.g., denoising, super-resolution, etc.) over compressed data representations. We will establish coding theorems that rely on recent one-shot methods in information theory, establish connections to the problem of optimal transport, and develop architectural insights that are validated using deep learning models.
Unlike conventional lossy source coding, for which it suffices to consider deterministic algorithms, cross-domain lossy compression generally has to resort to stochastic mappings and can also benefit from the availability of common randomness. We will elucidate the role of randomness and how it depends on the formulation of the perception constraint.
Finally, we will consider an information-theoretic objective, termed information-constrained optimal transport, that arises from cross-domain lossy compression. Its functional properties, asymptotics, and bounds will be discussed. The quadratic vector Gaussian case will be leveraged as an illustrative example to show that this line of research has the potential to generate many theoretical results with practical implications.


Short tutorial
Mark Schmidt, The University of British Columbia, Canada
Understanding and Improving the Numerical Optimization Underlying Modern Machine Learning
Abstract: This talk will discuss stochastic gradient descent (SGD) and Adam, the main algorithms underlying modern machine learning. We will first discuss overparameterization in modern models and its effects on SGD: how it allows SGD to converge with a constant step size, enables the development of faster SGD methods, and allows us to update the learning rate during the training procedure. We will then discuss the nebulous Adam algorithm, arguing that standard explanations for its success at training language models appear to be limited, and offering a new explanation for why Adam is so effective in some applications.
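As a toy numerical illustration of the interpolation phenomenon mentioned above (my own sketch, not from the talk): on an overparameterized least-squares problem that the model can fit exactly, SGD with a fixed constant step size drives the training loss to essentially zero, with no step-size decay. All problem sizes and the step size are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 200                  # overparameterized: d > n, so SGD can interpolate
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

w = np.zeros(d)
step = 0.0025                   # constant step size, about 1/(2 * max_i ||a_i||^2)
for _ in range(20000):
    i = rng.integers(n)                     # sample one data point uniformly
    grad = (A[i] @ w - b[i]) * A[i]         # gradient of 0.5 * (a_i^T w - b_i)^2
    w -= step * grad                        # SGD step with constant step size

train_loss = 0.5 * np.mean((A @ w - b) ** 2)
```

Because an interpolating solution exists, the per-sample gradient noise vanishes at the optimum, so the constant step size does not prevent convergence; this is the regime in which the faster methods and step-size schemes mentioned in the abstract operate.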

