PhD Dissertation Proposal: Edmond Cunningham, Orthogonal Coordinates for Representing Probability Density Functions
Speaker: Edmond Cunningham
Abstract:
The manifold hypothesis is a guiding principle in machine learning which holds that real-world data often concentrate on, or near, a low-dimensional submanifold of Euclidean space. This principle has motivated methods in generative modeling and dimensionality reduction, such as latent generative models and principal component analysis, by suggesting that data should admit representations with fewer parameters than the ambient dimension. However, common generative approaches neither capture the spatially varying intrinsic dimension of data nor explicitly disentangle its factors of variation. This thesis reframes low-dimensional representation as a coordinate-learning problem: we seek orthogonal coordinate systems that both fit the data distribution and align with the local factors of variation. We begin by developing preliminary ideas for representing data with as few dimensions as possible using tall matrix-vector products; these approaches expose the need for orthogonality. In the second chapter, we devise an algorithm for learning orthogonal coordinates that fit the data distribution through a generative model called Principal Component Flows (PCFs). Although PCFs offer the orthogonal coordinate system we seek, basic theoretical and scalability challenges remain. In the third chapter, we delve into the mathematics of orthogonal coordinates, which provides the theoretical foundation for a practical algorithm: we learn local orthogonal coordinate charts for a data distribution and apply them to MCMC sampling from distributions with challenging geometry. Our work provides a principled set of tools and opens a new avenue for developing geometry-aware algorithms.
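To make the central notion concrete, the following sketch (Python/JAX) illustrates two ideas from the abstract under assumed definitions; it is my illustration, not code from the thesis. First, a tall matrix-vector product x = A z represents data with fewer coordinates than the ambient dimension, and orthonormal columns make the representation invertible by a simple transpose. Second, in the nonlinear setting, a chart is an orthogonal coordinate system when the columns of its Jacobian are mutually orthogonal at every point; polar coordinates are the classic example. The name polar_chart and the choice of JAX are assumptions made for the example.

# Illustrative sketch only: what "orthogonal coordinates" means, in the
# linear and nonlinear cases. Not the thesis's algorithm.
import jax
import jax.numpy as jnp

# (1) Linear case: A is a tall 3x2 matrix with orthonormal columns, so
# A^T A = I and the low-dimensional coordinates are recovered by z = A^T x.
A = jnp.linalg.qr(jax.random.normal(jax.random.PRNGKey(0), (3, 2)))[0]
z = jnp.array([1.5, -0.3])
x = A @ z
print(jnp.allclose(A.T @ x, z, atol=1e-5))  # True: orthogonality enables inversion

# (2) Nonlinear case: a chart mapping latent (r, theta) to ambient (x1, x2).
def polar_chart(z):
    r, theta = z
    return jnp.array([r * jnp.cos(theta), r * jnp.sin(theta)])

# Columns of the Jacobian are the local coordinate directions; the chart is
# orthogonal when the induced metric G = J^T J is diagonal at every point.
J = jax.jacfwd(polar_chart)(jnp.array([2.0, 0.7]))
G = J.T @ J
print(jnp.allclose(G - jnp.diag(jnp.diag(G)), 0.0, atol=1e-6))  # True: off-diagonals vanish

Loosely speaking, where this sketch fixes the chart analytically, the algorithms described in the abstract learn such charts from data while also fitting the data distribution.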
Advisor: Ina Fiterau