Transfer Learning with Mixtures of Manifolds

Friday, 03/30/2018 10:00am to 12:00pm
Lederle Graduate Research Center, Room A310
Ph.D. Thesis Defense
Speaker: Tommy Boucher

"Transfer Learning with Mixtures of Manifolds"

Advances in scientific instrumentation have increased the speed of data acquisition and the precision of sampling, creating an abundance of high-dimensional data sets. The ability to combine these disparate data sets and to transfer information between them is critical to accurate scientific analysis. Many modern instruments record samples across thousands of channels, far more than the actual degrees of freedom in the sample data. This makes manifold learning, a class of methods that exploit the observation that high-dimensional data tend to lie on lower-dimensional manifolds, especially well suited to this transfer learning task.

Existing manifold-based transfer learning methods can align related data sets with differing feature representations, but their inherent single-manifold assumption causes them to fail in the presence of complex mixtures of manifolds. This dissertation develops a new class of transfer learning algorithms for high-dimensional data sets that intrinsically lie on multiple low-dimensional manifolds. With a more realistic mixture-of-manifolds assumption, these algorithms transfer information between data sets accurately and efficiently by aligning their complex underlying geometries.

The algorithms presented leverage corresponding samples between data sets along with any available label information, whether continuous or categorical. The two primary tasks addressed are aligning mixtures of manifolds and heterogeneous domain adaptation of multi-manifold data sets. Linear, nonlinear, and robust versions of the algorithm are described, as well as a method for actively selecting cross-data-set correspondences. To show their practical effectiveness, the algorithms are compared across a number of synthetic and real-world domains, most notably in aligning data recorded by spectroscopic instruments during space exploration, a new domain for transfer learning.

Advisor: Sridhar Mahadevan