Faculty Recruiting Support CICS

Foundations of Node Representation Learning

17 Nov
Thursday, 11/17/2022 9:00am to 11:00am
CS 201
PhD Dissertation Proposal Defense

Low-dimensional node representations, also called node embeddings, are a cornerstone in the modeling and analysis of complex networks. In recent years, advances in deep learning have spurred development of novel neural network-inspired methods for learning node representations which have largely surpassed classical 'spectral' embeddings in performance. Yet little work asks the central questions of this thesis: Why do these novel deep methods outperform their classical predecessors, and what are their limitations?

We pursue several paths to answering these questions. To further our understanding of deep embedding methods, we explore their relationship with spectral methods, which are better understood, and show that some popular deep methods are equivalent to spectral methods in a certain natural limit. We also introduce the problem of inverting node embeddings in order to probe what information they contain. Further, we propose a simple, non-deep method for node representation learning, and find it to often be competitive with modern deep graph networks in downstream performance.


To better understand the limitations of node embeddings, we prove some upper and lower bounds on their capabilities. Most notably, we prove that node embeddings are capable of exact low-dimensional representation of graphs with bounded max degree, and we further show that a simple algorithm can find such exact embeddings for real-world networks. By contrast, we also prove inherent bounds on random graph models, including those derived from node embeddings, to capture key structural properties of networks without simply memorizing a given graph.

We propose two extensions of these results that broaden their applicability. First, we propose to extend the exact representation result to networks with bounded arboricity rather than bounded max degree, which encompass a much broader range of real-world networks. We also propose to extend the upper bound results for random graph models, for which we made the strong assumption of edge independence, to more general classes of models.

Advisor: Cameron Musco