Data Stream Algorithms for Large Graphs and High Dimensional Data

06 Aug
Monday, 08/06/2018 1:00pm to 3:00pm
Computer Science Building, Room 151
Ph.D. Thesis Defense
Speaker: Hòa Vu

In contrast to the traditional random access memory computational model where the entire input is available in the working memory, the data stream  model only provides sequential access to the input. The data stream model is a natural framework to handle large and dynamic data. In this model, we focus on designing algorithms which  use sublinear memory and a small number of passes over the stream. Other desirable properties include fast update time, query time, and post processing time.

In this thesis, we consider different problems in graph theory, combinatorial optimization, and high dimensional data processing in the data stream model.

The first part of this thesis focuses on algorithms for  graph theory and combinatorial optimization. We present new results for the problems of  finding the densest subgraph, counting the number of triangles,  finding capacitated MaxCut, and  finding the maximum $k$ set coverage.

The second part of this thesis considers problems in high dimensional data streams. In this setting, each stream item consists of multiple coordinates corresponding to different attributes. We consider the problem of  testing or learning about the relationships among the attributes and the problem of finding heavy hitters in arbitrary subsets of attributes.

Advisor: Andrew McGregor