Massive Graph Analysis in the Data Stream Model

15 Dec
Friday, 12/15/2017 1:00pm to 3:00pm
Computer Science Building, Room 140
Ph.D. Dissertation Proposal Defense

"Computing Properties of Massive Graphs in the Data Stream Model"

Graphs have become the abstraction of choice in modeling highly-structured data. The need to compute graph-theoretic properties of datasets arises in all applications that involve entities and pairwise relations between them. However, in practice the datasets in question can be too large to be stored in main memory, distributed across many machines, or changing over time. Moreover, in the increasing number of applications the algorithm has to make real time decisions as the data arrives, which puts further limitations on the time and space that can realistically be used. These characteristics render classical algorithmic approaches obsolete and necessitate the development of new techniques. The streaming model of computation takes these challenges into account, providing a trade-off between the resources used by the algorithm and its accuracy. A graph stream is defined by a sequence of edge insertions (and sometimes deletions) into an initially empty graph, with the objective of computing a certain property of the graph at the end of the stream. In this model, we explore the following problems that serve as important primitives in massive graph analysis: computing matchings, vertex cover, hitting set, and counting the number of triangles in a graph. We also describe open problems related to the questions of finding dense subgraphs and testing k-connectivity of the graph.

Advisor: Andrew McGregor