Faculty Recruiting Support CICS

Two Disparate Applications of Independent Set Algorithms: The Hard Sphere Model and Preparing Biological Sequence Data for Benchmarking

07 Oct
Wednesday, 10/07/2020 12:00pm to 1:30pm
Virtual via Zoom
Rising Stars
Speaker: Samantha Petti

Title: Two Disparate Applications of Independent Set Algorithms: The Hard Sphere Model and Preparing Biological Sequence Data for Benchmarking

Abstract: The first part of this talk will focus on the hard sphere model, a well-known statistical physics model of monatomic gases. We describe a Markov chain for sampling from the model that achieves rapid mixing for a wider range of fugacity parameters than previously known and use this result to deduce properties of the infinite volume limit of the model. Our approach is inspired by Vigoda's analysis of the hard core model (a model of random independent sets) on bounded degree graphs. The second half of the talk will focus on another application of independent set algorithms: splitting biological sequence data into training and test sets for benchmarking. In typical machine learning applications, one assigns the data to training and test sets at random, trains a model using the training data, and evaluates the accuracy of the model on the test data. Biological sequence datasets frequently include many sequences that are very similar, and so a random division of the data would result in some test sequences that are very similar to training sequences. In order to meaningfully predict the accuracy of a model on data it has not yet seen, it is necessary for the training and test sequences to be sufficiently far apart. We develop an algorithm that splits sequence data into training and test sets such that no training-test sequence pair is too close. Finally, we evaluate the performance of our algorithm on the protein sequence families in the Pfam database.

Bio: Samantha Petti is a postdoc at the NSF-Simons Center for Mathematical and Statistical Analysis of Biology at Harvard. In May 2020, she earned her PhD in Algorithms, Combinatorics, and Optimization from Georgia Tech, where she was advised by Santosh Vempala. Previously, she graduated from Williams College where she majored in Mathematics. Her graduate work focused on random graphs, probability, and algorithms. Broadly, she is interested in applying her theoretical training to create computational tools to make sense of the recent explosion of biological sequence data.

The UMass Amherst CICS Rising Stars in Computer Science lecture series highlights the stellar work of young computer scientists about to launch into careers in academia. Join us to hear from rising stars working to solve pressing issues facing the field. To join this virtual meeting via Zoom, click here.

Attendees will need a passcode to enter this meeting. If you need this passcode, please see the advertised information for talks on the seminars mailing list or email Alex Taubman.