Machine Learning and Friends Lunch: Deep Sequence Models: Context Representations, Regularization, and Applications to Language

Thursday, 02/15/2018 12:00pm to 1:00pm
Computer Science Building, Room 150/151

    Recurrent Neural Networks (RNNs) are among the most successful
    models for sequential data. They have achieved state-of-the-art results
    in many tasks including language modeling, image and text
    generation, speech recognition, and machine translation. Despite
    all these successes, RNNs still face some challenges: they fail
    to capture long-term dependencies (don't believe the myth that
    they do!) and they easily overfit.

    The ability to capture long-term dependencies in sequential data
    depends on the way context is represented. Theoretically, RNNs
    capture all the dependencies in the sequence via the use of
    recurrence and parameter sharing. In practice, however, RNNs face
    optimization issues, and the assumptions made to counter these
    challenges hinder their ability to capture long-term dependencies.
    The overfitting problem of RNNs, on the other hand, stems from the
    strong dependence of the hidden units on one another.
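    The optimization issue mentioned above is the standard
    vanishing-gradient argument, and a minimal NumPy sketch can
    illustrate it (my illustration, not material from the talk; the
    dimensions and the 0.9 spectral norm are arbitrary assumptions):
    backpropagation through time repeatedly multiplies the gradient by
    the recurrent Jacobian, so when its spectral norm is below one the
    contribution from distant time steps decays exponentially.

```python
import numpy as np

rng = np.random.default_rng(0)

# Recurrent weight matrix rescaled to spectral norm 0.9 (illustrative;
# a real RNN Jacobian also includes the tanh derivative factors).
W = rng.standard_normal((16, 16))
W *= 0.9 / np.linalg.norm(W, 2)

grad = np.ones(16)  # gradient arriving at the last time step
norms = []
for t in range(50):
    grad = W.T @ grad  # one step of backpropagation through time
    norms.append(np.linalg.norm(grad))

# After 50 steps the gradient contribution has all but vanished.
print(norms[0], norms[-1])
```

    Since each step shrinks the gradient norm by at least the spectral
    norm, the signal from 50 steps back is bounded by a factor of
    roughly 0.9^50, which is why distant dependencies are so hard to
    learn by gradient descent.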

    I will talk about my research on context representation and
    regularization for RNNs. First, I will make the case that in the
    context of language, topic models are very effective at
    representing context and can be used jointly with RNNs to
    facilitate learning and capture long-term dependencies. Second, I
    will discuss NOISIN, our newly proposed method for regularizing
    RNNs. NOISIN injects unbiased noise into the hidden units of RNNs
    to reduce co-adaptation among them, and it significantly improves
    the generalization of existing RNN-based models, including RNNs
    with dropout.
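    A minimal sketch of the unbiased-noise-injection idea in NumPy (my
    illustration, not the paper's implementation; the helper names,
    the zero-mean Gaussian noise, and all sizes are assumptions): the
    injected noise has zero mean, so the expected hidden state equals
    the clean one while individual units can no longer rely on exact
    values of their neighbors.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_step(x, h, Wx, Wh, b):
    """One vanilla RNN transition: h_t = tanh(Wx x_t + Wh h_{t-1} + b)."""
    return np.tanh(Wx @ x + Wh @ h + b)

def noisy_rnn_step(x, h, Wx, Wh, b, sigma=0.1):
    """Unbiased noise injection (illustrative): add zero-mean Gaussian
    noise to the hidden units, so E[h_noisy] equals the clean h_t."""
    h_clean = rnn_step(x, h, Wx, Wh, b)
    return h_clean + rng.normal(0.0, sigma, size=h_clean.shape)

# Demo: averaging many noisy hidden states recovers the clean one,
# which is the sense in which the injected noise is unbiased.
d_in, d_h = 3, 4
Wx = rng.standard_normal((d_h, d_in)) * 0.1
Wh = rng.standard_normal((d_h, d_h)) * 0.1
b = np.zeros(d_h)
x, h = rng.standard_normal(d_in), np.zeros(d_h)

clean = rnn_step(x, h, Wx, Wh, b)
avg = np.mean([noisy_rnn_step(x, h, Wx, Wh, b) for _ in range(5000)],
              axis=0)
```

    Contrast this with standard dropout, which zeroes units outright;
    here every unit stays active but carries noise, which is what
    discourages co-adaptation without biasing the hidden state.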

    Adji Bousso Dieng is a PhD student at Columbia University where
    she works with David Blei and John Paisley. Her work at Columbia
    is about combining probabilistic graphical modeling and deep
    learning to design better sequence models. She develops these
    models within the framework of variational inference which
    enables efficient and scalable learning. She hopes her research
    will find many real-world applications, particularly in natural
    language understanding.

    Prior to joining Columbia, she worked as a Junior Professional
    Associate at the World Bank. She did her undergraduate training
    in France, where she attended Lycee Henri IV and Telecom
    ParisTech, part of France's Grandes Ecoles system. She holds a Diplome
    d'Ingenieur from Telecom ParisTech and spent the third year of
    Telecom ParisTech's curriculum at Cornell University where she
    earned a Master's degree in Statistics.