
UMass NLP Seminar: Efficient Pretraining and Finetuning of Self-Supervised Speech and Language Models with African Languages

Wednesday, 02/21/2024 11:30am to 12:45pm
Seminar

Abstract: With the prominence of large, multilingual pretrained language models, low-resource languages are rarely modelled monolingually and become victims of the “curse of multilinguality”. For natural language processing models, we propose that pretraining on smaller amounts of data from related languages could match the performance of models trained on large amounts of unrelated data. We test this hypothesis on the Niger-Congo family and its Bantu and Volta-Niger sub-families, pretraining models solely on data from Niger-Congo languages and finetuning them on downstream tasks.
Speech datasets for low-resource languages are too small to train bespoke acoustic models from scratch, but self-supervised multilingual representations can be easily finetuned for various speech processing tasks. In the extremely low-resource setting of South African code-switched speech, we explore various finetuning methods to improve speech recognition performance.

Bio: Tolúlọpẹ́ Ògúnrẹ̀mí is a PhD student at Stanford University in the Stanford NLP Group. Her work focuses on speech and language processing for low-resource languages, currently African languages. Previously, she completed a Master's in Speech and Language Processing at the University of Edinburgh.

Join the Seminar