Speaker:

Alireza Salemi

Abstract:

As Large Language Models (LLMs) become increasingly integrated into user-facing applications, the ability to adapt these systems to individual users is becoming essential. Users exhibit diverse preferences, writing styles, and information needs, which are not adequately captured by generic, one-size-fits-all models. This proposal addresses the problem of personalizing LLMs through three interconnected pillars: evaluation, methodology, and optimization.

First, we develop a comprehensive framework for evaluating personalization in generative settings. Standard overlap-based metrics are insufficient for capturing user-specific preferences, particularly in open-ended generation tasks where multiple valid outputs may exist. To address this, we introduce the LaMP and LaMP-QA benchmarks, which cover a range of personalized tasks, including classification, content generation, and information-seeking scenarios. We further propose ExPerT, an explainable aspect-based evaluation framework that measures alignment between generated outputs and user references in terms of both content and style. In addition, we introduce rubric-based evaluation using question narratives, enabling fine-grained, personalized assessment without relying on a single reference output.

Second, we conduct a systematic study of methods for personalizing LLMs, examining approaches across input-level, parameter-level, and output-level interventions. We focus on Retrieval-Augmented Generation (RAG) as a primary mechanism for incorporating user-specific context and compare it with parameter-efficient fine-tuning methods such as LoRA. Our empirical results show that RAG provides a more effective and scalable approach to personalization, while also highlighting important trade-offs related to data availability, computational cost, and privacy. To address privacy concerns, we further propose an output-level personalization framework, P3, which enables personalization without directly exposing user data to server-side models.

Finally, we advance the optimization of retrieval-augmented personalized LLMs. We identify key limitations in existing RAG pipelines, including challenges in retrieving relevant context from large and noisy user profiles and effectively utilizing that context during generation. To address these issues, we propose methods for optimizing both retrieval and generation, including feedback-driven retriever optimization and self-training with natural language feedback. These techniques improve both the selection of personalized context and the model's ability to reason over it, leading to substantial gains in end-to-end personalization performance. By jointly addressing evaluation, methodology, and optimization, this proposal provides a unified framework for developing more accurate, scalable, and privacy-aware personalized LLMs.
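To make the RAG-based personalization idea concrete, here is a minimal sketch of the general input-level pattern the abstract describes: retrieve the profile entries most relevant to the current request and condition generation on them. This is an illustration only, not the proposal's actual pipeline; all names (`retrieve_profile_entries`, `build_personalized_prompt`) are hypothetical, and the bag-of-words cosine scorer is a stdlib stand-in for the trained, feedback-optimized retriever the abstract refers to.

```python
from collections import Counter
import math

def score(query: str, doc: str) -> float:
    # Cosine similarity over bag-of-words term counts: a simple stand-in
    # for a trained retriever (hypothetical, for illustration only).
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * \
           math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

def retrieve_profile_entries(query: str, profile: list[str], k: int = 3) -> list[str]:
    # Select the k profile entries most similar to the current request.
    return sorted(profile, key=lambda doc: score(query, doc), reverse=True)[:k]

def build_personalized_prompt(query: str, profile: list[str]) -> str:
    # Input-level personalization: prepend retrieved user context to the task,
    # leaving the underlying model's parameters untouched.
    context = "\n".join(f"- {e}" for e in retrieve_profile_entries(query, profile))
    return (
        "You are a personalized assistant.\n"
        f"Relevant examples from this user's history:\n{context}\n\n"
        f"Request: {query}"
    )

# Example: a toy user profile of past writings; the resulting prompt would be
# sent to any LLM in place of the raw query.
profile = [
    "Review: the camera's low-light performance is stellar",
    "Email draft: keep it short and informal, sign off with 'Cheers'",
    "Review: battery life disappointed me despite the sleek design",
]
print(build_personalized_prompt("Draft a short review of my new laptop", profile))
```

Framed this way, the trade-off the abstract raises is visible in the code: input-level RAG personalizes without per-user training, which is what makes it cheaper to serve than parameter-level methods such as LoRA, at the cost of spending context-window budget on retrieved profile entries and depending on retrieval quality.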

Advisor:

Hamed Zamani