Abstract: Why wait until a user finishes speaking to begin interpreting their intent? Inspired by work in Simultaneous Machine Translation, we introduce two practical tasks where eager interpretation and evaluation improve user experience.
The first work reduces latency in a conversational agent with tool usage (Zhou et al., ACL '22). Many APIs are safe to invoke speculatively (e.g. checking the weather), and we explore and measure the cost/benefit trade-offs of prefetching likely queries so that a user can be shown results as soon as they finish speaking. Improvements can be especially notable when tool usage is chained, as it often is in the SMCalFlow dataset.
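To make the idea concrete, here is a minimal sketch of speculative prefetching, not the paper's actual system: while the user is still speaking, a predictor proposes likely queries and safe (side-effect-free) API calls are launched in the background, so the final answer can be served from the cache. All names (`speculative_prefetch`, `predict_queries`, `execute`) are hypothetical.

```python
import concurrent.futures

def speculative_prefetch(partial_utterance, predict_queries, execute, k=3):
    """Launch up to k likely API calls before the user finishes speaking.

    predict_queries: maps a partial utterance to candidate query strings
    execute: performs a side-effect-free API call (e.g. a weather lookup)
    Returns a cache mapping each candidate query to its in-flight future.
    """
    candidates = predict_queries(partial_utterance)[:k]
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=max(k, 1))
    # Keyed by query so the final parsed turn can reuse results instantly.
    return {q: pool.submit(execute, q) for q in candidates}

# Toy usage with stand-in predictor and executor:
cache = speculative_prefetch(
    "what's the weather in",
    predict_queries=lambda u: ["weather(Seattle)", "weather(NYC)"],
    execute=lambda q: f"result of {q}",
)
final_query = "weather(Seattle)"  # what the user actually asked for
answer = cache[final_query].result() if final_query in cache else None
```

The cost/benefit trade-off studied in the paper corresponds to choosing `k`: larger values raise the hit rate (and latency savings) at the price of wasted API calls when predictions miss.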
The second work introduces Interactive Dictation (Li et al., ACL '23). Here, a user can seamlessly mix voice dictation and unrestricted natural-language commanding to produce a text document. The AI agent must continuously predict the boundaries between dictation and commanding, and the correct interpretation of text-editing commands, in order to maintain the intended document state.
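As an illustration only (not the paper's model), the sketch below shows the state-maintenance half of the task, assuming the dictation/command boundaries have already been predicted. The hypothetical `apply_segments` appends dictated text verbatim and interprets a single toy command; the real task allows unrestricted natural-language commands.

```python
def apply_segments(segments, document=""):
    """Maintain document state from a mixed stream of dictation and commands.

    Each segment is a (mode, text) pair; mode is "dictate" or "command".
    Only one toy command, "delete the last word", is interpreted here.
    """
    for mode, text in segments:
        if mode == "dictate":
            document += text
        elif text == "delete the last word":
            document = document.rstrip()
            # Keep everything up to and including the last space, if any.
            document = document[: document.rfind(" ") + 1] if " " in document else ""
    return document

doc = apply_segments([
    ("dictate", "Dear Alice, thanks for the emial"),
    ("command", "delete the last word"),
    ("dictate", "email."),
])
```

The hard part the paper addresses, and which this sketch assumes away, is predicting those segment boundaries and command interpretations online from a raw speech transcript.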
We provide data, evaluation code, and baseline models for both new tasks.
Bio: Dr. Thomson’s research mainly focuses on predicting graph representations of natural language semantics. He helped develop early techniques for Abstract Meaning Representation (AMR) parsing, semantic dependency parsing (SDP), and recently, Constrained Language Model Parsing (CLaMP) for nl2code. He received his Ph.D. from the Language Technologies Institute at Carnegie Mellon University working with Dr. Noah Smith, and is currently working on Conversational AI at Microsoft.