Speaker:

Max Hamilton

Abstract:

Collecting ground truth data is a crucial step in AI model development, providing the foundation from which models learn. However, in many domains, high-quality ground truth data is scarce or expensive to acquire. This bottleneck is especially prevalent in scientific domains, where data collection relies on a limited pool of domain experts. Because these applications demand precise statistical estimates and confidence intervals, there is a critical need for methods that maximize data efficiency while drastically reducing human labeling costs.

To address this challenge, we propose a series of methodologies designed to overcome data scarcity. First, we incorporate existing "cheap" data sources to fill gaps in coverage. We developed a framework that uses Wikipedia text descriptions to help predict where different species live. By teaching the model to link text to locations, we can estimate a species' range even when few physical observations of it exist.
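The text-to-location idea can be illustrated with a minimal sketch: score each cell of a spatial grid by the similarity between a (frozen) embedding of the species' description and a learned embedding of that location. All names and dimensions below are hypothetical placeholders, not the talk's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: an embedding of a species' Wikipedia description
# and learned embeddings for cells of a coarse spatial grid.
d = 16
text_emb = rng.normal(size=d)          # embedding of the text description
loc_embs = rng.normal(size=(100, d))   # one embedding per grid cell

# Presence probability per cell: sigmoid of the text-location similarity.
scores = loc_embs @ text_emb
probs = 1.0 / (1.0 + np.exp(-scores))

# Thresholding the probabilities yields a predicted range map, even if the
# species itself has few or no direct observations.
range_map = probs > 0.5
```

In practice the embeddings would come from a trained text encoder and a location encoder; the sketch only shows how a single dot product per cell turns text into a range estimate.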

Next, we introduce Active Measurement, a framework for estimating population metrics without labeling every data point. Instead of trusting a biased AI model or labeling everything by hand, we use the model's predictions to select the most informative samples for a human expert to label. This allows us to calculate unbiased estimates and confidence intervals for quantities such as bird population counts while minimizing annotation cost.
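The core statistical idea can be sketched with a simple model-assisted difference estimator: take the (possibly biased) model's total, then correct it with residuals observed on a small labeled sample. This toy uses uniform random sampling for clarity, whereas the talk's framework selects samples using the model's predictions; all data below is synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: true counts per site and a systematically biased model.
N = 1000
truth = rng.poisson(5.0, size=N).astype(float)
preds = truth + 1.0 + rng.normal(0.0, 1.0, size=N)  # model over-counts by ~1

# Label a small uniform random sample and correct the model's total with
# the observed residuals. This is unbiased regardless of the model's bias.
n = 50
idx = rng.choice(N, size=n, replace=False)
residuals = truth[idx] - preds[idx]

total_hat = preds.sum() + N * residuals.mean()

# Normal-approximation 95% confidence interval from the residual variance.
se = N * residuals.std(ddof=1) / np.sqrt(n)
ci = (total_hat - 1.96 * se, total_hat + 1.96 * se)
```

The better the model, the smaller the residual variance, and the tighter the interval for the same labeling budget; a biased model only shifts where the human corrections land, not the estimator's validity.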

We subsequently extend these active measurement principles to a broader range of problems. Our approach enables faster analysis of star clusters through the efficient calculation of the two-point correlation function. Furthermore, it addresses the broader challenge of multi-target estimation, allowing us to assess the relative abundance of coral species across vast reef systems.
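For the star-cluster application, the quantity being estimated is the two-point correlation function, which measures excess pairwise clustering relative to a random catalog. A naive (non-accelerated) sketch of the standard "natural" estimator, xi = DD/RR - 1, on synthetic 2D positions:

```python
import numpy as np

rng = np.random.default_rng(2)

def pair_dists(pts):
    # All unique pairwise distances (naive O(n^2), for illustration only).
    diff = pts[:, None, :] - pts[None, :, :]
    d = np.sqrt((diff ** 2).sum(-1))
    return d[np.triu_indices(len(pts), k=1)]

# Synthetic "star" positions in a unit square plus a uniform random catalog
# of the same size.
data = rng.uniform(size=(200, 2))
rand = rng.uniform(size=(200, 2))

bins = np.linspace(0.0, 0.5, 11)
dd, _ = np.histogram(pair_dists(data), bins=bins)  # data-data pair counts
rr, _ = np.histogram(pair_dists(rand), bins=bins)  # random-random pair counts

# Natural estimator of the two-point correlation function per distance bin.
xi = dd / np.maximum(rr, 1) - 1.0
```

Exhaustive pair counting scales quadratically with catalog size, which is exactly the cost the talk's active-measurement extension aims to reduce.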

Finally, we aim to apply these efficiency principles to the broader problem of model evaluation. We plan to explore how to best evaluate AI agents while minimizing human feedback. To achieve this, we will develop an embedding model that clusters similar samples, reducing redundancy and yielding high-quality evaluations with fewer human queries.
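The clustering idea can be sketched as follows: embed each sample, group near-duplicates, and send only one representative per cluster to a human judge. The greedy threshold clustering and synthetic embeddings below are illustrative placeholders, not the planned method's actual components.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic embeddings of agent outputs: 100 samples drawn around 10
# underlying "topics", so many samples are near-duplicates of one another.
centers = rng.normal(size=(10, 8))
embs = centers[rng.integers(0, 10, size=100)] + 0.05 * rng.normal(size=(100, 8))
embs /= np.linalg.norm(embs, axis=1, keepdims=True)

# Greedy clustering: a sample joins the first representative it is highly
# similar to (cosine > 0.9); otherwise it becomes a new representative.
reps = []
assignment = np.empty(len(embs), dtype=int)
for i, e in enumerate(embs):
    for j, r in enumerate(reps):
        if e @ embs[r] > 0.9:
            assignment[i] = j
            break
    else:
        reps.append(i)
        assignment[i] = len(reps) - 1

# Only representatives go to a human judge; other cluster members inherit
# their representative's evaluation.
queries_needed = len(reps)
```

When many samples overlap, the number of human queries shrinks toward the number of distinct behaviors rather than the number of samples.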

Advisor:

Subhransu Maji