Learning to Discover: Adaptive Data Selection for Classification and Estimation

March 14, 2008
Time: 11:00am-12:00pm
Interschool Lab, 750 CEPSR
Speaker: Rui M. Castro, University of Wisconsin, Madison

Abstract

Science is arguably the pinnacle of human intellectual achievement, yet the scientific discovery process itself remains an art. Human intuition and experience are still the driving forces of the high-level discovery process: we determine which hypotheses and theories to entertain, which experiments to conduct, how data should be interpreted, when hypotheses should be abandoned, and so on. Meanwhile, machines are limited to low-level tasks such as gathering and processing data. A grand challenge for scientific discovery in the 21st century is to devise machines that directly participate in the high-level discovery process. The work presented in this talk is a first step towards this goal.

Common statistical inference and learning theories often assume that all data are collected prior to analysis. Alternatively, one can envision sequential, adaptive data collection procedures that use information gleaned from previous samples to guide the selection of future samples. Such procedures matter in many pattern classification applications, where collecting and labeling data is often painstaking and costly, so one would like to collect only the data that provide the most relevant information. We refer to such feedback-driven processes as active learning methods. In this talk I present a characterization of the achievable performance limits in active learning. Using minimax analysis techniques, I describe the behavior of the classification error as the number of samples increases for broad classes of distributions, characterized by decision boundary regularity and noise conditions. The results identify situations in which active learning yields dramatic improvements in the rate of error convergence. I will also briefly discuss applications of active learning arising in sensing, networking and systems biology.
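
As a rough illustration of the contrast between passive and adaptive sampling described above, the following Python sketch estimates a one-dimensional decision threshold from labeled points. It is a toy example and not the method presented in the talk: the noiseless labels, the unit interval, and the names `label`, `passive_estimate`, and `active_estimate` are all assumptions made here for illustration.

```python
import random


def label(x, theta):
    """Hypothetical noiseless oracle: 1 if x is at or above the threshold, else 0."""
    return 1 if x >= theta else 0


def passive_estimate(theta, n, rng):
    """Label n points drawn uniformly from [0, 1], chosen without feedback."""
    lo, hi = 0.0, 1.0
    for _ in range(n):
        x = rng.random()
        if label(x, theta):
            hi = min(hi, x)  # smallest point seen with label 1
        else:
            lo = max(lo, x)  # largest point seen with label 0
    return (lo + hi) / 2     # midpoint of the remaining uncertainty interval


def active_estimate(theta, n):
    """Choose each query from the previous labels by bisecting the interval."""
    lo, hi = 0.0, 1.0
    for _ in range(n):
        x = (lo + hi) / 2    # query where the label is most uncertain
        if label(x, theta):
            hi = x
        else:
            lo = x
    return (lo + hi) / 2


if __name__ == "__main__":
    theta, n = 0.3711, 20    # arbitrary threshold and label budget
    rng = random.Random(0)
    print("passive error:", abs(passive_estimate(theta, n, rng) - theta))
    print("active error: ", abs(active_estimate(theta, n) - theta))
```

In this noiseless setting the bisection interval halves with every query, whereas the passive interval shrinks only as fast as random points happen to fall near the threshold. The noisy, higher-dimensional settings covered by the talk are considerably more subtle than this sketch.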

