AIDEme: An active learning based system for interactive exploration of large datasets

Abstract

There is a growing gap between the fast growth of data and the limited human ability to comprehend it. Consequently, there is rising demand for analytics tools that can bridge this gap and help users retrieve high-value content from data. We introduce AIDEme, a scalable interactive data exploration system for efficiently learning a user's interest pattern over a large dataset. The system is cast in a principled active learning (AL) framework, which iteratively presents strategically selected records for user labeling, thereby building an increasingly accurate model of the user interest. A key challenge in building such a system is that existing active learning techniques converge slowly when learning the user interest on large datasets. To overcome this problem, AIDEme exploits properties of the user labeling process and the class distribution of the observed data to design new active learning algorithms. These algorithms come with provable results on model accuracy, convergence, and approximation, and evaluation results show much improved convergence over existing AL methods while maintaining interactive speed. In this demonstration, conference attendees will interact with AIDEme on a variety of exploration tasks over real-world datasets, gaining a better understanding of how the learned model evolves with each labeled example, how factorizing the user decision-making process improves performance, and how the model evolves differently under various AL algorithms.
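
To make the labeling loop described in the abstract concrete, here is a minimal sketch of generic pool-based active learning with uncertainty sampling. It is not AIDEme's algorithm: the abstract does not specify its selection strategy or its factorized model, so a small logistic regression and a "closest to 0.5" selection rule are swapped in as stand-ins. All names (`explore`, `ask_user`, `interest`, the synthetic pool) are hypothetical, and the sketch assumes the user has already supplied one positive and one negative example to start from.

```python
import math
import random


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def train(points, labels, epochs=300, lr=0.5):
    """Fit a two-feature logistic regression with plain SGD (stand-in model)."""
    w0, w1, b = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for (x, y), t in zip(points, labels):
            g = sigmoid(w0 * x + w1 * y + b) - t  # gradient of the log loss
            w0 -= lr * g * x
            w1 -= lr * g * y
            b -= lr * g
    return w0, w1, b


def prob(model, point):
    """Predicted probability that the user finds `point` interesting."""
    w0, w1, b = model
    return sigmoid(w0 * point[0] + w1 * point[1] + b)


def explore(pool, ask_user, seed_indices, budget):
    """Label up to `budget` records, always querying the most uncertain one.

    Assumption: `seed_indices` contains at least one positive and one
    negative record, so the first model sees both classes.
    """
    labeled = {i: ask_user(pool[i]) for i in seed_indices}
    model = train([pool[i] for i in labeled], list(labeled.values()))
    while len(labeled) < min(budget, len(pool)):
        candidates = [i for i in range(len(pool)) if i not in labeled]
        # Uncertainty sampling: pick the record whose score is nearest 0.5.
        query = min(candidates, key=lambda i: abs(prob(model, pool[i]) - 0.5))
        labeled[query] = ask_user(pool[query])
        model = train([pool[i] for i in labeled], list(labeled.values()))
    return model, labeled


# Synthetic stand-in for a dataset and a simulated user (both hypothetical).
rng = random.Random(0)
pool = [(rng.random(), rng.random()) for _ in range(200)]
interest = lambda p: int(p[0] + p[1] > 1.0)  # hidden "user interest" region

seeds = [
    next(i for i, p in enumerate(pool) if interest(p) == 1),
    next(i for i, p in enumerate(pool) if interest(p) == 0),
]
model, labeled = explore(pool, interest, seeds, budget=30)
accuracy = sum((prob(model, p) > 0.5) == bool(interest(p)) for p in pool) / len(pool)
print(f"labels used: {len(labeled)}, accuracy on pool: {accuracy:.2f}")
```

The model is retrained from scratch after every label, which keeps the sketch short but would not scale; the abstract's point about slow convergence on large datasets is exactly what a naive loop like this runs into.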

Type
Publication
In Neural Information Processing Systems (Demo)