Data Exploration using Example-based Methods

Matteo Lissandrini, Davide Mottin, Themis Palpanas, Yannis Velegrakis

Synthesis Lectures on Data Management

Example-based approaches and book outline

TL;DR

A lecture-style book on example-based approaches

  • For exploratory tasks using examples instead of queries to retrieve data
  • Connecting the different works in the area
  • Providing novel insights into the use of machine learning for user understanding
  • Highlighting visionary research directions in the area of example-based tasks and learning

Abstract

Data usually comes in a plethora of formats and dimensions, rendering the exploration and information extraction processes challenging. Thus, being able to perform exploratory analyses on the data, with the intent of getting an immediate glimpse of some of the data properties, is becoming crucial. Exploratory analyses should be simple enough to avoid complicated declarative languages (such as SQL) and mechanisms, and at the same time retain the flexibility and expressiveness of such languages. Recently, we have witnessed a rediscovery of the so-called example-based methods, in which the user, or the analyst, circumvents query languages by using examples as input. An example is a representative of the intended results, or in other words, an item from the result set. Example-based methods exploit inherent characteristics of the data to infer the results that the user has in mind but may not be able to (easily) express. They can be useful when a user is looking for information in an unfamiliar dataset, when the task is particularly challenging, such as finding duplicate items, or simply when the user is exploring the data. In this book, we present an excursus over the main methods for exploratory analysis, with a particular focus on example-based methods. We show that different data types require different techniques, and present algorithms that are specifically designed for relational, textual, and graph data. The book also presents the challenges and the new frontiers of machine learning in online settings, which have recently attracted the attention of the database community. The lecture concludes with a vision for further research and applications in this area.