The increasing interest in social networks, knowledge graphs, protein-interaction, and many other types of networks has raised the question how users can explore such large and complex graph structures easily. Current tools focus on graph management, graph mining, or graph visualization but lack user-driven methods for graph exploration.In many cases graph methods try to scale to the size and complexity of a real network. However, methods miss user requirements such as exploratory graph query processing, intuitive graph explanation, and interactivity in graph exploration. While there is consensus in database and data mining communities on the definition of data exploration practices for relational and semi-structured data, graph exploration practices are still indeterminate.
In this tutorial, we will discuss a set of techniques, which have been developed in the last few years for independent purposes, within a unified graph exploration taxonomy. The tutorial will provide a generalized definition of graph exploration in which the user interacts directly with the system either providing feedback or a partial query. We will discuss common, diverse, and missing properties of graph exploration techniques based on this definition, our taxonomy, and multiple applications for graph exploration. Concluding this discussion we will highlight interesting and relevant challenges for data scientists in graph exploration.
The continuously increasing interest in graphs and the growing amount of graph data available on the web require a careful design of data analysis techniques.However, from the user perspective most of the existing techniques appear as a black box that returns results without any explanation. For these reasons our community has resorted to data exploration techniques. In particular, while a huge effort has been devoted to text, relational, and semi-structured data, data exploration on graphs (graph exploration in short) is still in its infancy. Although many techniques for graphs have been studied in different domains, there is still lack of a unified graph exploration taxonomy. We abstracted user-driven graph exploration properties from techniques proposed in the literature and defined such a unified taxonomy. Our taxonomy consists of three strategies that form the backbone of our presentation along with relevant literature identified so far:
- Exploratory Graph Analysis entails the process of casting an incomplete or imperfect pattern query to let the system find the closest match. Such exploratory analysis may return a huge number of results, e.g., structures matching the pattern. Thus, the system is required to provide intelligent support. One such strategy is the well known query-by-example paradigm, in which the user provides the template for the tuples and let the system infer the others.
- Refinement of Graph Query Results is needed to deal with the overwhelming amount of results that is typical in subgraph processing. It includes approaches designed to present comprehensive result sets to the user or intermediate results that can be refined further. Instantiations of this kind are graph summaries, top-k methods, query reformulation, and skyline queries.
- Focused Graph Mining guides the users to a specific portion of the graph they are interested in. It requires the user to provide feedback in the process to restrict the computation to some portion of the graph. Ego-networks mining belongs to this strategy, since the user search is limited to a particular area of the graph and the algorithms focus on that specific area.
Outline and material
- Introduction and data exploration taxonomy
- User-driven Graph Exploration
- [First part] Exploratory Graph Analysis
- [Second part] Refinement of Graph Query Results
- [Third part] Focused Graph Mining
- Real-world use case and open challenges
Complete slides are available here.
Davide Mottin is a postdoctoral researcher at Hasso Plattner Institute. His research interests include graph mining, novel query paradigms, and interactive methods. He presented graph exploration methods in KDD 2015 and VLDB 2014 and is actively engaged in teaching database, big data analytics, and graph mining for Bachelor and Master courses. He received his PhD in 2015 from the University of Trento.
Anja Jentzsch is a PhD student in the Information Systems Group at Hasso Plattner Institute Potsdam. She is a Linked Data enthusiast, being involved in several Linked Data projects like Wikidata and DBpedia since 2007. She presented a tutorial in graph mining, web of data, and data integration in WWW conference. Currently, she is working on exploring frequent and common graph structures in heterogeneous graph databases.
Emmanuel Müller is professor and head of the Knowledge Discovery and Data Mining group at Hasso Plattner Institute. His research interests include graph mining, stream mining, clustering and outlier mining on graphs, streams, and traditional databases. He presented tutorials in database, data mining, and machine learning conferences such as SDM, ICDM, and ICML. He received his PhD in 2010 from RWTH Aachen University, had been independent group leader at Karlsruhe Institute of Technology (2010 - 2015) and postdoctoral fellow at University of Antwerp (2012 - 2015).