Network-based data mining techniques such as graph mining, (social) network analysis, link prediction and graph clustering form an important foundation for data science applications in computer science, computational social science, and the life sciences. They help to detect patterns in large data sets that capture dyadic relations between pairs of genes, species, humans, or documents, and they have improved our understanding of complex networks.
While the potential of analysing graph or network representations of relational data is undisputed, we increasingly have access to data on networks that contain more than just dyadic relations. Consider, e.g., data on user click streams in the Web, time-stamped social networks, gene regulatory pathways, or time-stamped financial transactions. These are examples of time-resolved or sequential data that tell us not only who is related to whom but also when and in which order relations occur. Recent works have shown that the timing and ordering of relations in such data can introduce higher-order, non-dyadic dependencies that are not captured by state-of-the-art graph representations. This oversimplification calls into question the validity of graph mining techniques for time series data and poses a threat to interdisciplinary applications of network analytics.
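To make the point about ordering concrete, here is a minimal sketch in plain python. It does not use any of the tutorial's packages, and the node names and paths are hypothetical. Two path data sets produce exactly the same weighted dyadic network, yet they differ in which two-step paths actually occur, so a graph built from edge counts alone cannot tell them apart.

```python
from collections import Counter

def edge_counts(paths):
    """Aggregate paths into a weighted first-order (dyadic) network."""
    counts = Counter()
    for path in paths:
        for u, v in zip(path, path[1:]):
            counts[(u, v)] += 1
    return counts

def two_step_counts(paths):
    """Count consecutive two-step sub-paths u -> v -> w."""
    counts = Counter()
    for path in paths:
        for u, v, w in zip(path, path[1:], path[2:]):
            counts[(u, v, w)] += 1
    return counts

# Hypothetical data: in both data sets every path passes through node 'c'.
paths_a = [("a", "c", "e"), ("b", "c", "d")]  # a always continues to e
paths_b = [("a", "c", "d"), ("b", "c", "e")]  # a always continues to d

print(edge_counts(paths_a) == edge_counts(paths_b))          # True
print(two_step_counts(paths_a) == two_step_counts(paths_b))  # False
```

A random walk on the shared dyadic network would go from a via c to d half of the time, a path that never occurs in `paths_a`.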
To address this challenge, researchers have developed advanced graph modelling and representation techniques based on higher- and variable-order Markov models, which enable us to model non-Markovian characteristics in time series data on networks. This tutorial introduces this exciting research field and gives an overview of cutting-edge higher-order data analytics techniques. Key takeaways for attendees will be (i) a solid understanding of higher-order network modelling and representation learning techniques, (ii) hands-on experience with state-of-the-art higher-order network analytics and visualisation packages, and (iii) a clear demonstration of the benefits of higher-order data analytics in real-world time series data on technical, social, and ecological systems.
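A Markov model of order k conditions each next step on the last k nodes of a path rather than only on the current node. The following plain-python sketch (not the pathpy API; function names and data are hypothetical) estimates maximum-likelihood transition probabilities for a given k and compares the log-likelihood of a first- and a second-order model on the same set of transitions.

```python
import math
from collections import Counter, defaultdict

def fit_markov(paths, k):
    """Maximum-likelihood transition counts of an order-k Markov model.

    Maps each history (tuple of the last k nodes) to a Counter of next nodes.
    """
    model = defaultdict(Counter)
    for path in paths:
        for i in range(k, len(path)):
            model[tuple(path[i - k:i])][path[i]] += 1
    return model

def log_likelihood(paths, model, k, start):
    """Natural-log likelihood of all transitions into positions >= start.

    Requires start >= k and assumes every history was seen during fitting,
    which holds when evaluating on the training paths themselves.
    """
    ll = 0.0
    for path in paths:
        for i in range(start, len(path)):
            counts = model[tuple(path[i - k:i])]
            ll += math.log(counts[path[i]] / sum(counts.values()))
    return ll

# Hypothetical paths: the step after 'c' depends on where the path came from.
paths = [("a", "c", "e")] * 10 + [("b", "c", "d")] * 10
m1, m2 = fit_markov(paths, 1), fit_markov(paths, 2)
ll1 = log_likelihood(paths, m1, k=1, start=2)  # first order: P(e | c) = 0.5
ll2 = log_likelihood(paths, m2, k=2, start=2)  # second order: P(e | a, c) = 1
print(round(ll1, 2), ll2)  # -13.86 0.0
```

Because a first-order model is a special case of a second-order one, the higher-order model never fits the same data worse; choosing an appropriate order therefore requires some form of model selection that accounts for the additional parameters.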
A detailed summary of the topics, literature, and tools covered in this hands-on tutorial can be found in the tutorial paper.
When and where
The tutorial will take place on Wednesday, August 22, 2018, from 08:30 to 17:10 in ICC Capital Suite Room 2+3+4 of ExCeL London, 1 Western Gateway, Royal Victoria Dock, London, E16 1FR.
Prerequisites
Participants should bring a laptop with a python 3.x environment (see the setup instructions). Some basic prior exposure to python is beneficial. In the first session of the tutorial, we will give a brief introduction to interactive data science with python, jupyter notebook, and VS Code.
Schedule
The tutorial consists of three separate blocks, in which we give an overview of three different software frameworks for higher-order network analysis.
Block I: Higher-Order Network Analytics with pathpy
Session: Introduction to Higher-Order Network Analytics
08:30 - 09:30
Tutor: Ingo Scholtes, Data Analytics Group, University of Zurich
Welcome Note and Tutorial Overview
Intro: Higher-Order Network Analytics for Time Series Data (30 min) | slides
- Causal paths in temporal network data
- Ordering matters in time series data
- Higher-order generative models for causal paths
- Representation learning in temporal network data
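In time-stamped network data, a path u → v → w only exists causally if the edge (v, w) occurs after (u, v) and within some maximum waiting time. The brute-force sketch below enumerates such two-step time-respecting paths; the function name, the parameter `delta`, and the edges are hypothetical, and this is an illustration of the concept rather than how pathpy extracts paths.

```python
from collections import defaultdict

def causal_two_paths(edges, delta):
    """Return all two-step time-respecting paths (u, v, w).

    edges: iterable of time-stamped edges (source, target, time).
    delta: maximum waiting time; u -> v -> w is causal only if (v, w)
           occurs strictly after (u, v) and at most delta time units later.
    """
    # Index outgoing time-stamped edges by their source node.
    out_edges = defaultdict(list)
    for u, v, t in edges:
        out_edges[u].append((v, t))
    paths = []
    for u, v, t1 in edges:
        for w, t2 in out_edges[v]:
            if 0 < t2 - t1 <= delta:
                paths.append((u, v, w))
    return paths

# Hypothetical time-stamped edges (source, target, time).
edges = [("a", "c", 1), ("c", "e", 2), ("b", "c", 5), ("c", "d", 6)]
print(causal_two_paths(edges, delta=1))  # [('a', 'c', 'e'), ('b', 'c', 'd')]
```

The time-aggregated network also contains the paths a → c → d and b → c → e, but with `delta=1` neither of them is time-respecting.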
Live Coding (30 min)
Unit | Topic | Tasks | Solution |
---|---|---|---|
1.1 | A Primer to Data Science with python, jupyter, git and VS Code (10 min) | .py | N/A |
1.2 | Analysis and Visualisation of Path Data in pathpy (20 min) | .py, .ipynb | .py .ipynb .html |
KDD Coffee break
09:30 - 10:00
Session: Multi-order Representation Learning
10:00 - 12:00
Tutor: Ingo Scholtes, Data Analytics Group, University of Zurich
Live Coding (120 min)
Unit | Topic | Tasks | Solution |
---|---|---|---|
1.3 | Fitting and Visualising Higher-order Network Models (20 min) | .py, .ipynb | .py .ipynb .html |
1.4 | Time-stamped Network Analysis in pathpy (20 min) | .py, .ipynb | .py .ipynb .html |
1.5 | Exploration: Higher-order Analysis of real-world data (20 min) | .py, .ipynb | N/A |
1.6 | Multi-order Representation Learning (20 min) | .py, .ipynb | .py .ipynb .html |
1.7 | Optimal Higher-order Analytics for Temporal Data (20 min) | .py, .ipynb | .py .ipynb .html |
1.8 | Exploration: Multi-order Analysis of Time-stamped Social Networks (20 min) | .py, .ipynb | N/A |
KDD Lunch break
12:00 - 13:30
Block II: Introduction to Higher-Order Graph Clustering with Infomap
Session: Introduction to Flow Compression
13:30 - 15:30
Tutor: Daniel Edler, Umeå University
Intro: Flow Compression with the Map Equation (45 min) | Slides
- Coding theory: The minimum description length principle
- Compression of modular network flows: the map equation
- Multilevel partitions
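The two-level map equation gives the average number of bits needed to describe each step of a random walk, given a partition of the nodes into modules: L(M) = q H(Q) + Σᵢ pᵢ H(Pᵢ), where q is the total rate at which the walk exits modules and pᵢ is the flow used by module i's codebook. The sketch below evaluates the expanded p log p form of this formula for an undirected, unweighted network, where a node's stationary visit rate is proportional to its degree. It is a plain-python illustration under simplifying assumptions (no teleportation, no link weights, no hierarchy), not Infomap's implementation, and the function names are hypothetical.

```python
import math
from collections import defaultdict

def plogp(p):
    """p * log2(p), using the convention 0 * log2(0) = 0."""
    return p * math.log2(p) if p > 0 else 0.0

def map_equation(edges, modules):
    """Two-level map equation codelength in bits per step.

    edges:   undirected, unweighted edges (u, v); flow is that of an
             unbiased random walk without teleportation (an assumption).
    modules: dict mapping each node to a module id.
    """
    two_m = 2 * len(edges)
    degree = defaultdict(int)
    exit_flow = defaultdict(float)  # q_i: flow leaving module i per step
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if modules[u] != modules[v]:
            # each direction of an inter-module link carries 1 / (2m) flow
            exit_flow[modules[u]] += 1 / two_m
            exit_flow[modules[v]] += 1 / two_m
    node_flow = {n: d / two_m for n, d in degree.items()}  # p_alpha ~ degree
    module_flow = defaultdict(float)  # sum of p_alpha within each module
    for n, p in node_flow.items():
        module_flow[modules[n]] += p
    q = sum(exit_flow.values())
    return (plogp(q)
            - 2 * sum(plogp(qi) for qi in exit_flow.values())
            - sum(plogp(p) for p in node_flow.values())
            + sum(plogp(exit_flow.get(m, 0.0) + pm)
                  for m, pm in module_flow.items()))

# Two triangles joined by a single link (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
one = map_equation(edges, {n: 0 for n in range(6)})
two = map_equation(edges, {n: 0 if n < 3 else 1 for n in range(6)})
print(round(one, 3), round(two, 3))  # 2.557 2.321
```

Splitting the two triangles into separate modules yields a shorter description of the walk than a single module does; searching for the partition with minimal codelength is what identifies the modules.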
Live Coding (75 min)
Unit | Topic | Notebook | Live Solution |
---|---|---|---|
2.1 | Introduction to Infomap | .py, .ipynb | .py |
2.2 | Explore flight path data with Infomap and interactive visualisations | .py, .ipynb | .py |
KDD Coffee break
15:30 - 16:00
Session: Higher-order Graph Clustering and Visualisation
16:00 - 17:00
Tutor: Daniel Edler, Umeå University
Intro: Higher-order flows (15 min) | Slides
- From pathways to networks with and without memory
- Sparse Markov model
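In a network with memory, each physical node is represented by several state nodes, here one per preceding node on a pathway, so that a random walk remembers where it came from. The plain-python sketch below builds such second-order state nodes (prev, curr) from hypothetical pathways. It then illustrates one simple way to obtain a sparser representation: lumping state nodes of the same physical node whose outgoing transition distributions are identical, which leaves the walk's transition probabilities unchanged. This lumping rule is an illustrative assumption, not necessarily the exact procedure Infomap uses for sparse networks.

```python
from collections import Counter, defaultdict
from fractions import Fraction

def memory_network(pathways):
    """Second-order state nodes: map (prev, curr) to a Counter of next nodes."""
    links = defaultdict(Counter)
    for path in pathways:
        for prev, curr, nxt in zip(path, path[1:], path[2:]):
            links[(prev, curr)][nxt] += 1
    return links

def lump_state_nodes(links):
    """Group state nodes of the same physical node that have identical
    normalised outgoing transition distributions."""
    groups = defaultdict(list)
    for (prev, curr), out in links.items():
        total = sum(out.values())
        # exact fractions avoid spurious differences from floating point
        signature = frozenset((nxt, Fraction(c, total)) for nxt, c in out.items())
        groups[(curr, signature)].append(prev)
    return {key: sorted(prevs) for key, prevs in groups.items()}

# Hypothetical pathways through physical node 'c'.
pathways = [("a", "c", "e"), ("b", "c", "d"), ("x", "c", "e"), ("a", "c", "e")]
links = memory_network(pathways)
lumped = lump_state_nodes(links)
print(len(links), "state nodes ->", len(lumped), "after lumping")  # 3 state nodes -> 2 after lumping
```

The state nodes (a, c) and (x, c) both always continue to e, so they can be merged, while (b, c) must stay separate to preserve the memory effect.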
Live Coding (60 min)
Unit | Topic | Notebook | Live Solution |
---|---|---|---|
2.3 | Introduction to sparse higher-order networks | .py, .ipynb | .py |
2.4 | Sparse networks for flight data | .py, .ipynb | .py |
Tutorial Closing
17:00 - 17:10
Block III: Variable-order Analytics with BuildHON/HONVis
Tutor: Nitesh Chawla, University of Notre Dame
Due to unforeseen circumstances, the tutor could not attend KDD’18. This block will thus be a one-hour virtual self-study session. Participants can find the tutorial material in the GitHub repository.
Intro: Representing variable orders in networks (30 min) | slides | website
- Introduction to higher-order networks
- Why variable orders?
- BuildHON algorithm in a nutshell
- Real-world applications
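The core idea behind variable-order representations is to add a longer history only where it actually changes the distribution of next steps. The sketch below illustrates this for a single candidate extension: it compares the next-node distribution given only the current node with the distribution given the previous and current node, using the Kullback-Leibler divergence, and keeps the longer history only if the divergence exceeds a threshold. The fixed `threshold` value and all function names are hypothetical simplifications; refer to the BuildHON material for the actual rule-extraction algorithm.

```python
import math
from collections import Counter, defaultdict

def next_step_distributions(paths, k):
    """Empirical next-node distributions P(next | last k nodes)."""
    counts = defaultdict(Counter)
    for path in paths:
        for i in range(k, len(path)):
            counts[tuple(path[i - k:i])][path[i]] += 1
    dists = {}
    for history, cnt in counts.items():
        total = sum(cnt.values())
        dists[history] = {node: c / total for node, c in cnt.items()}
    return dists

def kl_divergence(p, q):
    """KL divergence D(p || q) in bits; q must cover the support of p."""
    return sum(pi * math.log2(pi / q[x]) for x, pi in p.items())

def needs_second_order(paths, history, threshold=0.1):
    """Hypothetical check: does knowing the previous node change the
    next-step distribution enough to justify a second-order rule?"""
    first = next_step_distributions(paths, 1)[history[-1:]]
    second = next_step_distributions(paths, 2)[history]
    return kl_divergence(second, first) > threshold

# Hypothetical paths: the step after 'c' depends on the previous node,
# the step after 'x' does not.
paths = ([("a", "c", "e")] * 5 + [("b", "c", "d")] * 5
         + [("a", "x", "y")] * 5 + [("b", "x", "y")] * 5)
print(needs_second_order(paths, ("a", "c")), needs_second_order(paths, ("a", "x")))  # True False
```

Only node c receives a second-order rule here; node x remains first-order, which is what makes the resulting representation variable-order rather than uniformly second-order.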
Live Demo (30 min) | slides | video
- Synthesising trajectories with known variable orders of dependencies
- Use BuildHON+ (parameter-free) to extract variable orders of dependencies and build the higher-order network (HON)
- Use HONVis to visualise and interactively explore the higher-order network of NYC taxi data
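Synthetic trajectories with planted dependencies make it possible to check whether a method recovers the orders that were put in. A minimal generator could take memoryless random steps everywhere except at designated nodes, where the next step is fixed by the previous node. The transition tables, rules, and function name below are hypothetical, and this is only an illustration of the idea, not the generator used in the BuildHON+ demo.

```python
import random

# Hypothetical first-order transitions: node -> possible next nodes.
FIRST_ORDER = {"a": ["c"], "b": ["c"], "c": ["d", "e"], "d": ["a", "b"], "e": ["a", "b"]}
# Planted second-order rules: (previous, current) -> forced next node.
SECOND_ORDER = {("a", "c"): "d", ("b", "c"): "e"}

def synthesise(length, start="a", seed=None):
    """Generate one trajectory with the given number of nodes."""
    rng = random.Random(seed)
    path = [start]
    while len(path) < length:
        curr = path[-1]
        prev = path[-2] if len(path) > 1 else None
        if (prev, curr) in SECOND_ORDER:
            path.append(SECOND_ORDER[(prev, curr)])     # planted dependency
        else:
            path.append(rng.choice(FIRST_ORDER[curr]))  # memoryless step
    return path

trajectory = synthesise(20, seed=42)
print(trajectory)
```

In a first-order network aggregated from such trajectories, node c is linked to both d and e, whereas the planted rules make the step after c fully determined by the node visited before c.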
Data sets
A description of the data sets that will be provided to participants and analysed in the tutorial is available here.
Setting up the environment
Hands-on sessions will be completed in python. A detailed description of how to set up the environment can be found in the setup instructions.