Ingo Scholtes
Data Analytics Group
Department of Informatics (IfI)
University of Zurich

September 5, 2018

In this last, open-ended exploration, you get the chance to apply multi-order representations to the analysis of real data. In addition to the pathway data from session 1, we will consider data that we provide in the SQLite database temporal_networks.db. You can see which tables it contains by querying the metadata table:

import pathpy as pp
import sqlite3

con = sqlite3.connect('../data/temporal_networks.db')
con.row_factory = sqlite3.Row

for row in con.execute('SELECT * from metadata'):
    print('{0} \t\t {1}'.format(row['tag'], row['name']))

sociopatterns_hospital 		 Social contacts in a hospital
manufacturing_email 		 E-Mail exchanges in Polish manufacturing company
lotr 		 Character co-occurrences in The Lord of the Rings
haggle 		 Contacts between humans recorded by smart devices
sociopatterns_primaryschool 		 Primary School contact networks
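
Besides tag and name, the metadata table also holds a directed column, which the loading code further below reads. As a minimal sketch of how a single entry could be looked up with a bound parameter instead of string formatting, the following uses an in-memory stand-in database. The column names are taken from the queries in this notebook; the rows, the flag values and the helper is_directed are made up for illustration and are not part of temporal_networks.db or pathpy.

```python
import sqlite3

# Stand-in database: column names follow the queries in this notebook,
# the two rows and their directed flags are hypothetical.
con = sqlite3.connect(':memory:')
con.row_factory = sqlite3.Row
con.execute('CREATE TABLE metadata (tag TEXT, name TEXT, directed INTEGER)')
con.executemany(
    'INSERT INTO metadata VALUES (?, ?, ?)',
    [('toy_directed', 'Toy directed network', 1),
     ('toy_undirected', 'Toy undirected network', 0)])


def is_directed(con, tag):
    """Look up the directed flag of one data set via a bound parameter."""
    row = con.execute(
        'SELECT directed FROM metadata WHERE tag = ?', (tag,)).fetchone()
    if row is None:
        raise KeyError('unknown data set: {0}'.format(tag))
    return bool(row['directed'])


print(is_directed(con, 'toy_directed'))
print(is_directed(con, 'toy_undirected'))
```

Raising an error for an unknown tag makes a mistyped table name fail early, rather than surfacing later as an obscure error on a missing row.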

Details on the origin of these data can be found here. Below, we include boilerplate code to load these data sets into the TemporalNetwork class in pathpy:

table = 'manufacturing_email'

# Check whether the network is directed (the tag is passed as a bound parameter)
directed_network = bool(con.execute('SELECT directed FROM metadata WHERE tag = ?', (table,)).fetchone()['directed'])
# Table names cannot be bound as parameters, so the name is concatenated here
t = pp.TemporalNetwork.from_sqlite(con.execute('SELECT source, target, time FROM ' + table),
                                   directed=directed_network)
print(t)

2018-09-04 23:02:58 [Severity.INFO]	Retrieving directed time-stamped links ...
2018-09-04 23:02:59 [Severity.INFO]	Building index data structures ...
2018-09-04 23:02:59 [Severity.INFO]	Sorting time stamps ...
2018-09-04 23:02:59 [Severity.INFO]	finished.
Nodes:			167
Time-stamped links:	82927
Links/Nodes:		496.5688622754491
Observation period:	[1262450410, 1285877292]
Observation length:	 23426882 
Time stamps:		 57842 
Avg. inter-event dt:	 405.02207776490724
Min/Max inter-event dt:	 1/225913
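
As a quick sanity check on this summary, the derived quantities can be recomputed from the raw counts. The sketch below copies the numbers from the output above and assumes that the average inter-event time is the mean gap between consecutive distinct time stamps; that assumption is consistent with the printed value, but it is inferred from the numbers rather than taken from the pathpy source.

```python
# Values copied from the summary printed above.
nodes = 167
links = 82927
t_min, t_max = 1262450410, 1285877292
time_stamps = 57842

links_per_node = links / nodes
observation_length = t_max - t_min

# Assumption: the gaps between consecutive distinct time stamps sum to the
# observation length, and there is one gap fewer than there are time stamps.
avg_inter_event_dt = observation_length / (time_stamps - 1)

print(links_per_node)
print(observation_length)
print(avg_inter_event_dt)
```

The three printed values agree with Links/Nodes, Observation length and Avg. inter-event dt in the summary, up to floating-point rounding.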

Using these data and the methods introduced in our tutorial, we suggest studying the following problems (in ascending order of difficulty):

Generate higher-order visualisations of the US Flights and London Tube data and visually compare the graph layouts calculated for the first-order and optimal-order models.
Use the MultiOrderModel class to learn the optimal order of a temporal network. How does the detected optimal order change with the time scale $\delta$ that you use in the extraction of causal paths?
Use the MultiOrderModel class to learn the optimal order of the London Tube data set. How does the detected optimal order compare to the prediction performance studied in exploration 1.4?
Study the change in the algebraic connectivity between the second-order model and the second-order null model for (i) a temporal network data set and (ii) the US Flights data.
Perform a spectral clustering of a dynamic social network based on the Laplacian of higher-order networks at different orders. How does the clustering differ from a first-order clustering?
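
To build some intuition for the role of the time scale $\delta$ in the second problem, the following pure-Python sketch counts causal paths of length two in a toy temporal network. It assumes the usual convention that two time-stamped links (a, b, t1) and (b, c, t2) form a causal path if 0 < t2 - t1 <= delta. The function causal_two_paths and the toy links are hypothetical and only illustrate the idea; this is not pathpy's path extraction.

```python
from collections import defaultdict


def causal_two_paths(links, delta):
    """Return all (a, b, c, t1, t2) where link (a, b, t1) is followed by
    link (b, c, t2) with 0 < t2 - t1 <= delta."""
    # Index outgoing links by source node for quick lookup.
    out_links = defaultdict(list)
    for source, target, time in links:
        out_links[source].append((target, time))

    paths = []
    for a, b, t1 in links:
        for c, t2 in out_links[b]:
            if 0 < t2 - t1 <= delta:
                paths.append((a, b, c, t1, t2))
    return paths


# Hypothetical toy data: (source, target, time)
links = [('a', 'b', 1), ('b', 'c', 2), ('b', 'd', 5), ('c', 'd', 3)]

for delta in (1, 2, 4):
    print(delta, len(causal_two_paths(links, delta)))
```

With a small delta only links that follow each other closely in time are combined, while a larger delta admits more and longer-range paths. Since the set of extracted paths changes with delta, the optimal order detected from those paths can change as well.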

Again, these are only suggestions and you are welcome to use the time to study other data sets or questions that come to your mind. We'll be happy to help you with the analysis.

8 Exploration: Multi-order analysis of paths and time-stamped social networks
