Ingo Scholtes
Data Analytics Group
Department of Informatics (IfI)
University of Zurich
September 5 2018
In the last (open-ended) exploration, you get the chance to apply multi-order representations to the analysis of real data. In addition to the pathway data from session 1, we will consider the data provided in the SQLite database `temporal_networks.db`. You can check which data sets it contains by querying the `metadata` table:
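```python
import pathpy as pp
import sqlite3

# Connect to the tutorial database; rows can then be accessed by column name
con = sqlite3.connect('../data/temporal_networks.db')
con.row_factory = sqlite3.Row

# Print the tag (the table name) and a descriptive name of each data set
for row in con.execute('SELECT * FROM metadata'):
    print('{0} \t\t {1}'.format(row['tag'], row['name']))
```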
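The `metadata` table describes the data sets. If you also want to see every table that physically exists in the file, SQLite records this in its built-in `sqlite_master` table. The following minimal sketch queries it. The helper `list_tables` and the small in-memory demo database are hypothetical illustrations, not part of the tutorial data.

```python
import sqlite3


def list_tables(con):
    """Return the sorted names of all tables in an SQLite database."""
    rows = con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    # row[0] works for plain tuples as well as for sqlite3.Row objects
    return [row[0] for row in rows]


# Hypothetical minimal in-memory database (not the tutorial data)
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE metadata (tag TEXT, name TEXT, directed INTEGER)')
demo.execute('CREATE TABLE toy_edges (source TEXT, target TEXT, time INTEGER)')
print(list_tables(demo))  # ['metadata', 'toy_edges']
```

With the tutorial database, you would simply call `list_tables(con)` on the connection opened above.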
Details on the origin of these data can be found here. Below, we include boilerplate code that loads one of these data sets (selected via the variable `table`) into the `TemporalNetwork` class of `pathpy`:
```python
table = 'manufacturing_email'

# Check whether the network is directed (the tag is passed as a bound parameter)
directed_network = bool(con.execute(
    'SELECT directed FROM metadata WHERE tag=?', (table,)).fetchone()['directed'])

# Table names cannot be bound as parameters, so the name is concatenated here
t = pp.TemporalNetwork.from_sqlite(con.execute('SELECT source, target, time FROM ' + table),
                                   directed=directed_network)
print(t)
```
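The time scale δ that appears in the problems below limits how long a walker may wait between two consecutive interactions on a path. As a minimal sketch of this idea (not pathpy's own path extraction, which handles paths of arbitrary length), the hypothetical function `causal_two_paths` counts paths u -> v -> w built from two time-stamped edges (u, v, t1) and (v, w, t2). We assume numeric timestamps, treat every edge as directed, and count a pair as causal if 0 < t2 - t1 <= δ.

```python
from collections import Counter, defaultdict


def causal_two_paths(edges, delta):
    """Count causal paths u -> v -> w built from two time-stamped edges.

    edges: iterable of (source, target, time) with numeric times, all treated
    as directed. (u, v, t1) followed by (v, w, t2) counts if 0 < t2 - t1 <= delta.
    """
    edges = list(edges)
    out_edges = defaultdict(list)  # node -> [(target, time), ...]
    for u, v, t in edges:
        out_edges[u].append((v, t))
    counts = Counter()
    for u, v, t1 in edges:
        for w, t2 in out_edges[v]:
            if 0 < t2 - t1 <= delta:
                counts[(u, v, w)] += 1
    return counts


# Hypothetical toy sequence: a->b at time 1, then b->c at time 2 and b->d at time 5
toy = [('a', 'b', 1), ('b', 'c', 2), ('b', 'd', 5)]
print(causal_two_paths(toy, delta=1))  # Counter({('a', 'b', 'c'): 1})
print(causal_two_paths(toy, delta=4))
```

Rows from the query above could be passed as `[tuple(r) for r in con.execute('SELECT source, target, time FROM ' + table)]`, provided the `time` column is numeric. With a larger δ, the same interaction sequence yields more (and different) causal paths, which is why it is worth checking how the detected optimal order depends on δ.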
Using these data and the methods introduced in our tutorial, we suggest studying the following problems (in ascending order of difficulty):
1. Use the `MultiOrderModel` class to learn the optimal order of a temporal network. How does the detected optimal order change with the time scale δ that you use in the extraction of causal paths?
2. Use the `MultiOrderModel` class to learn the optimal order of the London Tube data set. How does the detected optimal order compare with the prediction performance studied in exploration 1.4?

Again, these are only suggestions, and you are welcome to use the time to study other data sets or questions that come to mind. We'll be happy to help you with the analysis.
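To make the notion of an "optimal order" more concrete, the following from-scratch sketch compares a first-order and a second-order Markov model on a set of paths via their maximum-likelihood log-likelihoods L1 and L2, and computes the likelihood ratio statistic 2(L2 - L1). This is not pathpy's `MultiOrderModel`: it ignores degrees of freedom and p-values, and the function names and toy paths are hypothetical. In the toy data, the node visited after `c` is fully determined by the node visited before `c`, so the second-order model fits much better.

```python
import math
from collections import Counter, defaultdict


def mle_probabilities(counts):
    """Maximum-likelihood transition probabilities from {(context, next): count}."""
    totals = defaultdict(int)
    for (context, _), c in counts.items():
        totals[context] += c
    return {key: c / totals[key[0]] for key, c in counts.items()}


def log_likelihoods(paths):
    """Return (L1, L2) for first- and second-order Markov models of the paths.

    Each path is a sequence of at least two nodes. The second-order model
    falls back to first-order probabilities for the first step of each path.
    """
    first, second = Counter(), Counter()
    for p in paths:
        for i in range(1, len(p)):
            first[(p[i - 1], p[i])] += 1
            if i >= 2:
                second[((p[i - 2], p[i - 1]), p[i])] += 1
    p1, p2 = mle_probabilities(first), mle_probabilities(second)
    l1 = l2 = 0.0
    for p in paths:
        l2 += math.log(p1[(p[0], p[1])])
        for i in range(1, len(p)):
            l1 += math.log(p1[(p[i - 1], p[i])])
            if i >= 2:
                l2 += math.log(p2[((p[i - 2], p[i - 1]), p[i])])
    return l1, l2


# Hypothetical toy paths: the successor of c depends on the predecessor of c
paths = [('a', 'c', 'd')] * 10 + [('b', 'c', 'e')] * 10
L1, L2 = log_likelihoods(paths)
print(round(2 * (L2 - L1), 3))  # 27.726
```

A statistic that is large relative to the additional degrees of freedom of the second-order model suggests that the higher order is warranted; a proper test would turn the statistic into a p-value, for example with a chi-square distribution.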