Using git to obtain tutorial material
While you can manually download all necessary
data files from our gitHub repository, we strongly recommend to clone this repository with
git to obtain a local, sychronised copy of all material. Assuming you have a working
git installation, you can do this by executing the following command in the terminal:
git clone --depth 1 https://github.com/IngoScholtes/kdd2018-tutorial
--depth 1 ensures that you only get the latest version, ignoring the history of the repository. If you don’t have
git installed already, here you can find information on how to set up git.
Prior to the start of the first tutorial session, we will add skeleton
python files to the
code directory of the repository. These files contain explanations, as well as empty jupyter cells that we will fill together throughout the live coding sessions. The tutors will regularly push the current solution to the repository. You can thus execute the terminal command
in the directory of your local copy to receive a sample solution that is growing as the hands-on tutorial moves forward. Just have a look in the directory
live_solutions. If you are using Visual Studio Code (see below) this is even easier: Just click the sync symbol in the status bar to update the current sample solution shown on the tutor’s screen! This will allow you to quickly correct any potential errors.
Installing python 3.X
To complete the hands-on exercises, you will need a working
python 3.x environment running on an operating system of your choice. For Windows, MacOS, and Linux users we recommend Anaconda 5.2 distribution, an OpenSource
python 3.6 distribution that comes pre-configured for data science and machine learning tasks.
The only additional package that you may need for this tutorial is the package markdown. We use it to produce nicely formatted output with the python skeleton files. You can just install it by typing:
pip install markdown
Installing Visual Studio Code
To complete the exercises, we recommend using the development environment Visual Studio Code, a platform-independent Open Source code editor available for Windows, MacOS, and Linux. Just download the installation file and run the setup. Once the installation has completed, run Visual Studio Code either by clicking the icon or by typing
code in the terminal.
To conveniently work with
jupyter notebooks in Visual Studio Code, we recommend two extensions, which you can install free of charge directly from Visual Studio Code’s integrated extension manager. We will need the official Python extension, which adds
python code editing, debugging, and linting functionality. We further need the Jupyter extension, which provides a convenient interface to the
jupyter notebook server automatically installed by
To install these two extensions, click the “module” icon in the bottom of the left menu bar or press
Ctrl+Shift+X. This will bring up the Extensions window. Type
python and click the top-most search result Python 2018.7.1. In the window on the right, click install. Repeat this procedure with the jupyter extension, i.e. search for
jupyter, click the top-most result Jupyter 1.1.4 by user Don Jayamanne and install the extension. A restart of Visual Studio Code completes the installation.
Once the installation is finished, open Visual Studio Code, click
File -> Open Folder and navigate to your local copy of the cloned github repository. In the Explorer panel (the files symbol in the left bar) you can then find the notebook files that you need to complete the tutorial.
Conveniently, Visual Studio Code comes with integrated support for
git. This means you can fetch the current, growing sample solution simply by navigating to the Source Control panel (the fork symbol in the left bar). In the … menu extension you just have to click Pull.
Installing higher-order data analytics packages
To apply higher-order data analytics to real data, we need to set up additional
Sessions 1 and 2 in block I will introduce
pathpy, an OpenSource
python package providing higher-order network analysis and visualisation tools for sequence, path, and temporal network data.
Sessions 3 and 4 in block II will introduce InfoMap, an OpenSource package that can be used to reveal overlapping modular patterns in higher-order network flows through complex systems.
The virtual session 5 in block III will introduce HON, an OpenSource
python package extracting variable orders of dependencies from sequential data, and building a higher-order network from the dependency rules, as well as HONVis, an interactive software package that facilitates the exploration of higher-order networks.
In the following, we explain how you can set up these packages:
Setting up pathpy
pathpy is pure python code and is available under an OpenSource license. It has no platform-specific dependencies and thus work on all platforms. It depends on
scipy which come preinstalled in the Anaconda 5.2 environment. Assuming that you have
python 3.x environment, the latest version of
pathpy can be installed via the python package index pypi. Just open a terminal window and run the command:
pip install pathpy2
pathpy has been name-squatted after we had released
pathpy to the
pypi test server. While we are working with the
pypi administrators to resolve this issue, we have to use
pathpy2 instead. So make sure that you install the
pathpy2 rather than the empty (spam) package
Setting up InfoMap
Infomap is written in C++ but includes a python interface. The easiest way to install it is via PyPI:
pip install infomap
If you are running on Windows 10, the simplest way to get the compilation tools needed to complete the installation is to install the Ubuntu bash. This will set up a Linux subsystem on your Windows machine. After you have installed the distribution, start the Ubuntu bash terminal and run the command above.
For more detailed information about the installation and usage of
InfoMap see www.mapequation.org/code.html.
Verifying your environment
Now that we have installed all necessary tools and packages, let us verify that our environment is set up properly.
For this, you can either create a new file in Visual Studio Code and copy the following code:
#%% import pathpy as pp paths = pp.Paths() paths.add_path('a,b,c') #%% print(paths)
Or you can open your local copy of the tutorial repository in Visual Studio Code as described above. In the
code directory, you will find the file
0_test_environment.py, which contains the lines above.
python extension of Visual Studio Code has been installed properly, you should see the
python code properly highlighted and colored. If the
jupyter extension has been set up properly, two code lenses
Run cell will appear above the
#%% tags. These tags mark the start of a cell in a
jupyter notebook that we can execute directly fron within Visual Studio Code.
Click the top-most
Run cell code lens. A menu will appear, asking you whether to start a new notebook or whether to select an existing
jupyter notebook server. Select
Start a new Notebook and wait for the status line
Python 3 Kernel (idle) to appear in Visual Studio Code’s status bar. Now click the second
Run cell code lens. A new window should pop up that shows the output of your code, in our example a list frequencies of paths of different lengths.
If you see this output, all is set up properly, and you are all set to complete the first two sessions of the hands-on tutorial.
Setting up BuildHON+ and HONVis
In order to complete the virtual tutorial session in block III, you will need to install the software
BuildHON+ is pure python code with no platform-specific dependencies. It works with python 3.X. To use it, change to the
code directory of the tutorial repository and run
in a terminal.
HONVis is built with C++ and we provide it in binary form. It is compiled separately for Windows, Mac and Linux. You can find it in the directory
For Windows, double click
HONVis_NYC_Taxi_Case_Jul_Aug.exe and follow the video tutorial available on www.higherordernetwork.com/visualization/ (5 mins).
For Mac or Linux, download the binary executable here link (right click and save target as file). Then put the binary under code/5_2_HONVis. Lastly, navigate to code/5_2_HONVis in Terminal (or your favorite shell) and run the following: