pathpy  1.0
pathpy is an open-source Python package for the modeling and analysis of pathways and temporal networks using higher-order and multi-order graphical models.
pathpy.Paths.Paths Class Reference

Public Member Functions

def __init__
 
def summary
 
def getSequence
 
def getUniquePaths
 
def __str__
 
def readEdges
 
def readFile
 
def ObservationCount
 
def expandSubPaths
 
def addPathTuple
 
def getContainedPaths
 
def filterPaths
 
def projectPaths
 
def addPath
 
def getSlowDownFactor
 
def getEntropyGrowthRateRatio
 
def BetweennessPreference
 
def getNodes
 
def getDistanceMatrix
 
def getShortestPaths
 
def BetweennessCentrality
 
def ClosenessCentrality
 
def VisitationProbabilities
 

Static Public Member Functions

def fromTemporalNetwork
 

Public Attributes

 paths
 A dictionary of paths, indexed first by path length and then by path tuple.
 
 separator
 The character used to separate nodes on paths.
 

Detailed Description

Instances of this class represent path statistics which can be analyzed using higher- and multi-order network
models. The origin of the path statistics can be (i) n-gram files which provide us with a list of paths 
in terms of n-grams of varying lengths, or (ii) a temporal network instance which provides us with a set of
time-respecting paths based on a given maximum time difference delta.

Constructor & Destructor Documentation

def pathpy.Paths.Paths.__init__ (   self)
Creates an empty Paths object

Member Function Documentation

def pathpy.Paths.Paths.__str__ (   self)
Returns the default string representation of 
this Paths instance
def pathpy.Paths.Paths.addPath (   self,
  ngram,
  separator = ',',
  expandSubPaths = True,
  pathFrequency = None 
)
Adds the path(s) of a single n-gram to the path statistics object.

@param ngram: An ngram representing a path between nodes, separated by the separator character, e.g. 
    the 4-gram a;b;c;d represents a path of length three (with separator ';')

@param separator: The character used as separator for the ngrams (',' by default)

@param expandSubPaths: by default all subpaths of the given ngram are generated, i.e. 
    for the trigram a;b;c a path a->b->c of length two will be generated 
    as well as two subpaths a->b and b->c of length one

@param pathFrequency: the weight (i.e. frequency) of the ngram
def pathpy.Paths.Paths.addPathTuple (   self,
  path,
  expandSubPaths = True,
  frequency = (0,1) 
)
Adds a tuple of elements as a path. If the elements are not strings, 
a conversion to strings will be made. This function can be used 
to set custom subpath statistics via the frequency tuple (see below).

@path: The path tuple to be added, e.g. ('a', 'b', 'c')
@expandSubPaths: Whether or not to calculate subpath statistics for this path
@frequency: A tuple (x,y) indicating the frequency of this path as subpath 
    (first component) and longest path (second component). Default is (0,1).
def pathpy.Paths.Paths.BetweennessCentrality (   self,
  normalized = False 
)
Calculates the betweenness centrality of nodes based on
observed shortest paths between all pairs of nodes
def pathpy.Paths.Paths.BetweennessPreference (   self,
  k = 1,
  normalized = False,
  method = 'MLE' 
)
Calculates the k-th order betweenness preferences of 
k-th order nodes based on the mutual information of path 
statistics of length k+1. The minimum order k for which 
betweenness preference can be computed is one, in which 
case for each first-order node v all paths s->v->d of length 
two will be considered for all nodes s and d. In the general case of 
order k, for a k-th order node v_1-...-v_{k} the statistics 
of all paths s-v_1-...-v_{k-1} -> v_1-...-v_{k} -> v_2-...-v_{k}-d
of length two in the k-th order network (i.e. length k+1 in the first-order
network) will be considered in the calculation.

@k: The order of nodes for which to calculate betweenness preference

@normalized: whether or not to normalize betweenness preference values

@method: which method to use for the entropy calculation. The default 'MLE' uses 
    the standard Maximum-Likelihood estimation of entropy. Setting method to 
    'Miller' additionally applies a Miller-correction. see e.g. 
    Liam Paninski: Estimation of Entropy and Mutual Information, Neural Computation 15, 2003 or 
    http://www.nowozin.net/sebastian/blog/estimating-discrete-entropy-part-2.html
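
For the first-order case (k = 1), the mutual information underlying betweenness preference can be illustrated with a standalone sketch. The function `betweenness_preference` below is hypothetical and not the pathpy implementation: it assumes a plug-in (maximum-likelihood) estimate with base-2 logarithms, takes a plain dictionary of observed paths s->v->d with their frequencies, and omits both normalization and the Miller correction.

```python
from math import log2

def betweenness_preference(two_paths, v):
    """Plug-in (MLE) mutual information between source s and destination d
    over all observed paths s->v->d through node v (assumption: log base 2)."""
    # Joint counts of (source, destination) pairs for paths through v.
    joint = {}
    for (s, mid, d), count in two_paths.items():
        if mid == v:
            joint[(s, d)] = joint.get((s, d), 0) + count
    total = sum(joint.values())
    if total == 0:
        return 0.0
    # Marginal probabilities of sources and destinations.
    p_s, p_d = {}, {}
    for (s, d), count in joint.items():
        p_s[s] = p_s.get(s, 0.0) + count / total
        p_d[d] = p_d.get(d, 0.0) + count / total
    # I(S;D) = sum_{s,d} p(s,d) * log2( p(s,d) / (p(s) * p(d)) )
    return sum(
        (count / total) * log2((count / total) / (p_s[s] * p_d[d]))
        for (s, d), count in joint.items()
    )

# Node v forwards a only to c and b only to d: the source determines the destination.
selective = betweenness_preference({("a", "v", "c"): 1, ("b", "v", "d"): 1}, "v")

# Node v forwards every source to every destination equally often.
uniform = betweenness_preference(
    {("a", "v", "c"): 1, ("a", "v", "d"): 1,
     ("b", "v", "c"): 1, ("b", "v", "d"): 1}, "v")
```

In this sketch the selective node yields one bit of mutual information, while the uniform node yields zero.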
def pathpy.Paths.Paths.ClosenessCentrality (   self,
  normalized = False 
)
Calculates the closeness centrality of nodes based on
observed shortest paths between all nodes 
def pathpy.Paths.Paths.expandSubPaths (   self)
This function implements the sub path expansion, i.e. 
for a four-gram a,b,c,d, the paths a->b, b->c, c->d of 
length one and the paths a->b->c and b->c->d of length 
two will be counted.
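
The expansion described above can be reproduced with a small standalone sketch. The helper `sub_paths` is hypothetical and not part of pathpy; it only enumerates the proper contiguous subpaths of length one or more, and does not model how the library stores the resulting counts.

```python
def sub_paths(path):
    """Return all proper contiguous subpaths of `path` with length >= 1.

    Length is counted in edges, so a tuple of n nodes has length n - 1.
    """
    n = len(path)
    result = []
    for k in range(1, n - 1):           # subpath lengths 1 .. n-2
        for start in range(n - k):      # slide a window of k+1 nodes
            result.append(path[start:start + k + 1])
    return result

four_gram = ("a", "b", "c", "d")
expanded = sub_paths(four_gram)
# expanded holds a->b, b->c, c->d (length one) and a->b->c, b->c->d (length two)
```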
def pathpy.Paths.Paths.filterPaths (   self,
  node_filter,
  minLength = 0,
  maxLength = sys.maxsize 
)
Returns a new paths object which contains only paths between nodes in a given 
filter set. For each of the paths in the current Paths object, the set of maximally 
contained subpaths between nodes in node_filter is extracted. This method is useful 
when studying (sub-)paths passing through a subset of nodes.

@param node_filter: the nodes for which paths will be extracted from the current
    set of paths
@param minLength: the minimum length of paths to extract (default 0)
@param maxLength: the maximum length of paths to extract (default sys.maxsize)
def pathpy.Paths.Paths.fromTemporalNetwork (   tempnet,
  delta = 1,
  maxLength = _sys.maxsize 
)
static
Calculates the frequency of all time-respecting paths up to a maximum length of maxLength, assuming 
a maximum temporal distance of delta between consecutive time-stamped links on a path. 
This (static) method returns an instance of the class Paths, which can subsequently be used to 
generate higher-order network representations based on the path statistics.

@param delta: Indicates the maximum temporal distance up to which time-stamped links will be 
considered to contribute to time-respecting paths. For (u,v;3) and (v,w;7) a time-respecting path (u,v)->(v,w) 
will be inferred for all delta >= 4, while no time-respecting path will be inferred for all 0 < delta < 4. 
If the max time diff is not set specifically, the default value of delta=1 will be used, meaning that a
time-respecting path u -> v -> w will only be inferred if there are *directly consecutive* time-stamped 
links (u,v;t) (v,w;t+1). Every time-stamped edge is further considered a path of length one, i.e. for maxLength=1 
this function will simply return the statistics of time-stamped edges.

@param maxLength: Indicates the maximum length up to which time-respecting paths should be calculated, 
     which can be limited due to computational efficiency. A value of k will generate all time-respecting paths 
     consisting of up to k time-stamped links. Note that generating a multi-order model with a maximum order of k 
     requires extracting time-respecting paths with *at least* length k. If a limitation of the maxLength is not 
     required for computational reasons, this parameter should not be set (as it will change the statistics of 
     paths)
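
The role of delta can be illustrated for paths of length two with a standalone sketch. The function `two_step_paths` is hypothetical and not part of pathpy. It assumes that time-stamped links are given as (source, target, time) tuples and that two links (u,v;t1) and (v,w;t2) form a time-respecting path when 0 < t2 - t1 <= delta, which is how the maximum temporal distance is read here.

```python
def two_step_paths(edges, delta=1):
    """Return all time-respecting paths (u, v, w) of length two.

    Assumption: (u,v;t1) and (v,w;t2) are linked if 0 < t2 - t1 <= delta.
    """
    found = []
    for (u, v, t1) in edges:
        for (x, w, t2) in edges:
            if x == v and 0 < t2 - t1 <= delta:
                found.append((u, v, w))
    return found

edges = [("u", "v", 3), ("v", "w", 7)]

with_delta_4 = two_step_paths(edges, delta=4)   # time difference 4 is within delta
with_delta_3 = two_step_paths(edges, delta=3)   # time difference 4 exceeds delta
with_default = two_step_paths(edges)            # delta=1: only directly consecutive links
```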
def pathpy.Paths.Paths.getContainedPaths (   p,
  node_filter 
)
Returns the set of maximum-length sub-paths of the path p, which
only contain nodes that appear in the node_filter. As an example, 
for the path (a,b,c,d,e,f,g) and a node_filter [a,b,d,f,g], the method 
will return [(a,b), (d,), (f,g)].

@param p: a path tuple to check for contained paths
@param node_filter: a set of nodes to which the contained paths should be limited
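
The example above can be reproduced with a standalone sketch. The function `contained_paths` is a hypothetical re-implementation for illustration, not the library code: it scans the path once and cuts it wherever a node falls outside the filter.

```python
def contained_paths(p, node_filter):
    """Return the maximal contiguous sub-paths of p whose nodes are all in node_filter."""
    allowed = set(node_filter)
    result, current = [], []
    for node in p:
        if node in allowed:
            current.append(node)            # extend the current sub-path
        elif current:
            result.append(tuple(current))   # a filtered-out node ends the sub-path
            current = []
    if current:
        result.append(tuple(current))       # flush the trailing sub-path
    return result

segments = contained_paths(("a", "b", "c", "d", "e", "f", "g"),
                           ["a", "b", "d", "f", "g"])
```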
def pathpy.Paths.Paths.getDistanceMatrix (   self)
Calculates shortest path distances between all pairs of 
nodes based on the observed shortest paths (and subpaths)
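
A standalone sketch of path-based distances follows. The function `observed_distances` is hypothetical and not the pathpy implementation. It assumes that the distance from u to w is the smallest number of steps between u and w on any single observed path, i.e. that different paths are never concatenated, and that unreachable pairs are simply left out of the result.

```python
def observed_distances(paths):
    """Map dist[u][w] to the fewest steps from u to w seen within one observed path."""
    inf = float("inf")
    dist = {}
    for path in paths:
        for i, u in enumerate(path):
            dist.setdefault(u, {})[u] = 0              # every node reaches itself
            for j in range(i + 1, len(path)):
                w = path[j]
                # j - i is the length of the subpath from position i to position j
                dist[u][w] = min(dist[u].get(w, inf), j - i)
    return dist

d = observed_distances([("a", "b", "c", "d"), ("a", "c")])
```

Here the direct observation a->c shortens the distance from a to c to one step, while a->d remains three steps because no shorter path to d was observed.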
def pathpy.Paths.Paths.getEntropyGrowthRateRatio (   self,
  method = 'MLE',
  k = 2,
  lanczosVecs = 15,
  maxiter = 1000 
)
Computes the ratio between the entropy growth rates of the k-th order and 
the first-order model of a temporal network t. Ratios smaller
than one indicate that the temporal network exhibits non-Markovian characteristics
def pathpy.Paths.Paths.getNodes (   self)
Returns the list of nodes for the underlying 
set of paths
def pathpy.Paths.Paths.getSequence (   self,
  stopchar = '|' 
)
Returns a single sequence in which all 
paths have been concatenated. Individual 
paths are separated by a stop character.

@stopchar: The character used to separate paths
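
As a rough illustration, the concatenation can be sketched as follows. The function `concatenate_paths` is hypothetical. It assumes that the result is a flat list of node labels and that every path, including the last one, is followed by the stop character; the exact output format of getSequence is not specified here.

```python
def concatenate_paths(paths, stopchar="|"):
    """Join paths into one flat sequence, appending stopchar after each path."""
    sequence = []
    for path in paths:
        sequence.extend(path)       # the nodes of the path, in order
        sequence.append(stopchar)   # marks the end of this path
    return sequence

sequence = concatenate_paths([("a", "b"), ("c", "d", "e")])
```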
def pathpy.Paths.Paths.getShortestPaths (   self)
Calculates all observed shortest paths (and subpaths) between 
all pairs of nodes
def pathpy.Paths.Paths.getSlowDownFactor (   self,
  k = 2,
  lanczosVecs = 15,
  maxiter = 1000 
)
Returns a factor S that indicates how much slower (S>1) or faster (S<1)
a diffusion process evolves in a k-order model of the path statistics
compared to what is expected based on a first-order model. This value captures 
the effect of order correlations of length k on a diffusion process which evolves 
based on the observed paths.
def pathpy.Paths.Paths.getUniquePaths (   self,
  l = -1 
)
Returns the number of unique paths up to a given length l. For the default 
value of l=-1 paths of any length will be counted. 

@param l: the (inclusive) maximum length up to which paths shall be counted. 
def pathpy.Paths.Paths.ObservationCount (   self)
Returns the total number of observed pathways of any length 
(includes multiple observations for paths with a frequency weight)
def pathpy.Paths.Paths.projectPaths (   self,
  mapping 
)
Returns a new path object in which nodes have been mapped to different labels
given by an arbitrary mapping function. For instance, for the mapping 
{'a': 'x', 'b': 'x', 'c': 'y', 'd': 'y'} the path (a,b,c,d) is mapped to 
(x,x,y,y). This is useful, e.g., to map page click streams to topic 
click streams, using a mapping from pages to topics.

@param mapping: a dictionary that maps nodes to the new labels
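
The projection of a single path can be sketched in a few lines. The helper `project_path` is hypothetical and only shows the relabeling step; it does not reproduce how projectPaths aggregates the frequencies of paths that become identical after the mapping.

```python
def project_path(path, mapping):
    """Relabel every node on the path using the mapping dictionary."""
    return tuple(mapping[node] for node in path)

mapping = {'a': 'x', 'b': 'x', 'c': 'y', 'd': 'y'}
projected = project_path(('a', 'b', 'c', 'd'), mapping)
```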
def pathpy.Paths.Paths.readEdges (   filename = None,
  separator = ',',
  weight = False,
  undirected = False,
  maxlines = _sys.maxsize,
  expandSubPaths = True 
)
Reads data from a file containing multiple lines of *edges* of the
form "v,w,frequency,X" (where frequency is optional and X are arbitrary additional columns). The default separating 
character ',' can be changed. In order to calculate the statistics of paths of any length, 
by default all subpaths of length 0 (i.e. single nodes) contained in an edge will be considered.
def pathpy.Paths.Paths.readFile (   filename = None,
  separator = ',',
  pathFrequency = False,
  maxlines = _sys.maxsize,
  maxN = _sys.maxsize,
  expandSubPaths = True 
)
Reads path data from a file containing multiple lines of n-grams of the 
form "a,b,c,d,frequency" (where frequency is optional). The default separating 
character ',' can be changed. Each n-gram will be interpreted as a path of length n-1, 
i.e. bigrams a,b are considered as path of length one, trigrams a,b,c as path of length two, etc.
In order to calculate the statistics of paths of any length, by default all subpaths of 
length k < n-1 contained in an n-gram will be considered. I.e. for n=4 the four-gram a,b,c,d 
will be considered as a single (longest) path of length n-1 = 3 and three subpaths 
a->b, b->c, c->d of length k=1 and two subpaths a->b->c and b->c->d of length k=2 will be 
additionally counted.

@param filename: name of the n-gram file to read data from

@param separator: the character used to separate nodes on the path, i.e. using a 
    separator character of ';' n-grams are represented as a;b;c;...

@param pathFrequency: if set to true, the last entry in each n-gram will be interpreted as 
    weight (i.e. frequency of the path), e.g. a,b,c,d,4 means that four-gram a,b,c,d has weight four.
    False by default, which means each path occurrence is assigned a default weight of one (adding weights 
    of multiple occurrences).

@param maxlines: The maximum number of lines (i.e. ngrams) to read from the input file

@param maxN: The maximum n for the n-grams to read, i.e. setting maxN to 15 will ignore all n-grams of length 
    16 and longer, which means that only paths up to length maxN-1 = 14 are considered.

@param expandSubPaths: by default all subpaths of the given ngrams are generated, i.e. 
    for an input file with a single trigram a;b;c a path a->b->c of length two will be generated
    as well as two subpaths a->b and b->c of length one
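
How a single line of an n-gram file maps to a path and a weight can be sketched as follows. The function `parse_ngram_line` is hypothetical and not the parser used by readFile. It assumes that the frequency, when present, is an integer in the last field.

```python
def parse_ngram_line(line, separator=",", path_frequency=False):
    """Split one n-gram line into a (path tuple, weight) pair."""
    fields = line.strip().split(separator)
    if path_frequency:
        weight = int(fields[-1])    # assumption: the last entry is an integer frequency
        fields = fields[:-1]
    else:
        weight = 1                  # each occurrence counts once
    return tuple(fields), weight

path, weight = parse_ngram_line("a,b,c,d,4", path_frequency=True)
path_length = len(path) - 1         # a four-gram is a path of length three

trigram, default_weight = parse_ngram_line("a;b;c", separator=";")
```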
def pathpy.Paths.Paths.summary (   self)
Returns a string containing basic summary info of this Paths instance
def pathpy.Paths.Paths.VisitationProbabilities (   self)
Calculates the probabilities that randomly chosen paths
pass through nodes. If 5 out of 100 paths (of any length) contain 
node v, it will be assigned a value of 0.05. This measure can be 
interpreted as path-based ground truth for the notion of importance 
captured by PageRank applied to a graphical abstraction of the paths.
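
The numerical example above can be reproduced with a standalone sketch. The function `visitation_probabilities` is hypothetical. It assumes a plain dictionary that maps each observed path to its frequency and counts a path once per node even if the node occurs on it several times; whether the library also counts subpaths is not modeled.

```python
def visitation_probabilities(path_counts):
    """Fraction of observed paths (weighted by frequency) that contain each node."""
    total = sum(path_counts.values())
    visits = {}
    for path, count in path_counts.items():
        for node in set(path):      # count each node at most once per path
            visits[node] = visits.get(node, 0) + count
    return {node: c / total for node, c in visits.items()}

# 5 of the 100 observed paths pass through node v.
counts = {("a", "v", "b"): 5, ("a", "b"): 95}
probs = visitation_probabilities(counts)
```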

Member Data Documentation

pathpy.Paths.Paths.paths

A dictionary of paths that has the following structure:

  • paths[k] is a dictionary containing all paths of length k, indexed by a path tuple p = (u,v,w,...)
  • for each tuple p of length k, paths[k][p] contains a tuple (i,j) where i refers to the number of times p occurs as a subpath of a longer path, and j refers to the number of times p occurs as a real or longest path (i.e. not being a subpath of a longer path)
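
The nested structure can be mimicked with standard-library containers. This is a sketch under assumptions, not the actual attribute definition: the helper `add_counts` and the use of `defaultdict` are illustrative choices, and the counts are stored as plain tuples.

```python
from collections import defaultdict

# paths[k][p] -> (count as subpath, count as longest path), both starting at zero
paths = defaultdict(lambda: defaultdict(lambda: (0, 0)))

def add_counts(path, as_subpath=0, as_longest=0):
    """Increase the two counters of `path`, stored under its length k = len(path) - 1."""
    k = len(path) - 1
    i, j = paths[k][path]
    paths[k][path] = (i + as_subpath, j + as_longest)

# One observation of the longest path a->b->c, together with its two subpaths ...
add_counts(("a", "b", "c"), as_longest=1)
add_counts(("a", "b"), as_subpath=1)
add_counts(("b", "c"), as_subpath=1)
# ... and one separate observation of a->b as a path in its own right.
add_counts(("a", "b"), as_longest=1)
```

After these calls, paths[1][("a", "b")] is (1, 1): the path occurred once as a subpath and once as a longest path.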
