pathpy
1.0
pathpy is an open-source Python package for the modeling and analysis of pathways and temporal networks using higher-order and multi-order graphical models
Public Member Functions | |
def | __init__ |
def | summary |
def | getSequence |
def | getUniquePaths |
def | __str__ |
def | readEdges |
def | readFile |
def | ObservationCount |
def | expandSubPaths |
def | addPathTuple |
def | getContainedPaths |
def | filterPaths |
def | projectPaths |
def | addPath |
def | getSlowDownFactor |
def | getEntropyGrowthRateRatio |
def | BetweennessPreference |
def | getNodes |
def | getDistanceMatrix |
def | getShortestPaths |
def | BetweennessCentrality |
def | ClosenessCentrality |
def | VisitationProbabilities |
Static Public Member Functions | |
def | fromTemporalNetwork |
Public Attributes | |
paths | |
A dictionary of paths that has the following structure: | |
separator | |
The character used to separate nodes on paths. | |
Instances of this class represent path statistics which can be analyzed using higher- and multi-order network models. The origin of the path statistics can be (i) n-gram files which provide us with a list of paths in terms of n-grams of varying lengths, or (ii) a temporal network instance which provides us with a set of time-respecting paths based on a given maximum time difference delta.
def pathpy.Paths.Paths.__init__ | ( | self | ) |
Creates an empty Paths object
def pathpy.Paths.Paths.__str__ | ( | self | ) |
Returns the default string representation of this Paths instance
def pathpy.Paths.Paths.addPath | ( | self, | |
ngram, | |||
separator = ' , |
|||
expandSubPaths = True , |
|||
pathFrequency = None |
|||
) |
Adds the path(s) of a single n-gram to the path statistics object.
@param ngram: An n-gram representing a path between nodes, with nodes separated by the separator character, e.g. the 4-gram a;b;c;d represents a path of length three (with separator ';')
@param separator: The character used as separator for the n-grams (';' by default)
@param expandSubPaths: by default all subpaths of the given n-gram are generated, i.e. for the trigram a;b;c a path a->b->c of length two will be generated as well as two subpaths a->b and b->c of length one
@param pathFrequency: the weight (i.e. frequency) of the n-gram
def pathpy.Paths.Paths.addPathTuple | ( | self, | |
path, | |||
expandSubPaths = True , |
|||
frequency = (0,1) |
|||
) |
Adds a tuple of elements as a path. If the elements are not strings, they will be converted to strings. This function can be used to set custom subpath statistics via the frequency tuple (see below).
@path: The path tuple to be added, e.g. ('a', 'b', 'c')
@expandSubPaths: Whether or not to calculate subpath statistics for this path
@frequency: A tuple (x,y) indicating the frequency of this path as subpath (first component) and as longest path (second component). Default is (0,1).
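The role of the frequency tuple can be illustrated with a minimal standalone sketch. It is not pathpy's implementation: the helper add_path_tuple and the storage layout (a dictionary mapping each path tuple to a two-element list) are assumptions made for illustration only.

```python
from collections import defaultdict


def add_path_tuple(stats, path, frequency=(0, 1)):
    """Hypothetical helper: record a path with a (subpath, longest path) count."""
    # Elements that are not strings are converted to strings.
    path = tuple(str(node) for node in path)
    entry = stats[path]
    entry[0] += frequency[0]  # observations as a subpath of a longer path
    entry[1] += frequency[1]  # observations as a longest path


# Assumed layout: path tuple -> [subpath count, longest-path count]
stats = defaultdict(lambda: [0, 0])
add_path_tuple(stats, (1, 2, 3))                    # counted once as longest path
add_path_tuple(stats, ('1', '2'), frequency=(1, 0))  # counted once as subpath only
```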
def pathpy.Paths.Paths.BetweennessCentrality | ( | self, | |
normalized = False |
|||
) |
Calculates the betweenness centrality of nodes based on observed shortest paths between all pairs of nodes
def pathpy.Paths.Paths.BetweennessPreference | ( | self, | |
k = 1 , |
|||
normalized = False , |
|||
method = 'MLE' |
|||
) |
Calculates the k-th order betweenness preferences of k-th order nodes based on the mutual information of path statistics of length k+1. The minimum order k for which betweenness preference can be computed is one, in which case for each first-order node v all paths s->v->d of length two will be considered for all nodes s and d. In the general case of order k, for a k-th order node v_1-...-v_{k} the statistics of all paths s-v_1-...-v_{k-1} -> v_1-...-v_{k} -> v_2-...-v_{k}-d of length two in the k-th order network (i.e. of length k+1 in the first-order network) will be considered in the calculation.
@k: The order of nodes for which to calculate betweenness preference
@normalized: whether or not to normalize betweenness preference values
@method: which method to use for the entropy calculation. The default 'MLE' uses the standard Maximum-Likelihood estimation of entropy. Setting method to 'Miller' additionally applies a Miller correction. See e.g. Liam Paninski: Estimation of Entropy and Mutual Information, Neural Computation 15, 2003 or http://www.nowozin.net/sebastian/blog/estimating-discrete-entropy-part-2.html
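For the first-order case (k = 1), the calculation described above can be sketched as the mutual information between the source s and the destination d of all observed paths s->v->d. This is a standalone illustration under stated assumptions, not pathpy's code: the function names and the input format (a dictionary mapping (s, v, d) tuples to frequencies) are hypothetical, entropies are measured in nats, and the 'Miller' branch uses the Miller-Madow form (K-1)/(2N), where K is the number of non-empty bins and N the number of observations.

```python
import math
from collections import Counter


def entropy(counts, method='MLE'):
    """Plug-in (maximum likelihood) entropy in nats, optionally Miller-corrected."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    h = -sum((c / n) * math.log(c / n) for c in counts)
    if method == 'Miller':
        # Miller-Madow bias correction (assumed form): (K - 1) / (2 N)
        h += (len(counts) - 1) / (2 * n)
    return h


def betweenness_preference(two_paths, v, method='MLE'):
    """Mutual information I(S;D) = H(S) + H(D) - H(S,D) over paths s->v->d."""
    joint = Counter()
    for (s, mid, d), freq in two_paths.items():
        if mid == v:
            joint[(s, d)] += freq
    if not joint:
        return 0.0
    sources, targets = Counter(), Counter()
    for (s, d), freq in joint.items():
        sources[s] += freq
        targets[d] += freq
    return (entropy(sources.values(), method)
            + entropy(targets.values(), method)
            - entropy(joint.values(), method))


# The source fully determines the destination: I(S;D) = ln 2
selective = {('a', 'v', 'c'): 1, ('b', 'v', 'd'): 1}
# Source and destination are independent: I(S;D) = 0
mixing = {('a', 'v', 'c'): 1, ('a', 'v', 'd'): 1,
          ('b', 'v', 'c'): 1, ('b', 'v', 'd'): 1}
```

A node through which paths flow selectively (each source always continues to the same destination) gets a high value, while a node that mixes sources and destinations independently gets a value of zero.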
def pathpy.Paths.Paths.ClosenessCentrality | ( | self, | |
normalized = False |
|||
) |
Calculates the closeness centrality of nodes based on observed shortest paths between all nodes
def pathpy.Paths.Paths.expandSubPaths | ( | self | ) |
This function implements the sub path expansion, i.e. for a four-gram a,b,c,d, the paths a->b, b->c, c->d of length one and the paths a->b->c and b->c->d of length two will be counted.
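The expansion described above can be sketched in a few lines of standalone Python. The helper name expand_sub_paths is hypothetical and this is not pathpy's implementation; it only enumerates the subpaths of length one and longer that are mentioned in the description.

```python
def expand_sub_paths(path):
    """Return the proper contiguous subpaths of `path`, grouped by length in edges."""
    n = len(path)
    result = {}
    # k is the subpath length in edges; the full path has length n - 1.
    for k in range(1, n - 1):
        result[k] = [tuple(path[i:i + k + 1]) for i in range(n - k)]
    return result


subs = expand_sub_paths(('a', 'b', 'c', 'd'))
for length, found in subs.items():
    print(length, found)
```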
def pathpy.Paths.Paths.filterPaths | ( | self, | |
node_filter, | |||
minLength = 0 , |
|||
maxLength = sys.maxsize |
|||
) |
Returns a new Paths object which contains only paths between nodes in a given filter set. For each of the paths in the current Paths object, the set of maximally contained subpaths between nodes in node_filter is extracted. This method is useful when studying (sub-)paths passing through a subset of nodes.
@param node_filter: the nodes for which paths will be extracted from the current set of paths
@param minLength: the minimum length of paths to extract (default 0)
@param maxLength: the maximum length of paths to extract (default sys.maxsize)
def pathpy.Paths.Paths.fromTemporalNetwork | static |
Calculates the frequency of all time-respecting paths up to a maximum length of k, assuming a maximum temporal distance of delta between consecutive time-stamped links on a path. This (static) method returns an instance of the class Paths, which can subsequently be used to generate higher-order network representations based on the path statistics.
@param delta: Indicates the maximum temporal distance up to which time-stamped links will be considered to contribute to time-respecting paths. For (u,v;3) and (v,w;7) a time-respecting path (u,v)->(v,w) will be inferred for all delta >= 4, while no time-respecting path will be inferred for all 0 < delta < 4. If the maximum time difference is not set specifically, the default value of delta=1 will be used, meaning that a time-respecting path u -> v -> w will only be inferred if there are *directly consecutive* time-stamped links (u,v;t) (v,w;t+1). Every time-stamped edge is further considered a path of length one, i.e. for maxLength=1 this function will simply return the statistics of time-stamped edges.
@param maxLength: Indicates the maximum length up to which time-respecting paths should be calculated, which can be limited for reasons of computational efficiency. A value of k will generate all time-respecting paths consisting of up to k time-stamped links. Note that generating a multi-order model with a maximum order of k requires extracting time-respecting paths of *at least* length k. If a limitation of maxLength is not required for computational reasons, this parameter should not be set (as it will change the statistics of paths).
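The effect of delta can be checked with a small standalone sketch that infers only time-respecting paths of length two. It is an illustration, not pathpy's extraction algorithm: the function name and the representation of time-stamped links as (source, target, time) tuples are assumptions.

```python
def time_respecting_two_paths(edges, delta=1):
    """Return paths (u, v, w) for link pairs (u,v;t1), (v,w;t2) with 0 < t2 - t1 <= delta.

    `edges` is a list of (source, target, time) tuples.
    """
    paths = []
    for (u, v, t1) in edges:
        for (x, w, t2) in edges:
            # The second link must start where the first one ends and
            # follow it within at most delta time units.
            if x == v and 0 < t2 - t1 <= delta:
                paths.append((u, v, w))
    return paths


links = [('u', 'v', 3), ('v', 'w', 7)]
print(time_respecting_two_paths(links, delta=4))  # the two links are 4 time units apart
print(time_respecting_two_paths(links, delta=3))
```

With the two links from the description, a path u -> v -> w is found once delta reaches the time difference of 4, and not for smaller values.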
def pathpy.Paths.Paths.getContainedPaths | ( | p, | |
node_filter | |||
) |
Returns the set of maximum-length sub-paths of the path p, which only contain nodes that appear in the node_filter. As an example, for the path (a,b,c,d,e,f,g) and a node_filter [a,b,d,f,g], the method will return [(a,b), (d,), (f,g)]. @param p: a path tuple to check for contained paths @param node_filter: a set of nodes to which the contained paths should be limited
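A standalone sketch of this extraction, reproducing the example above. The function name contained_paths is hypothetical and the code is an illustration rather than pathpy's implementation.

```python
def contained_paths(p, node_filter):
    """Split path p into maximal runs of consecutive nodes that are in node_filter."""
    node_filter = set(node_filter)
    result, current = [], []
    for node in p:
        if node in node_filter:
            current.append(node)
        elif current:
            # A node outside the filter ends the current run.
            result.append(tuple(current))
            current = []
    if current:
        result.append(tuple(current))
    return result


found = contained_paths(('a', 'b', 'c', 'd', 'e', 'f', 'g'), ['a', 'b', 'd', 'f', 'g'])
print(found)
```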
def pathpy.Paths.Paths.getDistanceMatrix | ( | self | ) |
Calculates shortest path distances between all pairs of nodes based on the observed shortest paths (and subpaths)
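One way to read this description: whenever node d appears j - i steps after node s on some observed path, the distance from s to d is at most j - i. A minimal standalone sketch under that assumption follows. The function name and the dictionary-based result are hypothetical and do not reflect pathpy's actual return type.

```python
def observed_distances(paths):
    """Map (source, target) to the smallest number of steps seen on any observed path."""
    dist = {}
    for path in paths:
        for i, s in enumerate(path):
            for j in range(i, len(path)):
                key = (s, path[j])
                # Keep the shortest observed (sub)path between the two nodes.
                if j - i < dist.get(key, float('inf')):
                    dist[key] = j - i
    return dist


dist = observed_distances([('a', 'b', 'c'), ('a', 'c')])
print(dist[('a', 'c')])
```

Pairs that never appear in this order on any path, such as (c, a) here, have no entry, which corresponds to an infinite distance.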
def pathpy.Paths.Paths.getEntropyGrowthRateRatio | ( | self, | |
method = 'MLE' , |
|||
k = 2 , |
|||
lanczosVecs = 15 , |
|||
maxiter = 1000 |
|||
) |
Computes the ratio between the entropy growth rate of the k-th order model and that of the first-order model of a temporal network t. Ratios smaller than one indicate that the temporal network exhibits non-Markovian characteristics.
def pathpy.Paths.Paths.getNodes | ( | self | ) |
Returns the list of nodes for the underlying set of paths
def pathpy.Paths.Paths.getSequence | ( | self, | |
stopchar = '|' |
|||
) |
Returns a single sequence in which all paths have been concatenated. Individual paths are separated by a stop character. @stopchar: The character used to separate paths
def pathpy.Paths.Paths.getShortestPaths | ( | self | ) |
Calculates all observed shortest paths (and subpaths) between all pairs of nodes
def pathpy.Paths.Paths.getSlowDownFactor | ( | self, | |
k = 2 , |
|||
lanczosVecs = 15 , |
|||
maxiter = 1000 |
|||
) |
Returns a factor S that indicates how much slower (S>1) or faster (S<1) a diffusion process evolves in a k-order model of the path statistics compared to what is expected based on a first-order model. This value captures the effect of order correlations of length k on a diffusion process which evolves based on the observed paths.
def pathpy.Paths.Paths.getUniquePaths | ( | self, | |
l = -1 |
|||
) |
Returns the number of unique paths up to a given length l. For the default value of l=-1, paths of any length will be counted.
@param l: the (inclusive) maximum length up to which paths shall be counted.
def pathpy.Paths.Paths.ObservationCount | ( | self | ) |
Returns the total number of observed pathways of any length (includes multiple observations for paths with a frequency weight)
def pathpy.Paths.Paths.projectPaths | ( | self, | |
mapping | |||
) |
Returns a new Paths object in which nodes have been mapped to different labels given by an arbitrary mapping. For instance, for the mapping {'a': 'x', 'b': 'x', 'c': 'y', 'd': 'y'} the path (a,b,c,d) is mapped to (x,x,y,y). This is useful, e.g., to map page click streams to topic click streams, using a mapping from pages to topics.
@param mapping: a dictionary that maps nodes to the new labels
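The projection can be sketched on a plain dictionary of path frequencies. This is a standalone illustration: the function name and the input format (path tuple mapped to frequency) are assumptions, not pathpy's internal representation.

```python
from collections import Counter


def project_paths(path_counts, mapping):
    """Relabel every node via `mapping` and merge paths that become identical."""
    projected = Counter()
    for path, freq in path_counts.items():
        projected[tuple(mapping[node] for node in path)] += freq
    return projected


mapping = {'a': 'x', 'b': 'x', 'c': 'y', 'd': 'y'}
counts = {('a', 'b', 'c', 'd'): 2, ('b', 'a', 'd', 'c'): 1}
projected = project_paths(counts, mapping)
print(dict(projected))
```

Because several nodes can share a label, distinct paths may collapse onto the same projected path, in which case their frequencies add up.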
def pathpy.Paths.Paths.readEdges | ( | filename = None , |
|
separator = ',' ,
|||
weight = False , |
|||
undirected = False , |
|||
maxlines = _sys.maxsize , |
|||
expandSubPaths = True |
|||
) |
Reads data from a file containing multiple lines of *edges* of the form "v,w,frequency,X" (where frequency is optional and X are arbitrary additional columns). The default separating character ',' can be changed. In order to calculate the statistics of paths of any length, by default all subpaths of length 0 (i.e. single nodes) contained in an edge will be considered.
def pathpy.Paths.Paths.readFile | ( | filename = None , |
|
separator = ',' ,
|||
pathFrequency = False , |
|||
maxlines = _sys.maxsize , |
|||
maxN = _sys.maxsize , |
|||
expandSubPaths = True |
|||
) |
Reads path data from a file containing multiple lines of n-grams of the form "a,b,c,d,frequency" (where frequency is optional). The default separating character ',' can be changed. Each n-gram will be interpreted as a path of length n-1, i.e. bigrams a,b are considered as paths of length one, trigrams a,b,c as paths of length two, etc. In order to calculate the statistics of paths of any length, by default all subpaths of length k < n-1 contained in an n-gram will be considered. I.e. for n=4 the four-gram a,b,c,d will be considered as a single (longest) path of length n-1 = 3, and three subpaths a->b, b->c, c->d of length k=1 and two subpaths a->b->c and b->c->d of length k=2 will be additionally counted.
@param filename: name of the n-gram file to read data from
@param separator: the character used to separate nodes on the path, i.e. using a separator character of ';' n-grams are represented as a;b;c;...
@param pathFrequency: if set to true, the last entry in each n-gram will be interpreted as weight (i.e. frequency of the path), e.g. a,b,c,d,4 means that four-gram a,b,c,d has weight four. False by default, which means each path occurrence is assigned a default weight of one (adding weights of multiple occurrences).
@param maxlines: The maximum number of lines (i.e. n-grams) to read from the input file
@param maxN: The maximum n for the n-grams to read, i.e. setting maxN to 15 will ignore all n-grams of length 16 and longer, which means that only paths up to length maxN-1 = 14 are considered.
@param expandSubPaths: by default all subpaths of the given n-grams are generated, i.e. for an input file with a single trigram a;b;c a path a->b->c of length two will be generated as well as two subpaths a->b and b->c of length one
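How a single line of such a file maps to a path and a weight can be sketched as follows. The helper parse_ngram is hypothetical and standalone; it is not pathpy's parser, and reading the weight as an integer is an assumption.

```python
def parse_ngram(line, separator=',', path_frequency=False):
    """Split one n-gram line into a tuple of nodes and a weight."""
    fields = line.strip().split(separator)
    weight = 1  # default weight of a single path occurrence
    if path_frequency:
        # The last entry is the frequency of the path, not a node.
        weight = int(fields.pop())
    return tuple(fields), weight


path, weight = parse_ngram('a,b,c,d,4\n', path_frequency=True)
print(path, weight, len(path) - 1)  # nodes, weight, path length n-1
```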
def pathpy.Paths.Paths.summary | ( | self | ) |
Returns a string containing basic summary info of this Paths instance
def pathpy.Paths.Paths.VisitationProbabilities | ( | self | ) |
Calculates the probabilities that randomly chosen paths pass through nodes. If 5 out of 100 paths (of any length) contain node v, it will be assigned a value of 0.05. This measure can be interpreted as path-based ground truth for the notion of importance captured by PageRank applied to a graphical abstraction of the paths.
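The definition can be reproduced with a short standalone sketch. The function name and the input format (path tuple mapped to frequency) are assumptions, and which paths pathpy actually includes in the count is not specified here.

```python
from collections import Counter


def visitation_probabilities(path_counts):
    """Fraction of observed paths (weighted by frequency) that contain each node."""
    total = sum(path_counts.values())
    visits = Counter()
    for path, freq in path_counts.items():
        # set(path) ensures a node is counted once per path, even if revisited.
        for node in set(path):
            visits[node] += freq
    return {node: count / total for node, count in visits.items()}


# 5 of 100 observed paths contain node v
probs = visitation_probabilities({('a', 'v'): 5, ('a', 'b'): 95})
print(probs['v'])
```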
pathpy.Paths.Paths.paths |
A dictionary of paths that has the following structure: