Models of Network FormationEdit
This section will discuss Models of Network Formation also referred to as “Generative Network Models.” These models offer an explanation to the question of why a network should have a certain set of predetermined parameters (e.g. power-law degree distributions). That is, they model the mechanisms by which networks are created and explore the resultant network structures of certain hypothesized generative mechanisms. “If the structures are similar to those of networks we observe in the real world, it suggests-though does not prove-that similar generative mechanisms may be at work in real networks.” (Newman, 2010)
Many networks are observed to have degree distributions that approximately follow power laws, at least in the tail of the distribution. In the 1970s, Derek Price first proposed a network formation model that explained this observation which was inspired by the work of economist Herbert Simon and statistician Udny Yule (Yule Process). Price expanded Simon’s mathematical explanation of how the “rich-get-richer” (also a power-law distribution) to the network context, calling it “cumulative advantage” (more well known as preferential attachment coined by Barabasi and Albert in 1999).
Price specifically applied this process to a model of a citation network. In this model, papers are published continually (added one by one), yet not necessarily at a constant rate and newer papers cite older papers. The edges in this network are directional and are created but never destroyed. The central assumption of Price’s model is that a newly appearing paper cites previous ones chosen at random with probability proportional to the number of citations those previous papers already have plus a constant a, where a>0. Consider a price model network where denotes the in-degree of vertex i. Now suppose a new vertex is added to the network. Then the probability of the citation made to a particular vertex, say i is proportional to . Normalizing the above term, we get the probability of citing a vertex, say i by a newly introduced vertex is given by,
If average number of citations for a given paper is c, with certain variance according to the properties of the network then as as the network size becomes very large, the fraction of vertices in the network that have in-degree q becomes: (Newman, 2010 pp- 490-494)
c=average out-degree of network (or average bibliography size)
where beta function is defined as
and gamma function as
Furthermore, the Beta function falls off as a power law for large values of x, with exponent y. Applying this, we find that for large values of q (q>>a) the degree distribution goes to:
Thus Price’s model for citation network gives rise to a degree distribution with a power-law tail. More generally speaking, each paper could be considered a node and each citation a link. Here, links are added to existing nodes at a probability proportional to the number of links already reaching that node.
Model of Barabasi and AlbertEdit
Developed independently in the 1990s, the Barabasi-Albert model is the best known generative network model in use today. Similar to Price’s model, vertices are added one by one to a growing network and connected to existing vertices. However, these connections are undirected and the number of connections made by each vertex is exactly c (unlike Price’s model where c is allowed to vary from step to step). Connections are made in proportion to the undirected degree, k. By treating the Barabasi and Albert model as a special case of Price’s model, Newman, 2010 shows that:
for k ≥ c
c=number of connections made by each vertex
k=degree of vertex
is the proportion of vertices with k degrees.
In this case, when k becomes very large the degree distribution becomes:
Extensions and Further Properties of Preferential Attachment ModelsEdit
|Model Property/Extension||Intuitive Rationale||Resultant Distribution||Contributor|
|Incorporating Time of Creation||By preferential attachment, older vertices will have more time to acquire links.||Contains leading algebraic factor, does not follow power-law distribution||(Dorogovsev, Mendes, & Samukhim, 2000)|
|Sizes of In-Components||Distribution of the set of vertices from which vertex i can be reached by following a directed path||For component sizes << network size, follows power-law distribution||(Newman, 2010)|
|Addition of Extra Edges||Connections added between two existing vertices||Power-law distribution||(Krapivsky, Rodgers, & Redner, 2001)|
|Removal of Edges||Considered reverse preferential attachment, higher degree are more likely to lose an edge||Power-law distribution to a point, then stretched exponential||(Moore, Ghoshal, & Newman, 2006)|
|Non-Linear Preferential Attachment||Considers a non-linear attachment process rather than a linear process in the degree of the vertex||Stretched exponential||(Krapivsky, Redner, & Leyvraz, Connectivity of Growing Rrandom Networks, 2000)|
|Vertices of Varying Quality or Attractiveness||Quality and attractiveness are incorporated into attachment process||Depends on distribution of “fitness” values||(Bianconi & Barabasi, 2001)|
What other networks’ formation could be accurately described using preferential attachment (considering either Price’s model or Barabasi-Albert)?
Vertex Copying ModelsEdit
Though preferential treatment offers a plausible explanation for power-law degree distributions, it is not the only means by which networks can grow (Newman, 2010). Suppose newly added vertices copied a fraction of the connections of an existing vertex and the remainders are filled out by other existing vertices (e.g. a new paper copied a fraction the bibliography of an existing paper). As analyzed by Newman, 2010, the degree distribution of a large network under this model will asymptotically follow a power-law distribution (Newman, 2010).
Network Optimization ModelsEdit
Previous network structures are based, for the most part, on successive random processes, which are blind to the large-scale structures that are being created. A network growth mechanism that may be more suited to transportation networks is structural optimization. The optimization in this case involves a trade-off between travel time and cost. One of the simplest forms of this type of mechanism was developed by Ferrer i Cancho and Sole in 2003. This model seeks to minimize the following quality function (Ferrer i Cancho & Sole, 2003):
e=number of edges in a network
l=mean geodesic distance between vertex pairs (dissatisfaction measure)
λ=a parameter in the range 0≤λ≤1
In this model, the cost of maintaining the network is represented by the variable m. This is the same as saying the cost of operating an airline is proportional to the number of routes it operates, for example. Following the airline example, the variable l would be the average number of legs required to journey from one point to another. This is obviously a simplification of a complex network. The parameter λ provides a balance between a network with the minimum number of edges possible and a fully connected network. From this model it can be seen that by placing a moderate weight on λ, l is minimized and the optimal result is a star graph. This offers a simple explanation of why the hub-and-spoke system is so efficient (Newman, 2010). As it turns out, a non-star graph solution only appears when: .
The degree distributions of this model, as determined by Ferrer i Cancho and Sole, ranged from an exponential distribution to a power-law distribution as the value of λ changes. This was proposed only for local minima.
Gastner and Newman, 2006 generalizes the model presented by Ferrer i Cancho and Sole by considering not only the number of legs in a journey but also the geographic distance traveled (Newman, 2010). The term l is replaced by t, where t=u+vr (Newman, 2010). In this expression u and v are constants associated with a fixed time and pace (1/speed) on a link and r is the distance traveled on the link. For example, u could be considered the check in, waiting, taxiing, etc. time at an airport and pace would be the reciprocal of the average velocity traveled during a flight. This introduces a spatial aspect into the model where before only topological considerations were accounted for. However, this model also produces only numerical results.
By changing the values of u and v in Gastner and Newman’s model, what assumptions could be drawn about the geometry of the resultant network?
A Survey of Models of Network Formation: Stability and Efficiency by Matthew O. Jackson
A General Theory of Bibliometric and Other Cumulative Advantage Processes by Derek de S. Price
The Emergence of Hierarchy in Transportation Networks by Bhanu M. Yerra and David M. Levinson
Barabasi, A.-L., & Albert, R. (1999). Emergence of Scaling in Random Networks. Science , 286, 509-512.
Bianconi, G., & Barabasi, A.-L. (2001). Bose-Einstein Condensation in Complex Networks. Physics Review Letters , 86, 5632-5635.
Dorogovsev, S. N., Mendes, J. F., & Samukhim, A. N. (2000). Structure of Growing Networks with Preferential Linking. Physics Review Letters , 85, 4633-4636.
Ferrer i Cancho, R., & Sole, R. V. (2003). Statistical Mechanics of Complex Networks (Vol. 625). (R. Pastor-Satorras, M. Rubi, & A. Diaz-Guilera, Eds.) Berlin: Springer.
Gastner, M. T., & Newman, M. E. (2006). Optimal Design of Spatial Distribution Networks. Physics Review E , 74, 016117.
Jackson, M. O. (2005). A Survey of Models of Network Formation: Stability and Efficiency. In G. Demange, & M. Wooders (Eds.), Group Formation in Economics: Networks, Clubs, and Coalitions (p. Chapter 1). Cambridge: Cambridge University Press.
Krapivsky, P. L., Redner, S., & Leyvraz, F. (2000). Connectivity of Growing Rrandom Networks. Physics Review Letters , 85, 4629-4632.
Krapivsky, P. L., Rodgers, G. J., & Redner, S. (2001). Degree Distributions of Growing Networks. Physics Review Letters , 86, 5401-5404.
Moore, C., Ghoshal, G., & Newman, M. E. (2006). Exact Solutions of Models of Evolving Networks with Addition and Deletion of Nodes. Physics Review E , 74 (3), 036121.
Newman, M. E. (2010). Networks: An Introduction. New York, NY: Oxford University Press Inc.
Price, D. d. (1976). A General Theory of Bibliometric and Other Cumulative Advantage Processes. Journal of the American Society for Information Science , 27, 292-306.
Rodrigue, J.-P., Comtois, C., & Slack, B. (2009). The Geography of Transport Systems (2nd Edition ed.). New York, NY: Routledge.
Simon, H. A. (1955). On a Class of Skew Distribution Functions. Biometrika , 42 (3/4), 425-440.
Wilensky, U. (2005). NetLogo Preferential Attachment model. http://ccl.northwestern.edu/netlogo/models/PreferentialAttachment. Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL
Wilensky, U. (1999). NetLogo. http://ccl.northwestern.edu/netlogo/. Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL.
Yerra, B. M., & Levinson, D. M. (2005). The Emergence of Hierarchy in Transportation Networks. Annals of Regional Science , 39, 541-553.