opening it up with Common Lisp
Book review: Darwinia
Summer reading: Spin
the Omnivoire's Delimma
the Golem's Eye
The Structure and Function of Complex Networks
Anyone interested in the science of networks should run, not walk, to their keyboard and touch type, not hunt and peck, for this paper right away. Mark Newman is not only an excellent physicist and mathematician, he also writes well (he also appears to be both young and handsome -- some people have all the luck). This review explains why networks are important, the techniques we have for classifying them, and the models we have to explain their structure, their growth and the processes that happen on them. It accomplishes all this in 74-beautifully written pages and includes an excellent bibliography (429-references!).
Mark Newman's review consists of 9-sections of varying length and difficulty. Some I found very easy to understand; others required more math than I can currently marshall. Most of the math can be skipped, however, without degrading your understand of the basic ideas of networks. I'll cover each section briefly in turn.
The introduction consists of a brief summary of the kinds of networks (directed, undirected and so forth) and explains how terminology differs between mathematicians, computer scientists, physicists and sociologists -- all of whom have brought much to the study of networks in the real world.
Networks in the Real World
Like the air we breath, networks have always been around us even though we've only started to pay attention to them quite recently. Networks -- collections of things connected to one another -- show up in our social relations (friendships and sexual contacts), our modern life (the power grid and the physical internet come to mind), nature (prey/predator webs and protein interaction networks), and information (citation graphs and that little thing called the world wide web). Surprisingly all of these very different networks, even though they have been formed by very different processes (many of which we do not even understand), are quite similar at heart.
Properties of Networks
As he says in his conclusion, "we have as yet no theoretical framework to tell us if we are even looking in the right place." We're like the drunk looking for his keys under the lamp post because that's where the light is. On the positive side, however, networks have many statistical properties that we can measure: average vertex degree, vertex degree distribution, number and size of connected components, number of triangles, squares or other cycles, number of cliches, clustering coefficients and so forth. These tell us much about a network and how it will behave under certain kinds of transformations. What they don't tell us is why they are the way they are! For that, we need models of how networks form and change.
The simplest and oldest model of networks is due to the independent work of Rapoport and Erdos & Renyi. They wondered what a graph would look like if it had n vertexes and each pair was connected with probability p; they also looked at graphs with n vertexes and m edges with the edges randomly connecting the vertexes. These two ensembles of graphs are known as G(n, p) and G(n, m). They share many of the same properties and n grows large. Two of the most interesting properties are that the degree distribution of the vertexes (i.e., the histogram that counts the number of vertexes with no edges, one edge, two edges and so forth) has the Poisson distribution and that the graphs undergo a phase transition between a graph having many, many separate components and a graph having (essentially) one single giant component.
The great thing about random models like this is that they are relatively easy to analysis. There isn't any of that nasty structure and interdependence to get in the way! The bad thing is that real networks just don't look like random graphs.
Other Models for Graphs
Several extensions to the basic random graph model have been made. The goal is to find more realistic models without sacrificing analytic results. For example, there are models that do provide a power law degree distribution and Exponential Random graphs and Markov graphs even manage to model transitivity correctly (i.e., that the likelihood of an edge between A and C depends on the existence of edges between A and B and between B and C). Unfortunately, the math necessary to understand these models remains unknown.
A less sophisticated but easier to analysis model is the small-world model of Watts and Strogatz. Here we build the network on a framework lattice and then rewire the network (or add more edges) to produce a small world. In spite of, or because of, its simplicity, this is a much studied and much extended model.
One thing all of the models mentioned so far lack is that they are essentially static -- edges may change a bit but the network does not grow. We will remedy this directly.
Models of Network Growth
In 1965 Derek de Solla Price developed a model for citation networks that fell victim to the lack of computational power -- he could analysis his model but was unable to say much about it. Thirty four years later, Barabasi and Albert independently arrived at much the same model. Now however, they could simulate it and that made all the difference.
Both models rely on an observation of Herbet Simon which is known as the Matthew effect in sociology: "Every one that has shall be given more...". Or, more colloquially, "The rich get richer". If you grow a network by added vertexes one at a time and adding a new link from each vertex to some existing vertex where the odds of being linked to are proportional to the current number of links you have, then you get a small world network with power law degree distribution and a reasonable clustering coefficient. This model seems like a reasonable approximation of citation networks but doesn't make much sense for others like the world wide web.
To remedy this, many other models have been proposed and examined with greater or lesser degrees of success. Some of the most interesting ones are highly dynamic with both edges and vertexes being added and removed. Of course, all this examination of network structure is being done for a purpose: to understand how it effects what happens on the network.
Processes Taking Place on the Network
In general, this has been a much harder problem than the examination of structural issues -- perhaps because it's hard to examine the processes without first understanding the structures. The major advances have been in understanding network failures, disease transmission, searching and constraint satisfaction problems.
The percolation model was one of the earliest to receive attention. In this model, vertexes or edges are randomly selected to have (or not have) some property and then the property is transmitted through the network in some fashion. Examples include studying how failures can cascade and how diseases or ideas can spread.
Another popular network process is that of search. This involves both the crawling of a network (either to cache information for later use or to search in real-time) and understanding how to rank results returned by a search. Google's Page Rank is perhaps the best known recursive definition out there but a more sophisticated Hubs and Authorities definition due to Kleinberg is also receiving lots of interest.
Newman also touches on several other interesting processes including guided search, general navigation (what kinds of networks let you use only local information to acheive globally good results); phase transitions, random walks and discrete dynamical processes. Of particular note is the idea that the transition between P and NP may be the result of a phase transition in a network -- since we can reduce NP problems to graph coloring, this may actually lead somewhere.
Summary and Directions for Future Research
Even my short summary of Newman's review shows what a fertile field networks have become. We have many statistical measures, many models of graph formation and many models of how graph structures effect graph processes. This isn't enough yet. We need more measures, more models of formation and processes and more understanding between the measures, the models and the processes. One of the most exciting aspects of all the work is that, contrary to Hardy, this is pure math and remarkably applied.
Copyright -- Gary Warren King, 2004 - 2006