opening it up with Common Lisp
Book review: Darwinia
Summer reading: Spin
the Omnivoire's Delimma
the Golem's Eye
This is a beautiful little paper — one of those you wish you'd written because its such an obvious idea (at least in hind sight). Liben-Nowell and Kleinberg use a wide variety of topological measures to try and predict the links you'll see in the future based on the current state of a graph. For example, if I know all of the authors that collaborated on a paper over the last three years, how good a job can I do predicting which new collaborations I expect to see this year.
The two allow only the use of structural graph information — neither vertex nor edge attributes need apply. This is stringent but makes for interesting work. How much does the topology allow you to predict what to expect? They use measures based on path distance, neighbor vertexes, random walks, PageRank and similarity ranks. They compare these to a random predictor and use the two simplest measures (path length and common neighbors) as benchmarks. For data, they use the physics pre-print archives.
Not surprisingly, it is easy to do better than the random predictor. More surprisingly is that it is hard to do all that much better than common neighbors. Even when a more complex measure does score higher, it is often only on some of the data sets!
The paper points in a number of fascinating directions — many of which I didn't even know existed: Latent Semantic Analysis, unseen bigram estimation and the estimation of distribution support are only three of these.
Future work includes making some of these more complex measures faster and, I would think, seeing how much better you can do if you take attribute information into account.
Well written, pithy, great links! All in all, I'd highly recommend this paper.
Copyright -- Gary Warren King, 2004 - 2006