opening it up with Common Lisp

Favorite weblogs

Lisp Related

Bill Clementson

Finding Lisp

Lemonodor

Lispmeister.com

Planet Lisp

Politics

Orcinus

Talking Points Memo

This Modern World

Working for Change

Other home

Polliblog

Recent Readings

Book review: Darwinia
Reviewed: Friday, August 11, 2006

Summer reading: Spin
Reviewed: Saturday, August 5, 2006

Runner
Reviewed: Tuesday, July 18, 2006

the Omnivoire's Delimma
Reviewed: Wednesday, July 12, 2006

the Golem's Eye
Reviewed: Wednesday, May 31, 2006





tinderbox

The link prediction problem for social networks
David Liben-Nowell and Jon Kleinberg, 2004 , (Paper URL)
Monday, August 16, 2004

This is a beautiful little paper — one of those you wish you'd written because its such an obvious idea (at least in hind sight). Liben-Nowell and Kleinberg use a wide variety of topological measures to try and predict the links you'll see in the future based on the current state of a graph. For example, if I know all of the authors that collaborated on a paper over the last three years, how good a job can I do predicting which new collaborations I expect to see this year.

The two allow only the use of structural graph information — neither vertex nor edge attributes need apply. This is stringent but makes for interesting work. How much does the topology allow you to predict what to expect? They use measures based on path distance, neighbor vertexes, random walks, PageRank and similarity ranks. They compare these to a random predictor and use the two simplest measures (path length and common neighbors) as benchmarks. For data, they use the physics pre-print archives.

Not surprisingly, it is easy to do better than the random predictor. More surprisingly is that it is hard to do all that much better than common neighbors. Even when a more complex measure does score higher, it is often only on some of the data sets!

The paper points in a number of fascinating directions — many of which I didn't even know existed: Latent Semantic Analysis, unseen bigram estimation and the estimation of distribution support are only three of these.

Future work includes making some of these more complex measures faster and, I would think, seeing how much better you can do if you take attribute information into account.

Well written, pithy, great links! All in all, I'd highly recommend this paper.


Home | About | Quotes | Recent | Archives

Copyright -- Gary Warren King, 2004 - 2006