opening it up with Common Lisp

Suspicion scoring based on guilt-by-association, collective inference and focused data access
Sofus A. Macskassy and Foster Provost, 2005
Monday, May 23, 2005

This is a short paper with a long title! Traditional machine-learning classification works with instances -- think of rows in a spreadsheet. The goal is to take training instances and produce a rule or set of rules that will probably classify future instances correctly. This paper, however, is not traditional. It is one of a recent (i.e., within the last 5 to 10 years) crop of papers that recognize that instances are related to one another. Guilt-by-association is not an instance-based classifier; it is a relational one. My guilt depends on the guilt of the people I know, and their guilt depends on mine. Relational classifiers are like Google's PageRank or Kleinberg's Hubs and Authorities: beautifully recursive.

Collective inference -- the simultaneous analysis of multiple related instances -- is another new term (popularized by David Jensen, among others). Perhaps surprisingly, collective inference can often do better than a more traditional, one-step-at-a-time approach.
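
To make the recursion concrete, here is a minimal sketch of guilt-by-association scored collectively. It is not the system from the paper: the function name `collective_scores`, the toy graph, the 0.5 starting prior and the fixed iteration count are all my own assumptions for illustration. Each unlabeled node repeatedly takes the weighted average of its neighbors' current scores, while nodes with known labels stay fixed.

```python
def collective_scores(graph, known, prior=0.5, iterations=50):
    """Score every node by repeatedly averaging its neighbors' scores.

    graph: dict mapping node -> dict of {neighbor: edge weight}
           (hypothetical toy format, assumed symmetric)
    known: dict mapping node -> fixed score in [0, 1]
    """
    # Unknown nodes start at an uninformative prior (an assumption).
    scores = {node: known.get(node, prior) for node in graph}
    for _ in range(iterations):
        updated = {}
        for node, neighbors in graph.items():
            if node in known:
                # Known labels are never overwritten.
                updated[node] = known[node]
                continue
            total = sum(neighbors.values())
            if total == 0:
                # Isolated node: nothing to learn from, keep its score.
                updated[node] = scores[node]
                continue
            # Weighted vote of the neighbors' current scores.
            updated[node] = sum(
                weight * scores[other] for other, weight in neighbors.items()
            ) / total
        # All nodes are updated together from the previous round's scores.
        scores = updated
    return scores


# A four-person chain: a - b - c - d, where a is known guilty, d known innocent.
graph = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
known = {"a": 1.0, "d": 0.0}
scores = collective_scores(graph, known)
```

In this toy chain, b's score depends on c's and c's depends on b's, so neither can be settled in one pass; iterating lets the two estimates settle together, with b ending up nearer to the guilty end and c nearer to the innocent end.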

Macskassy and Provost's paper combines relational classification, collective inference, and the focused, dynamic acquisition of new data. They present their system and the tools behind it, show it working on multiple simulator-generated data sets, and analyze the results. The most interesting bit is one of the things that doesn't happen: they found that adding profiling data (i.e., more data about how suspicious instances are) did not, in general, help the classification algorithm. Instead, the known labels essentially washed the extra data away. If a result like this can be understood, we'll have a better sense of when -- and why -- profiling does -- and doesn't -- work. That would be something very worth having!

Copyright -- Gary Warren King, 2004 - 2006