Suspicion scoring based on guilt-by-association, collective inference and focused data access
This is a short paper with a long title! Traditional machine-learning classification works with instances -- think of rows in a spreadsheet. The goal is to take training instances and produce a rule, or set of rules, that will probably classify future instances correctly. This paper, however, is not traditional. It is one of a recent (i.e., within the last 5 to 10 years) crop of papers that recognize that instances are related to one another. Guilt-by-association is not an instance-based classifier; it is a relational one. My guilt depends on the guilt of the people I know, and their guilt depends on mine. Relational classifiers are like Google's PageRank or Kleinberg's Hubs and Authorities: beautifully recursive.
Collective inference -- the simultaneous analysis of multiple related instances -- is another new term (popularized by David Jensen, among others). Perhaps surprisingly, collective inference can often do better than a more traditional, one-instance-at-a-time approach.
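To make the recursion concrete, here is a minimal sketch of guilt-by-association with collective inference, done as iterative label propagation: each unknown node's suspicion score is repeatedly re-estimated from its neighbors' current scores until the whole graph settles. This is in the spirit of a relational-neighbor classifier, but the function, the toy graph, and all names are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch: guilt-by-association via iterative (collective) inference.
# Known labels stay clamped; unknown nodes take the average of their
# neighbors' current scores on every pass.

def propagate_suspicion(graph, known, iterations=50):
    """graph: {node: [neighbors]}; known: {node: suspicion in [0, 1]}."""
    # Start every unknown node at a neutral prior of 0.5.
    scores = {n: known.get(n, 0.5) for n in graph}
    for _ in range(iterations):
        new_scores = {}
        for node, neighbors in graph.items():
            if node in known:
                new_scores[node] = known[node]  # labeled nodes never change
            else:
                # Recursive step: my suspicion depends on my neighbors', and
                # theirs (last round) depended on mine.
                new_scores[node] = sum(scores[m] for m in neighbors) / len(neighbors)
        scores = new_scores  # update all nodes simultaneously
    return scores

# Toy chain a - b - c - d: "a" is known guilty, "d" known innocent.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
known = {"a": 1.0, "d": 0.0}
scores = propagate_suspicion(graph, known)
# "b" settles near 2/3 and "c" near 1/3: suspicion decays with distance
# from the known-guilty node.
```

Note how the known labels dominate the outcome of the propagation -- a small-scale echo of the paper's observation that fixed labels can wash extra profiling data away.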
Macskassy and Provost's paper combines relational classification, collective inference, and the focused, dynamic acquisition of new data. They present their system and the tools behind it, show it working on multiple simulator-created data sets, and analyze the results. The most interesting bit is one of the things that doesn't happen: they found that adding additional profiling data (i.e., more data about how suspicious instances are) did not, in general, help the classification algorithm. Instead, the known labels essentially washed the extra data away. If a result like this can be understood, we'll have a better sense of when -- and why -- profiling does -- and doesn't -- work. That would be something very worth having!
Copyright -- Gary Warren King, 2004 - 2006