opening it up with Common Lisp

Favorite weblogs

Lisp Related

Bill Clementson

Finding Lisp


Planet Lisp



Talking Points Memo

This Modern World

Working for Change

Other home


Recent Readings

Book review: Darwinia
Reviewed: Friday, August 11, 2006

Summer reading: Spin
Reviewed: Saturday, August 5, 2006

Reviewed: Tuesday, July 18, 2006

the Omnivoire's Delimma
Reviewed: Wednesday, July 12, 2006

the Golem's Eye
Reviewed: Wednesday, May 31, 2006


Using Linear Algebra for Intelligent Information Retrieval
Michael W. Berry and Susan T. Dumais and Gavin W. O'Brien, 1994 , (Paper URL)
Tuesday, September 7, 2004

Berry et. al. present a nice example study of how a little linear algebra (OK, it's a lot of linear algebra) can go a long way. Their example shows how Latent Semantic Indexing (LSI) discovers the non-lexical connections between words based on their context (see this older posting too). LSI proceeds as follows:

  1. First, create a matrix representing, for example, which words appear in which documents.

  2. Then use singular value decomposition (SVD) to split that matrix into three pieces: one representing the words, one the documents, and one the inter-relations. You usually also want to weight the words and documents for significance (this is where domain knowledge comes in handy)

  3. To query which documents are most similar to a new query, one turns the query into a vector representing which words are in it, scales it appropriately and then compares it to each document (using, for example, a cosine similarity measure).

Amazingly, that's it -- except for interpreting the results! SVD is similar to principle component analysis (PCA) in that it can be used to reduce the total number of dimensions under study. The resulting matrixes breakdown the original relationships into linearly independent factors and we can use the k-largest ones to produce best estimates with less computation.

The authors go on to discuss fast methods of computing and updating SVD matrixes and present a laundry list of applications including information retrieval, information filtering, cross-language retrieval, modeling human memory and dealing with noisy inputs.

The best thing about this technical report is that it carefully goes through the mathematical steps with good examples, tables and charts to make the path clear. You don't find this often in published papers because the scientific method is supposed to brush all the work under the rug or behind the bed so everything looks pristine when the guests come to visit.

Home | About | Quotes | Recent | Archives

Copyright -- Gary Warren King, 2004 - 2006