An Introduction to Latent Semantic Analysis
Though many have believed that its popularity stems only from having a wonderful name, Latent Semantic Analysis (LSA) turns out to be both surprisingly useful and possibly an accurate representation of what goes on inside our heads. Landauer et al. show this by summarizing a large body of research comparing LSA with humans on tasks such as categorization, estimating coherence, semantic priming and even scoring essays (!?).
LSA takes as input a matrix representing the occurrence of, for example, words in phrases or phrases in documents or, most broadly, things in collections. It uses singular value decomposition (SVD) to break this matrix into three: one representing the rows, one the columns and one diagonal matrix of "weights" (the singular values). This representation can then be compressed by keeping only the dimensions with the largest weights and discarding the rest. The "distance" between words/phrases/things is then determined by comparing them in the compressed analogue of the original matrix. The decomposition and compression steps force the matrix to reveal the hidden connections between the things (hence, Latent Semantics).
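To make the decompose-then-compress step concrete, here is a minimal sketch in Python using NumPy's `np.linalg.svd`. The toy word-by-document matrix, the helper names `lsa_term_vectors` and `cosine`, and the choice of two dimensions are all assumptions made for illustration; real LSA work typically also reweights the counts first, which is skipped here.

```python
import numpy as np

# Toy word-by-document count matrix (made up for illustration).
# Rows are words, columns are documents.
terms = ["ship", "boat", "ocean", "tree", "wood"]
X = np.array([
    [1, 0, 0],  # ship:  appears only in document 1
    [0, 1, 0],  # boat:  appears only in document 2
    [1, 1, 0],  # ocean: appears in documents 1 and 2
    [0, 0, 1],  # tree:  appears only in document 3
    [0, 0, 1],  # wood:  appears only in document 3
], dtype=float)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def lsa_term_vectors(matrix, k):
    """Word coordinates after keeping the k largest singular values."""
    # U describes the rows (words), Vt the columns (documents),
    # and s holds the diagonal "weights", largest first.
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k] * s[:k]


reduced = lsa_term_vectors(X, k=2)
ship, boat, tree = (terms.index(t) for t in ("ship", "boat", "tree"))

raw_sim = cosine(X[ship], X[boat])              # no shared document
lsa_sim = cosine(reduced[ship], reduced[boat])  # linked through "ocean"
lsa_far = cosine(reduced[ship], reduced[tree])  # still unrelated

print(f"raw ship/boat: {raw_sim:.3f}")
print(f"LSA ship/boat: {lsa_sim:.3f}")
print(f"LSA ship/tree: {lsa_far:.3f}")
```

In this toy case "ship" and "boat" never occur in the same document, so their raw similarity is zero. After compression they point in the same direction because both co-occur with "ocean", while "ship" and "tree" stay unrelated. That is the "hidden connection" effect in miniature.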
As the authors say, you can treat LSA as a useful technique regardless of whether or not you believe the larger claim that it (or something very close to it) is actually how our brains function. They do, however, present an impressive array of evidence that LSA matches human performance pretty darn well.
Perhaps the most surprising part of LSA is that it works so well without taking syntax into account: all LSA looks at is inclusion of things within groups. The order of these things doesn't matter. I'd be interested in finding domains where LSA fails because syntax really is important. It would also be fun to look for incremental algorithms (and/or ones that could be reasonably implemented in wetware). In any case, it's a technique I want to add to my toolbox (Lisp programs coming someday).
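The point about order can be seen directly in the input matrix. The following is a small sketch, not anything from Landauer et al.: the helper `count_matrix` and the example sentences are hypothetical, and it uses plain whitespace tokenization. It shows that scrambling the words leaves the matrix unchanged, and hints at the kind of case where ignoring syntax might hurt.

```python
from collections import Counter


def count_matrix(documents):
    """Build a word-by-document count matrix from whitespace-split text."""
    vocab = sorted({word for doc in documents for word in doc.split()})
    counts = [Counter(doc.split()) for doc in documents]
    # One row per word, one column per document.
    return vocab, [[c[word] for c in counts] for word in vocab]


original = ["the dog bit the man", "the man fed the dog"]
scrambled = ["man the bit dog the", "dog fed the the man"]

# Word order inside each document has no effect on the matrix LSA sees.
same_input = count_matrix(original) == count_matrix(scrambled)

# Two sentences with opposite meanings but the same words
# produce identical columns, so LSA cannot tell them apart.
_, rows = count_matrix(["the dog bit the man", "the man bit the dog"])
columns = list(zip(*rows))
identical_columns = columns[0] == columns[1]

print(same_input, identical_columns)
```

Any domain where "who did what to whom" carries the meaning would be a natural place to start looking for failures.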
Copyright -- Gary Warren King, 2004 - 2006