Using Linear Algebra for Intelligent Information Retrieval

Berry et al. present a nice case study of how a little linear algebra (OK, it's a lot of linear algebra) can go a long way. Their example shows how Latent Semantic Indexing (LSI) discovers the non-lexical connections between words based on their context (see this older posting too). LSI proceeds as follows:
Amazingly, that's it, except for interpreting the results! SVD is similar to principal component analysis (PCA) in that it can be used to reduce the total number of dimensions under study. The resulting matrices break the original relationships down into linearly independent factors, and keeping only the k largest gives the best rank-k estimate of the original data (in the least-squares sense) with less computation. The authors go on to discuss fast methods of computing and updating SVD matrices and present a laundry list of applications, including information retrieval, information filtering, cross-language retrieval, modeling human memory, and dealing with noisy inputs.

The best thing about this technical report is that it carefully goes through the mathematical steps with good examples, tables, and charts to make the path clear. You don't find this often in published papers, because the scientific method is supposed to brush all the work under the rug or behind the bed so everything looks pristine when the guests come to visit.
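For the curious, here is a minimal sketch of the idea in Python with NumPy. It is not the authors' code: the four-document corpus, the whitespace tokenizer, the raw-count weighting, and the helper name `query_similarities` are all assumptions made up for illustration, and a real system would use term weighting and a sparse SVD solver. Folding a query in as q^T U_k S_k^{-1} and comparing it against the rows of V_k is one common convention, not the only one.

```python
import numpy as np

# Hypothetical toy corpus (not from the report): two topics with no shared words.
docs = [
    "human machine interface for computer applications",
    "a survey of user opinion of computer system response time",
    "graph theory and tree structures",
    "graph minors and tree ordering",
]

# Step 1: term-document matrix A (rows = terms, columns = documents, raw counts).
vocab = sorted({word for doc in docs for word in doc.split()})
row = {word: i for i, word in enumerate(vocab)}
A = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(docs):
    for word in doc.split():
        A[row[word], j] += 1

# Step 2: thin SVD, A = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Step 3: keep only the k largest factors.
k = 2
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
A_k = U_k @ np.diag(s_k) @ Vt_k  # rank-k approximation of A


def query_similarities(text):
    """Fold a query into the k-dim space; return its cosine with each document."""
    q = np.zeros(len(vocab))
    for word in text.split():
        if word in row:  # words outside the vocabulary are ignored
            q[row[word]] += 1
    q_hat = (q @ U_k) / s_k   # q^T U_k S_k^{-1}
    doc_vecs = Vt_k.T         # one k-dimensional row per document
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_hat)
    return (doc_vecs @ q_hat) / np.maximum(norms, 1e-12)


sims = query_similarities("minors")
print(np.round(sims, 3))
```

The word "minors" appears only in the fourth document, yet in this toy setup the third document should score about as high, because the two share the "graph" and "tree" context that the retained factor captures. The two computer-related documents should score near zero. That is the non-lexical connection in miniature.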
