Recent Readings

Mercator: A Scalable, Extensible Web Crawler
Wednesday, March 22, 2006

Even though 1999 was a long time ago, this paper on building a web crawler still seems like a nice introduction to the problem. The authors limn the various challenges in building any crawler and the additional ones that come from building one that can handle the ever-growing World Wide Web. They also describe many of the extensions they needed to add to Java in order to support the very large data structures required. There is even mention of Bloom filters.

All in all, a nice read for the train or bus and one that leaves me wondering "why not do this in Lisp?" Would it scale as well? Would it be easier to build? Maintain? Extend? I'd like to think so...

