opening it up with Common Lisp
Book review: Darwinia
Summer reading: Spin
the Omnivoire's Delimma
the Golem's Eye
I've been using del.icio.us for a while now and like it. So far, most of my usage has been entirely personal; I use it as a bookmark repository and don't explore other people's tags or posts. The main reason for this is that I already have way too much to read and do and finding stuff to add to my lists doesn't look as if it's going to become a problem any time soon!
I do, however, have strong interests in folksonomy, ontology, and networking and I've wanted to try some simple visualizations of my data. This turns out to be a natural enough reason to let me write a brief tutorial of CL-Graph and CL-Containers. I used the del.icio.us API to get an XML file of all of my posts. I then used the XMLS package to parse the XML into this:
Next, I created a class to hold the information about a post (this is probably better done as a structure but I tend to use objects unless I really care about space and time). The defclass* macro is part of Metatilities and mainly exists to save typing all those :initforms and :readers and whatnot.
(defclass* delicious-post () ((post-time nil ia :initarg :time) (tags nil ia :initarg :tag) (hash nil ia) (extended nil ia) (description nil ia) (post-href nil ia :initarg :href)))
(defun determine-tag-counts (delicious-post-file) "Returns a list of tags and their counts from a delicious-post-file." (bind ((posts (xmls::parse delicious-post-file)) (tags (collect-elements ;; the first two elements of posts aren't tags (cddr posts) :transform (lambda (post-info) (let ((tags (find "tag" (second post-info) :test #'string-equal :key #'first))) (when tags (tokenize-string (second tags) :delimiter #\ ))))))) (element-counts (flatten tags) :test #'equal)))
The next bit o' code reads in the XML file and aggregates all the tags. The funny bits are bind, collect-elements, tokenize-string, flatten and element-counts. Collect-elements is sort of like mapcar but it works for any kind of container both Common Lisp ones like lists, vectors and hash-tables and CL-Container ones like red-black-trees, heaps, and stacks. Element-Counts is another CL-Container's method. It returns an associative list of each unique element (where uniqueness depends on the test) and the number of times it appears.
If I try determine-tag-counts on my posts, I get:
(("techology" 1) ("christmas" 1) ("parallelism" 1) ("mail-and-print" 1) ("alternative-energy" 1) ("review" 2) ("blog-this" 2) ("autism" 2) ("neurology" 1) ("alf" 1) ...)
But I probably really want to see the results sorted. If we change the call to element-counts by adding :sort #'> :sort-on :counts, then we'll get:
(("to-read" 134) ("book-to-read" 66) ("software-development" 24) ("computer-science" 20) ("software" 19) ("programming" 15) ("science" 9) ("been-read" 9) ("social-software" 9) ("unix" 9) ...)
Oh dear. Looks like I'm behind on my reading...
Visualizing Tag interconnections
Neither tags nor posts sit by themselves. If I made a graph with a vertex for each post and another one for each tag and then added a link between each post and its tags, I'd end up with a bipartite graph. To simplify, I could then project the bipartite graph down onto only its tags (or posts) to create a new graph whose vertexes were all tags (or posts) and where two tags (or posts, etc) were linked when they both share a post in the original graph. Here's how it would look in CL-Graph.
(defun create-bipartite-tag/post-graph (delicious-post-file) "Creates a bipartite graph of tags, posts and the links between them from a delicious post file." (bind ((posts (parse-delicious-posts delicious-post-file)) (g (cl-graph:make-graph 'cl-graph:graph-container))) (iterate-elements posts (lambda (post) (iterate-elements (tags post) (lambda (tag) (cl-graph:add-edge-between-vertexes g post tag))))) g))
Create-bipartite-tag/post-graph parses the XML as before and then makes a graph to hold on to them. Make-graph is a synonym for make-instance and here we make a graph-container (which is more or less a graph represented by an adjacency list but we don't have to worry about that). Iterate-elements is mapc as collect-elements was to mapcar. We use it to call add-edge-between-vertexes for each post and tag.
Now that we have the bipartite graph, we can use project-bipartite-graph to get one the unipartite projection on tags and then call graph->dot to see what it looks like:
(cl-graph:graph->dot (cl-graph:project-bipartite-graph (cl-graph:make-graph 'cl-graph:graph-container :default-edge-class 'cl-graph:weighted-edge) full-graph 'keyword (compose 'type-of 'element)) "user-home:temporary;all-tags.dot" :vertex-labeler (lambda (vertex stream) (format stream "~(~A~)" (symbol-name (element vertex)))) :edge-formatter (lambda (edge stream) (format stream "weight=~D" (cl-graph:weight edge))))
The project-bipartite-graph method takes as input graph to be created (either a symbol naming the class or an existing (presumably empty!) graph), the graph to project (full-graph in the example code), a value specifying which vertexes to project and a function that will be applied to each vertex in the original graph. In this case, I use the function built by composing #'element and #'type-of. When called on a vertex, this will return the type of whatever is contained in the vertex. The way I built the graph, this will be a keyword for tags and a delicious-post for posts.
Graph->dot has lots of parameters that can be used to control the output. Here, I use the vertex-labeler and the edge-formatter to specify a little bit of fanciness.
(Update 2005-11-28) As I learned to my chagrin, the output is huge (5.2 Megabyte). If you'd like to see it, you can click to open full size in another window.
That, however, is a bit of a mess (even enlarged). I'd really like to focus in on one or two tags and see what that looks like. I'll use make-filtered-graph to see only the tags that are linked to "lisp":
(cl-graph:graph->dot (cl-graph:make-filtered-graph (cl-graph:project-bipartite-graph (cl-graph:make-graph 'cl-graph:graph-container :default-edge-class 'cl-graph:weighted-edge) (create-bipartite-tag/post-graph #P"user-home:temporary;all-posts.xml") 'keyword (compose 'type-of 'element)) (lambda (v) (search "lisp" (symbol-name (element v)) :test #'string-equal)) :complete-closure-with-links 1) "user-home:temporary;lisp-tags-20051125.dot" :vertex-labeler (lambda (vertex stream) (format stream "~(~A~)" (symbol-name (element vertex)))) :edge-formatter (lambda (edge stream) (format stream "weight=~D" (cl-graph:weight edge))))
(click to open full size in it's own window)
That's more like it. The make-filtered-graph method takes a graph, a vertex filter, a completion style and a depth. In this case, we select the single vertex labeled "lisp" and go out to a depth of one. Then we include all of the links.
I've only touched on a few of the methods in CL-Containers and CL-Graph but I hope I've wet your appetites for more.
Copyright -- Gary Warren King, 2004 - 2006