opening it up with Common Lisp

Favorite weblogs

Lisp Related

Bill Clementson

Finding Lisp

Lemonodor

Lispmeister.com

Planet Lisp

Politics

Orcinus

Talking Points Memo

This Modern World

Working for Change

Other home

Polliblog

Recent Readings

Book review: Darwinia
Reviewed: Friday, August 11, 2006

Summer reading: Spin
Reviewed: Saturday, August 5, 2006

Runner
Reviewed: Tuesday, July 18, 2006

the Omnivoire's Delimma
Reviewed: Wednesday, July 12, 2006

the Golem's Eye
Reviewed: Wednesday, May 31, 2006





tinderbox
 width=

A tiny bit of del.icio.us / cl-graph fun
Friday, November 25, 2005

I've been using del.icio.us for a while now and like it. So far, most of my usage has been entirely personal; I use it as a bookmark repository and don't explore other people's tags or posts. The main reason for this is that I already have way too much to read and do and finding stuff to add to my lists doesn't look as if it's going to become a problem any time soon!

I do, however, have strong interests in folksonomy, ontology, and networking and I've wanted to try some simple visualizations of my data. This turns out to be a natural enough reason to let me write a brief tutorial of CL-Graph and CL-Containers. I used the del.icio.us API to get an XML file of all of my posts. I then used the XMLS package to parse the XML into this:

("posts" (("user" "gwking") ("update" "2005-11-21T15:26:00Z"))

("post"

(("time" "2005-11-21T15:25:47Z") ("tag" "yoga health exercise amherst")

("hash" "9aad47baf972813c8202b43a56e95a61")

("description" "Yoga Center Amherst, Massachusetts")

("href" "http://www.yogacenteramherst.com/")))

("post"

(("time" "2005-11-21T13:30:18Z") ("tag" "kids soccer sports")

("hash" "7d2e120f77e7129753b53a9ab74f1763")

("description" "Home - Allsport Soccer Arena - Northampton, Massachusetts")

("href" "http://northamptonsoccer.com/site/")))

...)

Next, I created a class to hold the information about a post (this is probably better done as a structure but I tend to use objects unless I really care about space and time). The defclass* macro is part of Metatilities and mainly exists to save typing all those :initforms and :readers and whatnot.

(defclass* delicious-post ()
  ((post-time nil ia :initarg :time)
   (tags nil ia :initarg :tag)
   (hash nil ia)
   (extended nil ia)
   (description nil ia)
   (post-href nil ia :initarg :href)))
(defun determine-tag-counts (delicious-post-file)
  "Returns a list of tags and their counts from a delicious-post-file."
  (bind ((posts (xmls::parse delicious-post-file))
         (tags (collect-elements 
                ;; the first two elements of posts aren't tags
                (cddr posts)
                :transform
                (lambda (post-info)
                  (let ((tags (find "tag" (second post-info) 
                                    :test #'string-equal
                                    :key #'first)))
                    (when tags 
                      (tokenize-string (second tags) :delimiter #\ )))))))
    (element-counts 
     (flatten tags)
     :test #'equal)))

The next bit o' code reads in the XML file and aggregates all the tags. The funny bits are bind, collect-elements, tokenize-string, flatten and element-counts. Collect-elements is sort of like mapcar but it works for any kind of container both Common Lisp ones like lists, vectors and hash-tables and CL-Container ones like red-black-trees, heaps, and stacks. Element-Counts is another CL-Container's method. It returns an associative list of each unique element (where uniqueness depends on the test) and the number of times it appears.

If I try determine-tag-counts on my posts, I get:

(("techology" 1) ("christmas" 1) ("parallelism" 1) ("mail-and-print" 1) ("alternative-energy" 1) ("review" 2) ("blog-this" 2) ("autism" 2) ("neurology" 1) ("alf" 1) ...)

But I probably really want to see the results sorted. If we change the call to element-counts by adding :sort #'> :sort-on :counts, then we'll get:

(("to-read" 134) ("book-to-read" 66) ("software-development" 24) ("computer-science" 20) ("software" 19) ("programming" 15) ("science" 9) ("been-read" 9) ("social-software" 9) ("unix" 9) ...)

Oh dear. Looks like I'm behind on my reading...

Visualizing Tag interconnections

Neither tags nor posts sit by themselves. If I made a graph with a vertex for each post and another one for each tag and then added a link between each post and its tags, I'd end up with a bipartite graph. To simplify, I could then project the bipartite graph down onto only its tags (or posts) to create a new graph whose vertexes were all tags (or posts) and where two tags (or posts, etc) were linked when they both share a post in the original graph. Here's how it would look in CL-Graph.

(defun create-bipartite-tag/post-graph (delicious-post-file)
  "Creates a bipartite graph of tags, posts and the links between them from 
a delicious post file."
  (bind ((posts (parse-delicious-posts delicious-post-file))
         (g (cl-graph:make-graph 'cl-graph:graph-container)))
    (iterate-elements 
     posts
     (lambda (post)
       (iterate-elements 
        (tags post)
        (lambda (tag)
          (cl-graph:add-edge-between-vertexes g post tag)))))
    g))

Create-bipartite-tag/post-graph parses the XML as before and then makes a graph to hold on to them. Make-graph is a synonym for make-instance and here we make a graph-container (which is more or less a graph represented by an adjacency list but we don't have to worry about that). Iterate-elements is mapc as collect-elements was to mapcar. We use it to call add-edge-between-vertexes for each post and tag.

Now that we have the bipartite graph, we can use project-bipartite-graph to get one the unipartite projection on tags and then call graph->dot to see what it looks like:

(cl-graph:graph->dot
 (cl-graph:project-bipartite-graph 
  (cl-graph:make-graph 'cl-graph:graph-container 
                       :default-edge-class 'cl-graph:weighted-edge)
  full-graph
  'keyword
  (compose 'type-of 'element))
 "user-home:temporary;all-tags.dot"
 :vertex-labeler 
 (lambda (vertex stream)
   (format stream "~(~A~)" (symbol-name (element vertex))))
 :edge-formatter
 (lambda (edge stream)
   (format stream "weight=~D" (cl-graph:weight edge))))

The project-bipartite-graph method takes as input graph to be created (either a symbol naming the class or an existing (presumably empty!) graph), the graph to project (full-graph in the example code), a value specifying which vertexes to project and a function that will be applied to each vertex in the original graph. In this case, I use the function built by composing #'element and #'type-of. When called on a vertex, this will return the type of whatever is contained in the vertex. The way I built the graph, this will be a keyword for tags and a delicious-post for posts.

Graph->dot has lots of parameters that can be used to control the output. Here, I use the vertex-labeler and the edge-formatter to specify a little bit of fanciness.

(Update 2005-11-28) As I learned to my chagrin, the output is huge (5.2 Megabyte). If you'd like to see it, you can click to open full size in another window.

That, however, is a bit of a mess (even enlarged). I'd really like to focus in on one or two tags and see what that looks like. I'll use make-filtered-graph to see only the tags that are linked to "lisp":

(cl-graph:graph->dot
 (cl-graph:make-filtered-graph
  (cl-graph:project-bipartite-graph 
   (cl-graph:make-graph 'cl-graph:graph-container 
                        :default-edge-class 'cl-graph:weighted-edge)
   (create-bipartite-tag/post-graph #P"user-home:temporary;all-posts.xml")
   'keyword
   (compose 'type-of 'element))
  (lambda (v)
    (search "lisp" (symbol-name (element v)) :test #'string-equal))
  :complete-closure-with-links
  1)
 "user-home:temporary;lisp-tags-20051125.dot"
 :vertex-labeler (lambda (vertex stream)
                   (format stream "~(~A~)" (symbol-name (element vertex))))
 :edge-formatter (lambda (edge stream)
                   (format stream "weight=~D" (cl-graph:weight edge))))

(click to open full size in it's own window)

That's more like it. The make-filtered-graph method takes a graph, a vertex filter, a completion style and a depth. In this case, we select the single vertex labeled "lisp" and go out to a depth of one. Then we include all of the links.

I've only touched on a few of the methods in CL-Containers and CL-Graph but I hope I've wet your appetites for more.


|

Home | About | Quotes | Recent | Archives

Copyright -- Gary Warren King, 2004 - 2006