|
| |||
|
opening it up with Common Lisp |
|||
|
Favorite weblogs Lisp Related Politics Other home Recent Readings
Book review: Darwinia
Summer reading: Spin
Runner
the Omnivoire's Delimma
the Golem's Eye |
|
A tiny bit of del.icio.us / cl-graph fun I've been using del.icio.us for a while now and like it. So far, most of my usage has been entirely personal; I use it as a bookmark repository and don't explore other people's tags or posts. The main reason for this is that I already have way too much to read and do and finding stuff to add to my lists doesn't look as if it's going to become a problem any time soon! I do, however, have strong interests in folksonomy, ontology, and networking and I've wanted to try some simple visualizations of my data. This turns out to be a natural enough reason to let me write a brief tutorial of CL-Graph and CL-Containers. I used the del.icio.us API to get an XML file of all of my posts. I then used the XMLS package to parse the XML into this:
Next, I created a class to hold the information about a post (this is probably better done as a structure but I tend to use objects unless I really care about space and time). The defclass* macro is part of Metatilities and mainly exists to save typing all those :initforms and :readers and whatnot. (defclass* delicious-post () ((post-time nil ia :initarg :time) (tags nil ia :initarg :tag) (hash nil ia) (extended nil ia) (description nil ia) (post-href nil ia :initarg :href))) (defun determine-tag-counts (delicious-post-file) "Returns a list of tags and their counts from a delicious-post-file." (bind ((posts (xmls::parse delicious-post-file)) (tags (collect-elements ;; the first two elements of posts aren't tags (cddr posts) :transform (lambda (post-info) (let ((tags (find "tag" (second post-info) :test #'string-equal :key #'first))) (when tags (tokenize-string (second tags) :delimiter #\ ))))))) (element-counts (flatten tags) :test #'equal))) The next bit o' code reads in the XML file and aggregates all the tags. The funny bits are bind, collect-elements, tokenize-string, flatten and element-counts. Collect-elements is sort of like mapcar but it works for any kind of container both Common Lisp ones like lists, vectors and hash-tables and CL-Container ones like red-black-trees, heaps, and stacks. Element-Counts is another CL-Container's method. It returns an associative list of each unique element (where uniqueness depends on the test) and the number of times it appears. If I try determine-tag-counts on my posts, I get: (("techology" 1) ("christmas" 1) ("parallelism" 1) ("mail-and-print" 1) ("alternative-energy" 1) ("review" 2) ("blog-this" 2) ("autism" 2) ("neurology" 1) ("alf" 1) ...) But I probably really want to see the results sorted. If we change the call to element-counts by adding :sort #'> :sort-on :counts, then we'll get: (("to-read" 134) ("book-to-read" 66) ("software-development" 24) ("computer-science" 20) ("software" 19) ("programming" 15) ("science" 9) ("been-read" 9) ("social-software" 9) ("unix" 9) ...) Oh dear. Looks like I'm behind on my reading... Visualizing Tag interconnectionsNeither tags nor posts sit by themselves. If I made a graph with a vertex for each post and another one for each tag and then added a link between each post and its tags, I'd end up with a bipartite graph. To simplify, I could then project the bipartite graph down onto only its tags (or posts) to create a new graph whose vertexes were all tags (or posts) and where two tags (or posts, etc) were linked when they both share a post in the original graph. Here's how it would look in CL-Graph. (defun create-bipartite-tag/post-graph (delicious-post-file) "Creates a bipartite graph of tags, posts and the links between them from a delicious post file." (bind ((posts (parse-delicious-posts delicious-post-file)) (g (cl-graph:make-graph 'cl-graph:graph-container))) (iterate-elements posts (lambda (post) (iterate-elements (tags post) (lambda (tag) (cl-graph:add-edge-between-vertexes g post tag))))) g)) Create-bipartite-tag/post-graph parses the XML as before and then makes a graph to hold on to them. Make-graph is a synonym for make-instance and here we make a graph-container (which is more or less a graph represented by an adjacency list but we don't have to worry about that). Iterate-elements is mapc as collect-elements was to mapcar. We use it to call add-edge-between-vertexes for each post and tag. Now that we have the bipartite graph, we can use project-bipartite-graph to get one the unipartite projection on tags and then call graph->dot to see what it looks like: (cl-graph:graph->dot (cl-graph:project-bipartite-graph (cl-graph:make-graph 'cl-graph:graph-container :default-edge-class 'cl-graph:weighted-edge) full-graph 'keyword (compose 'type-of 'element)) "user-home:temporary;all-tags.dot" :vertex-labeler (lambda (vertex stream) (format stream "~(~A~)" (symbol-name (element vertex)))) :edge-formatter (lambda (edge stream) (format stream "weight=~D" (cl-graph:weight edge)))) The project-bipartite-graph method takes as input graph to be created (either a symbol naming the class or an existing (presumably empty!) graph), the graph to project (full-graph in the example code), a value specifying which vertexes to project and a function that will be applied to each vertex in the original graph. In this case, I use the function built by composing #'element and #'type-of. When called on a vertex, this will return the type of whatever is contained in the vertex. The way I built the graph, this will be a keyword for tags and a delicious-post for posts. Graph->dot has lots of parameters that can be used to control the output. Here, I use the vertex-labeler and the edge-formatter to specify a little bit of fanciness. (Update 2005-11-28) As I learned to my chagrin, the output is huge (5.2 Megabyte). If you'd like to see it, you can click to open full size in another window. That, however, is a bit of a mess (even enlarged). I'd really like to focus in on one or two tags and see what that looks like. I'll use make-filtered-graph to see only the tags that are linked to "lisp": (cl-graph:graph->dot (cl-graph:make-filtered-graph (cl-graph:project-bipartite-graph (cl-graph:make-graph 'cl-graph:graph-container :default-edge-class 'cl-graph:weighted-edge) (create-bipartite-tag/post-graph #P"user-home:temporary;all-posts.xml") 'keyword (compose 'type-of 'element)) (lambda (v) (search "lisp" (symbol-name (element v)) :test #'string-equal)) :complete-closure-with-links 1) "user-home:temporary;lisp-tags-20051125.dot" :vertex-labeler (lambda (vertex stream) (format stream "~(~A~)" (symbol-name (element vertex)))) :edge-formatter (lambda (edge stream) (format stream "weight=~D" (cl-graph:weight edge)))) ![]() (click to open full size in it's own window) That's more like it. The make-filtered-graph method takes a graph, a vertex filter, a completion style and a depth. In this case, we select the single vertex labeled "lisp" and go out to a depth of one. Then we include all of the links. I've only touched on a few of the methods in CL-Containers and CL-Graph but I hope I've wet your appetites for more. | |
|
|
Home | About | Quotes | Recent | Archives Copyright -- Gary Warren King, 2004 - 2006 |