Monday, 28 October 2013

Drawing Trees with D3

So I figure if I gotta write blog posts, I might as well learn something in the process.  I thought that visualizing trees with ASCII was cool, but neither very flexible nor aesthetically appealing.  I happen to know some JavaScript, so why not try using a dynamic visualization library like D3?  I like the idea of creating an interactive graph that can be shared with people on the internet, and it seems like this is the perfect way to do it.


Cerealizing Serializing

Before I do any of that, I want to set some sort of format for the tree data to take - this way someone could write trees in any language like Python or Java and still render them with the visualizer.  For this purpose we'll use JSON, just because I think it looks nicer than XML :) Although it might be more accurate for the JSON to look like this:

{
 "root": "val1",
 "children": [
  {"root": "val2", "children": []},
  {"root": "val3", "children": []}
 ]
}

But to work with D3 (and to make it easier to generate the JSON) we'll use this format.

{
 "nodes": [
  {"val": "val1"},
  {"val": "val2"},
  {"val": "val3"}
 ],
 "links": [
  {"src": 0, "nxt": 1},
  {"src": 0, "nxt": 2}
 ]
}

Here each item in the "nodes" list represents a node, which can be extended with other properties, and "links" uses indices into "nodes" to create edges. In this case, val1 links to val2 and val3. We'll also say that the "src" of each link is the root of the subtree, i.e. the parent (it helps formatting). Here's some Python code that would make it (I haven't tested it, so it might not work..):

nodes, links = [], []

def flatten_tree(node, parent_index=None):
    """Call on the root of a tree as flatten_tree(root)"""

    # Record this node; its index is its position in the nodes list
    nodes.append({"val": node.value})
    node_index = len(nodes) - 1

    # The root's index is 0, which is falsy, so test against None explicitly
    if parent_index is not None:
        links.append({"src": parent_index, "nxt": node_index})

    for child in node.children:
        flatten_tree(child, node_index)
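
To try the idea out, here's a minimal self-contained sketch. The Node class is hypothetical (the post doesn't define one; anything with "value" and "children" attributes would do), and this variant returns the two lists instead of filling module-level ones, so it can be called more than once:

```python
import json


class Node:
    """Hypothetical tree node: a value plus a list of child Nodes."""

    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []


def flatten(root):
    """Return (nodes, links) for the tree rooted at `root`."""
    nodes, links = [], []

    def visit(node, parent_index=None):
        nodes.append({"val": node.value})
        node_index = len(nodes) - 1
        # Compare against None: the root has index 0, which is falsy.
        if parent_index is not None:
            links.append({"src": parent_index, "nxt": node_index})
        for child in node.children:
            visit(child, node_index)

    visit(root)
    return nodes, links


# The same three-node tree used in the JSON examples.
tree = Node("val1", [Node("val2"), Node("val3")])
nodes, links = flatten(tree)
print(json.dumps({"nodes": nodes, "links": links}, indent=1))
```

The nodes come out in depth-first order, so a parent always appears in "nodes" before any of its children.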

Notice that in this format we can have as many children as we like, and even extend the object with extra properties, like IDs.  That way, we have the option to create networks with the same framework, where we're no longer confined to the tree-like descending structure.  Networks are cool.
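
As a rough illustration, the sketch below adds an "id" property to each node and then one extra link that gives val2 a second parent, which turns the tree into a more general network. The "id" values and the is_tree helper are made up for this example; they aren't part of D3 or of my code:

```python
def is_tree(graph):
    """True if node 0 is a root with no incoming links, every other node
    has exactly one incoming link, and all nodes are reachable from node 0."""
    n = len(graph["nodes"])
    incoming = [0] * n
    children = [[] for _ in range(n)]
    for link in graph["links"]:
        incoming[link["nxt"]] += 1
        children[link["src"]].append(link["nxt"])

    # An empty graph is not counted as a tree here.
    if n == 0 or incoming[0] != 0 or any(c != 1 for c in incoming[1:]):
        return False

    # Walk down from the root; in a tree this reaches every node.
    seen, stack = set(), [0]
    while stack:
        i = stack.pop()
        seen.add(i)
        stack.extend(children[i])
    return len(seen) == n


graph = {
    "nodes": [
        {"id": "a", "val": "val1"},
        {"id": "b", "val": "val2"},
        {"id": "c", "val": "val3"},
    ],
    "links": [{"src": 0, "nxt": 1}, {"src": 0, "nxt": 2}],
}

tree_before = is_tree(graph)
graph["links"].append({"src": 2, "nxt": 1})  # val3 -> val2: val2 now has two parents
tree_after = is_tree(graph)
print(tree_before, tree_after)
```

Nothing about the format itself changes between the two cases; only the "links" list decides whether the data is a tree or a network.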

D3 (the relevant parts)

So D3 is a JavaScript library that helps you bind data to the DOM and then make it look real pretty using Scalable Vector Graphics (2010 winner of the coolest name in internet technologies).  There are a lot of similar libraries for visualization though, like Protovis, Raphaël, and Processing - I chose D3 because of its data binding to the document, and its reputation as the most flexible tool for visualization. Also it was the first one I heard about, which biases me considerably :)

I'd like to be concise in my explanation, so if something doesn't make sense, please RTFM.

One important part is being able to search the document and append to a block, like so (pulled from the docs):

d3.select("body").selectAll("p")
    .data([4, 8, 15])
  .enter().append("p")
    .text(function(d) { return "I’m number " + d + "!"; });

As you can see, we can select the "body" element just like in jQuery, and then append a <p> tag to the body for each element in the data array. It results in:
I'm number 4!
I'm number 8!
I'm number 15!
This is crazy useful! Especially since we can manipulate the data using a function just before appending to the document. Also notice that we can keep calling methods on each tag, so we can style elements and set other properties.

The second thing you need to know about is forces - these are a bit more difficult to explain, so instead watch this excellent talk.

Now that we have some of the basics, we can pick one of the examples and leech off of pre-existing code (because learning from APIs takes too long). Since I want something minimalist and extensible, I picked out a network, an unlabeled tree, and an example with labeling. After messing around for a while, I just extended the unlabeled tree example with the label values by appending "text" elements to the "g" blocks, and made the radius of the circles larger so they're easier to see.

Yaaayyy

And that's it! The code I wrote is up on my GitHub: https://github.com/dplyukhin/netviewer. Since the JSON requests use AJAX, there's a super simple Node.js server you can run to test it locally (and 'cause I might put it on Heroku later).

I hope you learned something! Feel free to ask a question or say that my code is dumb in the comments.

2 comments:

  1. I think it is a good idea to learn through the process of writing blogs. It is interesting that you gave more thoughts on how to visualize graphs more aesthetically appealing? I was not aware of D3, it seems really cool package. I agree with you that XML is not that much user-friendly version of serialization. I usually use YAML but JSON seems very similar to YAML. Thanks for sharing your code and thoughts to everyone, I enjoyed reading this post.

    1. Thanks for commenting! Took a look at YAML, and I like how minimal the syntax is, but wouldn't it get difficult to read with deeply-nested objects? I know a lot of databases use JSON internally. What do you use YAML for, specifically?
