6/20/2013

Drawing DAGs: R Solution


Following up on the previous post, another way to construct DAGs is using R. I think the igraph package is one of the customizable ways to do so. This is a powerful package designed for the visualization and analysis of networks and offers much more functionality than you will use for DAGs.

The R equivalent to the instrumental variables DAG constructed in TeX in the preceding post is the following:
To construct this in R, first you want to create an igraph object (with directed edges) as follows:
g1 <- graph.formula(Z--+T, T--+Y,U--+T, U--+Y, U--+Z, Z--+Y)
If you just view the g1 object (by running g1), you will see the following:
IGRAPH DN-- 4 6 -- 
+ attr: name (v/c)
This denotes that this is a directed named (DN) network with 4 vertices (nodes) and 6 edges. The names of the vertices are stored as an attribute (and correspond to the variable names: Z, T, U, Y). Note that we can access the vertex name attribute as V(g1)$name. Running this will help us identify the order in which the vertices were entered into the igraph object:
> V(g1)$name
[1] "Z" "A" "Y" "U"
To give the vertices custom attributes:
V(g1)$label <- V(g1)$name
V(g1)$color <- c(rep("black",3),"white") 
V(g1)$size <- 7
V(g1)$label.cex <- 1.5
V(g1)$label.dist <- 2
V(g1)$label.color <- "black"
Now the edges (first see the edge sequence by running E(g1)):
E(g1)$color <- "black"
E(g1)$color[2] <- E(g1)$color[4] <- "red"
E(g1)$lty <- c(1,2,1,2,1,1)
E(g1)$label <- c(NA,"X",NA,"X",NA,NA)
E(g1)$label.color <- "red" 
E(g1)$label.cex <- 1.5
If you run plot(g1) now, you will note that the igraph library automatically arranges the vertices. To manually override this arrangement, run the following command:
tkplot(g1)
This opens an interactive graphing screen that allows you to manually move around the vertices and arrange them as you want. Now, while leaving the popup window open, extract the new coordinates for the vertices as:
coord.g1 <- tkplot.getcoords(1) 
(Note that the 1 refers to the number of the interactive window -- this one was the first I opened in this R session).

Now, pass these coordinates to the plotting of the igraph object (along with moving around the label location relative to the vertices):
plot(g1, layout=coord.g1,vertex.label.degree=c(pi,pi/2, 0, -pi/2)

Drawing DAGs: LaTeX Solution

Directed acyclic graphs (DAGs) are commonly used to represent causal relationships between variables across a wide variety of disciplines. For an excellent (and quite accessible) textbook on the topic, please see the book Causal Inference by Miguel Hernan and Jamie Robins.

In this post, I briefly explore how you can draw DAGs in LaTeX. In the subsequent post, I will show how to draw DAGs using R.

The first example of DAG is the common instrumental variable (IV) setup:


We seek to identify the effect of treatment T on outcome Y. However, this is confounded by an unmeasured variable U. The IV is denoted as Z. Technically, we do not need to put in the crossed out red edges between U and Z and Z and Y - absence of edges encodes independence relations. However, I included them to reinforce (and make explicit) the assumptions needed for identification of causal effects using IVs:
  • (Quasi)-exogeneity of the instrument (no path from U to Z)
  • Exclusion restriction (no direct path from Z to Y)
This was made using the TikZ package in LaTeX. I used the \usepackage{pgf,tikz} command at the beginning of my document.

The code to create the DAG above is:
\begin{tikzpicture}
% nodes %
\node[text centered] (z) {$Z$};
\node[right = 1.5 of z, text centered] (t) {$T$};
\node[right=1.5 of t, text centered] (y) {$Y$};
\node[draw, rectangle, dashed, above = 1 of t, text centered] (u) {$U$};

% edges %
\draw[->, line width= 1] (z) --  (t);
\draw [->, line width= 1] (t) -- (y);
\draw[->,red, line width= 1,dashed] (u) --node {X} (z);
\draw[->,line width= 1] (u) --(t);
\draw[->,line width= 1] (u) -- (y);
\draw[->, red, line width=1,dashed] (z) to  [out=270,in=270, looseness=0.5] node{X} (y);
\end{tikzpicture}

Note that I first create the nodes (corresponding to variables in the DAG), and then draw the directed edges between the nodes.

Another example of a DAG is a simple structural equation model where we want each edge to be marked with the parameter signifying the causal effect. For example:

In LaTeX:
\begin{tikzpicture}
% nodes %
\node[text centered] (t) {$T$};
\node[right = 1.5 of t, text centered] (m) {$M$};
\node[right=1.5 of m, text centered] (y) {$Y$};

% edges %
\draw[->, line width= 1] (t) -- node[above,font=\footnotesize]{$\beta$}  (m);
\draw [->, line width= 1] (m) -- node[above,font=\footnotesize]{$\gamma$}  (y);
\draw[->, line width=1] (t) to  [out=270,in=270, looseness=0.5] node[below,font=\footnotesize]{$\delta$} (y);
\end{tikzpicture}