## 2/25/2012

### Trellis graphs in ggplot2

Trellis graphs are an informative way of visualizing relationships between variables conditional on other variable(s). In R, one can make trellis plots either through the `lattice` package, or `ggplot2`. In this post, I'll show how to make such graphs in `ggplot2`.

I'm still going to use Pippa Norris's "Democracy Crossnational Data" dataset for this example. It has a variety of political, economic, and social variables for 191 countries. Suppose we want to plot the relationship between GDP per capita in 2000 (log-transformed) as the independent variable and the Polity score of democracy in 2000 as the dependent variable (0=most authoritarian, 20=most democratic), conditional upon the region (the `Region8a` variable).

The more common way to do so is using the `lattice` package in R:

```r
library(lattice)
xyplot(Polity3 ~ log(GDP2000) | Region8a, data = norris.data,
       xlab = "Logarithm of GDP (PPP annual, WB)",
       ylab = "Polity Score (0=Full Authoritarianism, 20=Full Democracy)",
       scales = "free", pch = 19,
       panel = function(x, y) {
         panel.xyplot(x, y, pch = 19, cex = 0.5, col = "black")  # points
         panel.lmline(x, y, col = "red", lty = 2)                # per-panel regression line
       })
```

The result is:

Note that the `scales="free"` option fits `xlim` and `ylim` for each region independently. `panel.xyplot()` plots the points and `panel.lmline()` fits the linear regression line for each region. Now, suppose we wanted to do this in `ggplot2`. We are going to use the `facet_wrap()` function to create a trellis plot.

The syntax will be:
```r
library(ggplot2)
ggplot(norris.data, aes(log(GDP2000), Polity3)) +
  geom_point(shape = 20, size = 3) +
  facet_wrap(~Region8a, ncol = 4, scales = "free") +
  scale_x_continuous("Logarithm of GDP (PPP annual, WB)") +
  scale_y_continuous("Polity Score (0=Full Authoritarianism, 20=Full Democracy)") +
  geom_smooth(method = "lm", color = "red")
```

The result is:

One could remove the confidence intervals by including `se=FALSE` within the `geom_smooth()` function. One could also force the regression lines to extrapolate beyond the data by including `fullrange=TRUE`. Finally, to change the color of the confidence band from the default gray, we can include, for example, `fill="blue"`. Note that in general, the `geom_smooth()` function is quite versatile and allows a variety of smoothing methods: you can specify `lm`, `glm`, `gam`, `loess`, or `rlm` (from the `MASS` package) for the `method` argument, and this list is not exhaustive.
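To make these three options concrete, here is a minimal sketch. Since `norris.data` is not bundled with R, it assumes the built-in `mtcars` data as a stand-in, with `cyl` playing the role of the region; the object names are only illustrative.

```r
library(ggplot2)

# Stand-in data: mtcars replaces norris.data, cyl replaces Region8a (assumption).
base_plot <- ggplot(mtcars, aes(log(hp), mpg)) +
  geom_point(shape = 20, size = 3) +
  facet_wrap(~cyl, ncol = 3, scales = "free")

# Regression line only, without the confidence band.
p_no_se <- base_plot + geom_smooth(method = "lm", color = "red", se = FALSE)

# Regression line extended across the full x-range of each panel.
p_full <- base_plot + geom_smooth(method = "lm", color = "red", fullrange = TRUE)

# Confidence band drawn in blue instead of the default gray.
p_fill <- base_plot + geom_smooth(method = "lm", color = "red", fill = "blue")
```

Printing any of the three objects (for example `p_fill`) draws the corresponding plot.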

An alternative, more flexible, way to do this is to first create a data frame, using the `plyr` package, that contains the intercept and slope of the linear regression for each region. Instead of letting `geom_smooth()` inherit the plot's aesthetics and fit the lines itself, as in the previous example, we can then manually specify the aesthetics of the lines we want to draw using the `geom_abline()` function.

```r
library(plyr)
# Fit one regression per region and keep its intercept and slope
reg.df <- ddply(norris.data, .(Region8a), function(i)
  coef(lm(Polity3 ~ log(GDP2000), data = i))[1:2])
names(reg.df)[2] <- "intercept"
names(reg.df)[3] <- "logGDP2000"
```

Then we make the trellis plot as:
```r
ggplot(norris.data, aes(log(GDP2000), Polity3)) +
  geom_point(shape = 20, size = 3) +
  facet_wrap(~Region8a, ncol = 4, scales = "free") +
  scale_x_continuous("Logarithm of GDP (PPP annual, WB)") +
  scale_y_continuous("Polity Score (0=Full Authoritarianism, 20=Full Democracy)") +
  geom_abline(aes(intercept = intercept, slope = logGDP2000), color = "red", data = reg.df)
```
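For readers who want to try the coefficient-table approach without the Norris data or `plyr`, the following is a rough sketch of the same idea in base R. It swaps `ddply()` for `split()` plus `lapply()`, and assumes `mtcars` as stand-in data with `cyl` as the grouping variable; `reg_df` and its column names are hypothetical.

```r
library(ggplot2)

# Fit one regression per group (stand-in data: mtcars, grouped by cyl).
fits <- lapply(split(mtcars, mtcars$cyl), function(d) {
  coef(lm(mpg ~ log(hp), data = d))
})

# Collect intercepts and slopes in a data frame, one row per group.
# The grouping column must have the same name and type as the facet variable.
reg_df <- data.frame(
  cyl       = as.numeric(names(fits)),
  intercept = vapply(fits, function(b) b[[1]], numeric(1)),
  slope     = vapply(fits, function(b) b[[2]], numeric(1))
)

# Draw each group's line in its own panel with geom_abline().
p_abline <- ggplot(mtcars, aes(log(hp), mpg)) +
  geom_point(shape = 20, size = 3) +
  facet_wrap(~cyl, ncol = 3, scales = "free") +
  geom_abline(aes(intercept = intercept, slope = slope),
              color = "red", data = reg_df)
```

Because `reg_df` carries the `cyl` column, `facet_wrap()` routes each row to the matching panel. Unlike `geom_smooth()`, the lines from `geom_abline()` always span the whole panel.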

A slightly different example of how to create trellis plots comes from the time-series version of the Pippa Norris data. Suppose we want to look at the relationship between economic development and democracy over time within each country in Central and Eastern Europe. One way to do so (as an exploratory tool at least) would be as follows:

```r
CEE.lattice <- ggplot(norris.ts.CEE.data[norris.ts.CEE.data$Year >= 1980, ],
                      aes(logGDP_WB, Polity1, color = Year)) +
  geom_path(color = "black") +
  geom_point(shape = 20, size = 3) +
  facet_wrap(~Nation, ncol = 5, scales = "free") +
  scale_x_continuous("Logarithm of GDP (PPP annual, WB)") +
  scale_y_continuous("Polity Score (0=Full Authoritarianism, 20=Full Democracy)") +
  scale_colour_gradient(breaks = c(1980, 1990, 1995, 2000), low = "yellow", high = "blue") +
  # ggtitle() replaces opts(title = ...), which current ggplot2 versions no longer provide
  ggtitle("Relationship between Econ. Development and Democracy in Central and Eastern Europe")
CEE.lattice  # print the plot
```

The result is:

Note that `scale_colour_gradient(breaks=c(1980,1990,1995,2000), low="yellow", high="blue")` colors the points with a gradient from yellow to blue based on the year. I manually set the breaks in the gradient. Also, note that I use `geom_path()` to connect the points in the order of the observations. This is different from `geom_line()`, which would connect the points in order of the x-variable, which is GDP in this case (and not time).
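The difference between the two geoms is easy to check on a toy example. The sketch below uses a made-up five-row data frame (`toy`, with hypothetical columns `year`, `gdp`, and `polity`) in which GDP does not rise monotonically over time, and uses `layer_data()` to inspect the order in which each geom visits the points.

```r
library(ggplot2)

# Made-up data, listed in time order; gdp is deliberately non-monotonic.
toy <- data.frame(
  year   = 1:5,
  gdp    = c(1, 3, 2, 5, 4),
  polity = c(2, 4, 6, 8, 10)
)

# geom_path() connects observations in the order they appear in the data.
p_path <- ggplot(toy, aes(gdp, polity)) +
  geom_path(color = "black") +
  geom_point(aes(color = year), shape = 20, size = 3) +
  scale_colour_gradient(low = "yellow", high = "blue")

# geom_line() connects observations in order of the x-variable.
p_line <- ggplot(toy, aes(gdp, polity)) +
  geom_line(color = "black") +
  geom_point(aes(color = year), shape = 20, size = 3) +
  scale_colour_gradient(low = "yellow", high = "blue")

# x-values in the order each line layer (layer 1) will draw them.
path_order <- as.numeric(layer_data(p_path, 1)$x)
line_order <- as.numeric(layer_data(p_line, 1)$x)
```

Comparing `path_order` with `line_order` shows that only the path follows the time ordering of the rows, which is why `geom_path()` is the right choice when the trajectory over time is the point of the plot.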

Finally, and this holds across all of `ggplot2`, if we wanted to make the background of the plots white instead of gray, we could simply add the following theme to the plot:

```r
CEE.lattice + theme_bw()
```

The result is:
