lattice package, or ggplot2. In this post, I'll show how to make such graphs in ggplot2.
I'm still going to use Pippa Norris's dataset "Democracy Crossnational Data" for this example. It has a variety of political, economic, and social variables for 191 countries. Suppose we want to plot the relationship between GDP per capita in 2000 (transformed via the natural logarithm) as the independent variable and the Polity score of democracy in 2000 (0 = most authoritarian, 20 = most democratic) as the dependent variable, conditional upon region (the Region8a variable).
The more common way to do so is using the lattice package in R:
library(lattice)
xyplot(Polity3 ~ log(GDP2000) | Region8a, data = norris.data,
       xlab = "Logarithm of GDP (PPP annual, WB)",
       ylab = "Polity Score (0=Full Authoritarianism, 20=Full Democracy)",
       scales = "free", pch = 19,
       panel = function(x, y) {
         panel.xyplot(x, y, pch = 19, cex = 0.5, col = "black")  # points
         panel.lmline(x, y, col = "red", lty = 2)                # per-panel regression line
       })
The result is:
Note that the scales="free" option fits xlim and ylim for each region independently. panel.xyplot() plots the points and panel.lmline() fits the linear regression line for each region.
Now, suppose we wanted to do this in ggplot2. We are going to make use of the facet_wrap() function to create a trellis plot. The syntax will be:
ggplot(norris.data, aes(log(GDP2000), Polity3)) +
  geom_point(shape = 20, size = 3) +
  facet_wrap(~Region8a, ncol = 4, scales = "free") +
  scale_x_continuous("Logarithm of GDP (PPP annual, WB)") +
  scale_y_continuous("Polity Score (0=Full Authoritarianism, 20=Full Democracy)") +
  geom_smooth(method = "lm", color = "red")
The result is:
One could remove the confidence intervals by including se=FALSE within the geom_smooth() function. One could also force the regression lines to extrapolate beyond the data by including fullrange=TRUE within the geom_smooth() function. Finally, to change the color of the confidence interval from the default gray, we can specify fill="blue", for example, within the geom_smooth() function. Note that in general, the geom_smooth() function is quite versatile and allows a variety of smoothing methods: you can specify lm, glm, gam, loess, or rlm (this list is not exhaustive) for the method argument.
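To make those options concrete, here is a minimal sketch. Since the Norris data is not bundled with R, it assumes the built-in mtcars dataset as a stand-in (wt, mpg, and cyl play the roles of log GDP, Polity score, and region); the object names p_band and p_plain are made up for illustration.

```r
library(ggplot2)

# Stand-in data: mtcars ships with R and replaces norris.data in this sketch.
base_plot <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point(shape = 20, size = 3) +
  facet_wrap(~cyl, scales = "free")

# Blue confidence band, with the fitted line extended across the full x-range
# of each panel rather than just the range of that panel's data.
p_band <- base_plot +
  geom_smooth(method = "lm", formula = y ~ x,
              color = "red", fill = "blue", fullrange = TRUE)

# Same fit with the confidence band switched off.
p_plain <- base_plot +
  geom_smooth(method = "lm", formula = y ~ x,
              color = "red", se = FALSE)
```

Printing p_band or p_plain at the console draws the corresponding plot.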
An alternative, more flexible, way to do this is to first create a data frame using the plyr package that contains the intercepts and slopes for the linear regression for each region. So instead of automatically passing the aesthetics down to the geom_smooth() function as in the previous example, we can manually specify the aesthetics of the lines we want to draw using the geom_abline() function.
library(plyr)
# One row per region: intercept and slope of Polity3 regressed on log(GDP2000)
reg.df <- ddply(norris.data, .(Region8a), function(i)
  lm(Polity3 ~ log(GDP2000), data = i)$coefficients[1:2])
names(reg.df)[2] <- "intercept"
names(reg.df)[3] <- "logGDP2000"
Then we make the trellis plot as:
ggplot(norris.data, aes(log(GDP2000), Polity3)) +
  geom_point(shape = 20, size = 3) +
  facet_wrap(~Region8a, ncol = 4, scales = "free") +
  scale_x_continuous("Logarithm of GDP (PPP annual, WB)") +
  scale_y_continuous("Polity Score (0=Full Authoritarianism, 20=Full Democracy)") +
  geom_abline(aes(intercept = intercept, slope = logGDP2000), color = "red", data = reg.df)
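For readers without plyr installed, the same per-group coefficient table can be built with base R. This is a sketch of a swapped-in technique (split() plus lapply() instead of ddply()), again assuming the built-in mtcars dataset as a stand-in for the Norris data; reg_df and p_abline are illustrative names.

```r
library(ggplot2)

# Fit one regression per group (here: per number of cylinders) with base R.
fits  <- lapply(split(mtcars, mtcars$cyl), function(d) coef(lm(mpg ~ wt, data = d)))
coefs <- do.call(rbind, fits)  # one row per group; columns: intercept, slope

# The grouping column must carry the same name and type as the faceting
# variable so that each line lands in the matching panel.
reg_df <- data.frame(cyl       = as.numeric(rownames(coefs)),
                     intercept = unname(coefs[, 1]),
                     slope     = unname(coefs[, 2]))

p_abline <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point(shape = 20, size = 3) +
  facet_wrap(~cyl, scales = "free") +
  geom_abline(aes(intercept = intercept, slope = slope),
              color = "red", data = reg_df)
```

Because the lines come from a separate data frame, the coefficients can be inspected or adjusted before plotting, which is the flexibility this approach buys.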
A slightly different example of how to create trellis plots comes from the time series version of the Pippa Norris data. Suppose we want to look at the relationship between economic development and democracy over time within each country in Central and Eastern Europe. One way to do so (as an exploratory tool at least) would be as follows:
CEE.lattice <- ggplot(norris.ts.CEE.data[norris.ts.CEE.data$Year >= 1980, ],
                      aes(logGDP_WB, Polity1, color = Year)) +
  geom_path(color = "black") +
  geom_point(shape = 20, size = 3) +
  facet_wrap(~Nation, ncol = 5, scales = "free") +
  scale_colour_gradient(breaks = c(1980, 1990, 1995, 2000), low = "yellow", high = "blue") +
  scale_x_continuous("Logarithm of GDP (PPP annual, WB)") +
  scale_y_continuous("Polity Score (0=Full Authoritarianism, 20=Full Democracy)") +
  ggtitle("Relationship between Econ. Development and Democracy in Central and Eastern Europe")
The result is:
Note that scale_colour_gradient(breaks=c(1980,1990,1995,2000), low="yellow", high="blue") colors the points with a gradient from yellow to blue based on the year. I manually set the breaks in the gradient. Also, note that I use geom_path() to connect the points in the order of the observations. This is different from the geom_line() function, which would connect the points in order of the x-variable, which is GDP in this case (and not time).
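The difference between the two geoms is easiest to see on a tiny made-up trajectory in which x does not increase monotonically, as GDP might not during a recession. The data frame toy and the plot names below are invented for illustration.

```r
library(ggplot2)

# Hypothetical five-year trajectory, stored in time order; x dips twice.
toy <- data.frame(year = 1990:1994,
                  x    = c(1, 3, 2, 4, 2.5),
                  y    = c(1, 2, 3, 4, 5))

# geom_path() joins the points in row order (here: chronological order).
p_path <- ggplot(toy, aes(x, y)) + geom_path() + geom_point()

# geom_line() joins the same points after sorting them by x.
p_line <- ggplot(toy, aes(x, y)) + geom_line() + geom_point()

# The built layer data shows the order in which each geom connects the points.
path_x <- layer_data(p_path, 1)$x
line_x <- layer_data(p_line, 1)$x
```

Comparing path_x with line_x shows the two connection orders, and printing both plots side by side makes the backtracking in the path version visible.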
Finally, and this holds across all of ggplot2, if we wanted to make the background layer of the plots white instead of gray, we could simply add the following theme to the plot as a layer:
CEE.lattice + theme_bw()
The result is: