Assignment 4—Solution

isles <- read.table('islands.txt', header=TRUE, sep=',')

Fitting the models

#Gleason Model

gleason <- lm(sp.richness~log(island.area), data=isles)
coef(gleason)
(Intercept) log(island.area)
  -171.0272         109.4431

The estimated model is μ = –171.03 + 109.44 × log(area), where μ is the mean of species richness.

#log-Arrhenius Model

log.arrhen <- lm(log(sp.richness)~log(island.area), data=isles)
coef(log.arrhen)
(Intercept) log(island.area)
  3.8389781       0.3283347

The estimated model is μ = 3.84 + 0.33 × log(area), where μ is the mean of log(species richness). Alternatively we can interpret exp(μ) as either the median or the geometric mean of species richness (see lecture 7).

#Arrhenius model

I exponentiate the estimates obtained from the log-Arrhenius model for use as initial estimates to the nls function for the Arrhenius model.

arrhen <- nls(sp.richness~b0*island.area^b1, data=isles, start=list(b0=exp(coef(log.arrhen)[1]), b1=coef(log.arrhen)[2]))
arrhen
Nonlinear regression model
model: sp.richness ~ b0 * island.area^b1
data: isles
        b0        b1
83.4789648 0.2549076
residual sum-of-squares: 451062.6

The estimated model is μ = 83.48 × area0.26, where μ is the mean of species richness.

#Poisson model

poisson.glim <- glm(sp.richness~log(island.area), data=isles, family=poisson)
coef(poisson.glim)
(Intercept) log(island.area)
  4.1925024        0.2835591

The estimated model is log μ = 4.19 + 0.28 × log(area), where μ is the mean of species richness.

#Negative binomial model

library(MASS)
negbin.glim <- glm.nb(sp.richness~log(island.area), data=isles)
coef(negbin.glim)
(Intercept) log(island.area)
  3.9729777        0.3172363

The estimated model is log μ = 3.97 + 0.32 × log(area), where μ is the mean of species richness.

Summarizing we have the following results. μ denotes mean richness except in the log-Arrhenius model where it denotes mean log richness.

Model Fitted equation
Gleason μ = –171.03 + 109.44 × log(area)
log-Arrhenius μ(log richness) = 3.84 + 0.33 × log(area)
Arrhenius μ = 83.48 × area0.26
Poisson log μ = 4.19 + 0.28 × log(area)
negative binomial log μ = 3.97 + 0.32 × log(area)

Graphing the Models

The log-Arrhenius model, because it models the log of species richness, stands alone and I plot it by itself. The Gleason model and Arrhenius model are also graphically incompatible. But the Poisson model and negative binomial model are of the same form. They can be plotted together with the Gleason model, or together with the Arrhenius model, as I show below.

#Log-Arrhenius model

plot(log(isles$island.area), log(isles$sp.richness), xlab='log(Area)', ylab='log(Species)')
abline(log.arrhen, col=2, lty=2)
mtext('log-Arrhenius model', side=3, line=.5)

#Arrhenius model

arrhenius.func<-function(x) coef(arrhen)[1]*x^(coef(arrhen)[2])
plot(isles$island.area, isles$sp.richness, xlab='Area', ylab='Species')
curve(arrhenius.func, lty=2, col=2, add=T)
mtext('Arrhenius model', side=3, line=.5)

(a) fig1a (b) fig1b
Fig. 1  Mean log(richness) and mean richness predicted by the log-Arrhenius and Arrhenius models, respectively.

#Gleason, Poisson, and negative binomial models

plot(log(isles$island.area), isles$sp.richness, xlab='log(Area)', ylab='Species')
abline(gleason, col=2, lty=2, lwd=2)
generic.func <- function(model,x) coef(model)[1] + coef(model)[2]*x
curve(exp(generic.func(poisson.glim,x)), lty=3, col=3, lwd=2, add=T)
curve(exp(generic.func(negbin.glim,x)), lty=6, col=4, lwd=2, add=T)
legend('topleft', c('Gleason', 'Poisson', 'negative binomial'), col=c(2,3,4), lty=c(2,3,6), bty='n', lwd=c(3,3,3))

fig 2

Fig. 2   Mean richness as a function of log area as predicted by the Gleason, Poisson, and negative binomial models

Other arrangements of the graphs

As explained above the Arrhenius, negative binomial, and Poisson models can be grouped together.

#Fig. 3
arrhenius.func<-function(x) coef(arrhen)[1]*x^(coef(arrhen)[2])
plot(isles$island.area, isles$sp.richness, xlab='Area', ylab='Species')
curve(arrhenius.func, lty=2, col=2, add=T)
alt.func <- function(model,x) exp(coef(model)[1])*x^coef(model)[2]
curve(alt.func(poisson.glim,x), lty=3, col=3, lwd=2, add=T)
curve(alt.func(negbin.glim,x), lty=6, col=4, lwd=2, add=T)
legend('bottomright', c('Arrhenius', 'Poisson', 'negative binomial'), col=c(2,3,4), lty=c(2,3,6), bty='n', lwd=c(2,2,2), cex=.9)

fig 3
Fig. 3  Mean richness as a function of area as predicted by the Arrhenius, Poisson, and negative binomial models

The log-Arrhenius model can be added to the negative binomial, Poisson, and Gleason models either as the median or the mean, which are not the same here. Exponentiating the lognormal model does not return the mean, but instead returns the median on the raw scale. The lognormal distribution is typically positively skewed so that the median and mean are very different. The mean of the lognormal distribution can be expressed in terms of μ and σ2, the mean and variance of the log-transformed variables.

lognormal mean

#Fig. 4
plot(log(isles$island.area), isles$sp.richness, xlab='log(Area)', ylab='Species')
abline(gleason, col=2, lty=2, lwd=2)
generic.func <- function(model,x) coef(model)[1] + coef(model)[2]*x
curve(exp(generic.func(poisson.glim,x)), lty=3, col=3, lwd=2, add=T)
curve(generic.func2(log.arrhen,x), lty=2, lwd=2, col='grey70', add=T)
curve(exp(generic.func(negbin.glim,x)), lty=6, col=4, lwd=2, add=T)
curve(exp(generic.func(log.arrhen,x)), lty=1, col='grey70', lwd=2, add=T)
legend('topleft', c('Gleason', 'Poisson', 'negative binomial', 'lognormal (median)', 'lognormal (mean)'), col=c(2,3,4,'grey70','grey70'), lty=c(2,3,6,1,2), bty='n', lwd=c(2,2,2), cex=.9)

fig 4
Fig. 4  Modeled richness as a function of log(area) as predicted by the Gleason, Poisson, negative binomial, and lognormal models

Comparing the Models

The log-likelihood for the log-Arrhenius model needs to be calculated separately since the response is log-transformed in this model. We have two versions of this function from lecture 11 which here yield barely different results. There is no additive constant in our model so I set k = 0 when using the functions.

#log-likelihood for a log transformed response and discrete data
norm.loglike4 <- function(model, k, y) {
#MLE of sigma^2
sigma2 <- (sum(residuals(model)^2))/length(y)
# probability at y = 0 is calculated differently
prob <- ifelse(y==0, pnorm(log(y+k+.5), mean=predict(model), sd=sqrt(sigma2)), pnorm(log(y+k+.5), mean=predict(model), sd=sqrt(sigma2)) - pnorm(log(y+k-.5), mean=predict(model), sd=sqrt(sigma2)))
#calculate log-likelihood
loglike <- sum(log(prob))
loglike
}

#log-likelihood for a log transformed response using density
norm.loglike5 <- function(model, k, y) {
#MLE of sigma^2
sigma2 <- (sum(residuals(model)^2))/length(y)
prob <- dnorm(log(y+k), mean=predict(model), sd=sqrt(sigma2)) * 1/(y+k)
#calculate log-likelihood
loglike <- sum(log(prob))
loglike
}
norm.loglike5(log.arrhen, 0, isles$sp.richness)
[1] -133.1472
norm.loglike4(log.arrhen, 0, isles$sp.richness)
[1] -133.147

I create a list of model names, string together the log-likelihoods, list the number of estimated parameters in each model, and calculate the AIC values. Three parameters are estimated in each model (two regression parameters and an auxiliary parameter—the variance for normal models and the dispersion for the negative binomial), except the Poisson model in which only two parameters were estimated.

model.names <- c('Gleason', 'Arrhenius', 'Log-Arrhenius', 'Poisson', 'Neg Binom')
loglike <- c(logLik(gleason), logLik(arrhen), norm.loglike4(log.arrhen, 0, isles$sp.richness), logLik(poisson.glim), logLik(negbin.glim))
numparms <- c(3,3,3,2,3)
AIC.vals <- -2*loglike + 2*numparms
out.models <- data.frame(model=model.names, LL=loglike, k=numparms, AIC=AIC.vals)
out.models
          model        LL k       AIC
1       Gleason -141.4406 3  288.8813
2     Arrhenius -140.4282 3  286.8563
3 Log-Arrhenius -133.1470 3  272.2941
4       Poisson -542.1834 2 1088.3668
5     Neg Binom -133.6894 3  273.3788

From the reported AIC values only the log-Arrhenius and negative binomial models are reasonable models for these data.

It's worth reflecting on the fact that in Fig. 2, the graphs of the worst model (the Poisson) and one of the best models (negative binomial) are virtually identical. In a least squares sense these two models would be found to be nearly indistinguishable, but of course we didn't use least squares to fit these models. We used maximum likelihood. In maximum likelihood we're not evaluating fit by how close the mean trajectory comes to the observations. Instead we're trying to maximize the probability of obtaining the observed data. Thus when we compare models using log-likelihood or AIC we're evaluating probability generating mechanisms. From the log-likelihood (or the AIC) reported in the above table we would conclude that it is very unlikely that a Poisson model generated these data. It is far more likely for a negative binomial model to have generated these data.


hw 4 scores

Course Home Page

Jack Weiss
Phone: (919) 962-5930
E-Mail: jack_weiss@unc.edu
Address: Curriculum for the Environment and Ecology, Box 3275, University of North Carolina, Chapel Hill, 27599
Copyright © 2012
Last Revised--February 16, 2012
URL: https://sakai.unc.edu/access/content/group/2842013b-58f5-4453-aa8d-3e01bacbfc3d/public/Ecol562_Spring2012/docs/solutions/assign4.htm