Assignment 5—Solution

Question 1

I save the .csv version of the file to my hard disk and read it into R. I then retain only those observations that have non-missing values for both Brain_Mass and Body_Mass.

mammals <- read.csv('ecol 563/brainbody.csv')

names(mammals)

[1] "Updated_Species_Name"            "Species_Name"
[3] "Species_Order"                   "Brain_Volume"
[5] "Brain_Volume_Standard_Deviation" "Brain_Mass"
[7] "Brain_Mass_Standard_Deviation"   "Body_Mass"
[9] "Body_Mass_Standard_Deviation"    "Sex"
[11] "No_Individuals"                  "Age_Class"
[13] "Source"                          "Notes"

dim(mammals)

[1] 1998 14

mammals2 <- mammals[!is.na(mammals$Brain_Mass) & !is.na(mammals$Body_Mass),]

dim(mammals2)

[1] 1818 14

Question 2

I tabulate the occurrences of the various orders in the data frame.

table(mammals2$Species_Order)

    Afrosoricida        Carnivora Cetartiodactyla Cetartiodactyla        Chiroptera
              17              173              164                1               59
Dasyuromorphia Didelphimorphia    Diprotodontia   Erinaceomorpha     Eulipotyphla
              51               20              124               19               44
      Hyracoidea       Lagomorpha    Macroscelidea   Microbiotheria Notoryctemorphia
               5               25                4                1                1
Paucituberculata Peramelemorphia   Perissodactyla         Primates      Proboscidea
               2               15               12              622               10
        Rodentia       Scandentia          Sirenia    Tubulidentata        Xenarthra
             409                5                7                1               27

There is a typo in Cetartiodactyla; one individual was recorded with a trailing space. We can fix that in a number of ways.

Method 1: Using ifelse

mammals2$Species_Order <- ifelse(mammals2$Species_Order == 'Cetartiodactyla ', 'Cetartiodactyla', as.character(mammals2$Species_Order))

Method 2: Using grep

grep('Cetartiodactyla ', mammals2$Species_Order)

[1] 1727

mammals2$Species_Order[1727] <- 'Cetartiodactyla'

Method 3: Using sub

mammals2$Species_Order <- sub('Cetartiodactyla ', 'Cetartiodactyla', mammals2$Species_Order)

Method 4: Using trim

library(gdata)

mammals2$Species_Order <- trim(mammals2$Species_Order)

table(mammals2$Species_Order)

    Afrosoricida        Carnivora Cetartiodactyla       Chiroptera   Dasyuromorphia
              17              173              165               59               51
Didelphimorphia    Diprotodontia   Erinaceomorpha     Eulipotyphla       Hyracoidea
              20              124               19               44                5
      Lagomorpha    Macroscelidea   Microbiotheria Notoryctemorphia Paucituberculata
              25                4                1                1                2
Peramelemorphia   Perissodactyla         Primates      Proboscidea         Rodentia
              15               12              622               10              409
      Scandentia          Sirenia    Tubulidentata        Xenarthra
               5                7                1               27

Next we identify the four most abundant orders, extract their names, and use them to subset the data frame.

first.four <- sort(table(mammals2$Species_Order), decreasing=T)[1:4]

first.four

Primates Rodentia Carnivora Cetartiodactyla
622 409 173 165

# use the intersection operator to subset the data

mammals3 <- mammals2[mammals2$Species_Order %in% names(first.four), ]

dim(mammals3)

[1] 1369 14

Question 3

To obtain summary statistics by species I use the tapply function. (See lecture 2.) The first argument to tapply is the variable we wish to summarize, the second argument is the grouping variable (Species_Name), and the last argument is the function we wish to apply separately by Species_Name. Following the hint I first remove the "ghost" species levels by redeclaring the Species_Name as a factor in the reduced data set.

# remove ghost species levels

mammals3$Species_Name <- factor(mammals3$Species_Name)

# calculate species means

mean.body <- tapply(mammals3$Body_Mass, mammals3$Species_Name, mean)

mean.brain <- tapply(mammals3$Brain_Mass, mammals3$Species_Name, mean)

To obtain the order for each species we can use tapply again this time using the function given in the hint to this problem. The tapply function selects vectors of Species_Order separately by Species_Name and the function function(x) as.character(x[1]) then takes the first element from this vector.

sp.order <- tapply(mammals3$Species_Order, mammals3$Species_Name, function(x) as.character(x[1]))

A vector of species names can be create the same way. Alternatively we can get them using the names function on any of the three vectors just created.

sp.name <- names(mean.body)

I assemble everything in a data frame.

brainbody <- data.frame(body=mean.body, brain=mean.brain, sp.order=sp.order, species=sp.name)

brainbody[1:12,]

                           body   brain        sp.order                species
Acinonyx jubatus        22200.0   2.449       Carnivora       Acinonyx jubatus
Acomys cahirinus           44.0   0.760        Rodentia       Acomys cahirinus
Acomys wilsonii            18.5   0.580        Rodentia        Acomys wilsonii
Aepyceros melanus       47735.0 162.000 Cetartiodactyla      Aepyceros melanus
Aeromys tephromelas      1189.0 10.340        Rodentia    Aeromys tephromelas
Aethomys chrysophilus     117.0   1.250        Rodentia Aethomys chrysophilus
Aethomys hindei           146.3   1.420        Rodentia        Aethomys hindei
Aethomys namaquensis       79.4   0.890        Rodentia   Aethomys namaquensis
Ailuropoda melanoleuca 37500.0 226.000       Carnivora Ailuropoda melanoleuca
Ailurus fulgens          5590.0 46.800       Carnivora        Ailurus fulgens
Alactaga saliens          102.0   2.100        Rodentia       Alactaga saliens
Alcelaphus buselaphus 134090.0 275.000 Cetartiodactyla Alcelaphus buselaphus

Notice that we didn't need to create the species column because the species names appear as the row names of the data frame. To eliminate the row names I just set them to NULL.

rownames(brainbody) <- NULL

brainbody[1:12,]

       body   brain        sp.order                species
1   22200.0   2.449       Carnivora       Acinonyx jubatus
2      44.0   0.760        Rodentia       Acomys cahirinus
3      18.5   0.580        Rodentia        Acomys wilsonii
4   47735.0 162.000 Cetartiodactyla      Aepyceros melanus
5    1189.0 10.340        Rodentia    Aeromys tephromelas
6     117.0   1.250        Rodentia Aethomys chrysophilus
7     146.3   1.420        Rodentia        Aethomys hindei
8      79.4   0.890        Rodentia   Aethomys namaquensis
9   37500.0 226.000       Carnivora Ailuropoda melanoleuca
10   5590.0 46.800       Carnivora        Ailurus fulgens
11    102.0   2.100        Rodentia       Alactaga saliens
12 134090.0 275.000 Cetartiodactyla Alcelaphus buselaphus

Question 4

As the background to this problem explained, we can fit the allometric equation by carrying out a regression of log(brain mass) on log(body mass). The slope of the corresponding line is the allometric exponent. The way to determine if the same regression line works for each of four different orders is by fitting a single model in which log(body mass) interacts with species order. A test of the interaction term in this model is a test of whether the slopes are the same for the four orders.

brainbody$sp.order <- factor(brainbody$sp.order)

out1 <- lm(log(brain) ~ log(body)*sp.order, data=brainbody)

anova(out1)

Analysis of Variance Table

Response: log(brain)
                    Df Sum Sq Mean Sq    F value    Pr(>F)
log(body)            1 3214.5 3214.5 13099.2084 < 2.2e-16 ***
sp.order             3 158.6    52.9   215.4475 < 2.2e-16 ***
log(body):sp.order   3    5.7     1.9     7.7145 4.51e-05 ***
Residuals          670 164.4     0.2
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Because the interaction term log(body mass) × order is significant (p < .001) we conclude that the allometric exponents differ among the four mammalian orders being examined.

Question 5

With four groups there are six possible pairwise comparisons of the slopes we can carry out. If we examine the summary table of the model we obtain three of those comparisons for free.

round(summary(out1)$coefficients,3)

                                  Estimate Std. Error t value Pr(>|t|)
(Intercept)                         -1.348      0.232 -5.803    0.000
log(body)                            0.603      0.023 26.315    0.000
sp.orderCetartiodactyla              0.172      0.460   0.374    0.709
sp.orderPrimates                    -0.652      0.311 -2.097    0.036
sp.orderRodentia                    -1.198      0.249 -4.815    0.000
log(body):sp.orderCetartiodactyla    0.013      0.042   0.312    0.755
log(body):sp.orderPrimates           0.162      0.035   4.593    0.000
log(body):sp.orderRodentia           0.058      0.029   1.990    0.047

The above table compares Carnivora against the other three orders. To obtain the other three comparisons we can choose a different reference group for order. I run the regression two more times, the first time making Cetartiodactyla the reference group and the second time Primates.

# make Cetartiodactyla the reference group

brainbody$order2 <- factor(brainbody$sp.order, levels=levels(brainbody$sp.order)[c(2:4,1)])

out1a <- lm(log(brain) ~ log(body)*order2, data=brainbody)

round(summary(out1a)$coefficients,3)

                          Estimate Std. Error t value Pr(>|t|)
(Intercept)                 -1.176      0.397 -2.966    0.003
log(body)                    0.616      0.035 17.784    0.000
order2Primates              -0.824      0.447 -1.842    0.066
order2Rodentia              -1.369      0.407 -3.369    0.001
order2Carnivora             -0.172      0.460 -0.374    0.709
log(body):order2Primates     0.149      0.044   3.400    0.001
log(body):order2Rodentia     0.045      0.039   1.146    0.252
log(body):order2Carnivora   -0.013      0.042 -0.312    0.755

# make Primates the reference group

brainbody$order2 <- factor(brainbody$sp.order, levels=levels(brainbody$sp.order)[c(3:4,1:2)])

out1a <- lm(log(brain) ~ log(body)*order2, data=brainbody)

round(summary(out1a)$coefficients,3)

                                Estimate Std. Error t value Pr(>|t|)
(Intercept)                       -2.000      0.207 -9.668    0.000
log(body)                          0.765      0.027 28.603    0.000
order2Rodentia                    -0.545      0.225 -2.421    0.016
order2Carnivora                    0.652      0.311   2.097    0.036
order2Cetartiodactyla              0.824      0.447   1.842    0.066
log(body):order2Rodentia          -0.104      0.032 -3.257    0.001
log(body):order2Carnivora         -0.162      0.035 -4.593    0.000
log(body):order2Cetartiodactyla   -0.149      0.044 -3.400    0.001

Looking at the output we see that the allometric exponent of primates is different from each of the other three orders (p < .001). Among the other comparisons we find rodents and carnivores are significantly different (p = .047), but Cetartiodactyla is not significantly different from rodents or carnivores.

Graphical summary—dynamite plot

I summarize the results with a dynamite plot using the code from lecture 5. I begin by calculating the allometric exponent for each order and its standard error.

# contrast matrix to obtain slopes

cmat <- rbind(c(0,1,0,0,0,0,0,0), c(0,1,0,0,0,1,0,0), c(0,1,0,0,0,0,1,0), c(0,1,0,0,0,0,0,1))

# allometric exponents

my.means.unsort <- as.vector(cmat %*% coef(out1))

names(my.means.unsort) <- levels(brainbody$sp.order)

# standard errors

se.mean <- sqrt(diag(cmat%*%vcov(out1)%*%t(cmat)))

# put things in increasing mean order

my.means <- sort(my.means.unsort)

se.mean.sort <- se.mean[order(my.means.unsort)]

my.means

Carnivora Cetartiodactyla Rodentia Primates
0.6034166 0.6163946 0.6609356 0.7652427

# results of pairwise comparisons

diffs <- c('a','ab','b','c')

# rescale things for the plot

dat1 <- my.means-.5

# create bar plot and save coordinates of bars

bp <- barplot(dat1, axes=F, ylim=c(0,.35), names.arg=names(my.means), cex.names=0.8, ylab='Allometric exponent', xlab='Order',col='lightblue')

# add correct labels on y-axis (because of rescaling)

axis(2, at=seq(0,.3,.1), labels=seq(0,.3,.1)+.5)

# add handles to dynamite plot

arrows(x0=bp, y0=dat1, x1=bp, y1=dat1+se.mean.sort, angle=90, length=0.1)

# add labels to the handles

text(bp, dat1+se.mean.sort, diffs, pos=3, cex=.9)

Fig. 1 Dynamite graph showing the pairwise comparisons of the allometric exponents by order. (No correction for multiple testing was made.)

Graphical summary—confidence intervals

A more elegant way of summarizing things is is by plotting 95% confidence intervals for the allometric exponents by mammalian order along with suitable difference-adjusted confidence intervals. The easiest way to obtain the individual slopes is to fit a reparameterized version of the model in which the individual intercepts and slopes are returned for the four mammalian orders. The details of this were explained in lecture 8. The reparameterization needed for the current model is the following.

out2 <- lm(log(brain) ~ log(body):sp.order + sp.order-1, data=brainbody)

round(summary(out2)$coefficients,3)

                                  Estimate Std. Error t value Pr(>|t|)
sp.orderCarnivora                   -1.348      0.232 -5.803    0.000
sp.orderCetartiodactyla             -1.176      0.397 -2.966    0.003
sp.orderPrimates                    -2.000      0.207 -9.668    0.000
sp.orderRodentia                    -2.546      0.089 -28.586    0.000
log(body):sp.orderCarnivora          0.603      0.023 26.315    0.000
log(body):sp.orderCetartiodactyla    0.616      0.035 17.784    0.000
log(body):sp.orderPrimates           0.765      0.027 28.603    0.000
log(body):sp.orderRodentia           0.661      0.018 37.561    0.000

The slopes appear in rows 5 through 8 of the output. To obtain the 95% confidence intervals we can use the confint function on the model.

confint(out2)

                                       2.5 %     97.5 %
sp.orderCarnivora                 -1.8041545 -0.8919669
sp.orderCetartiodactyla           -1.9551127 -0.3974975
sp.orderPrimates                  -2.4066510 -1.5941133
sp.orderRodentia                  -2.7206716 -2.3709385
log(body):sp.orderCarnivora        0.5583916 0.6484416
log(body):sp.orderCetartiodactyla 0.5483393 0.6844499
log(body):sp.orderPrimates         0.7127109 0.8177745
        0.6263849 0.6954863

To obtain the difference-adjusted confidence levels we need the ci.func function from lecture 5.

# function to calculate difference-adjusted confidence intervals

ci.func <- function(rowvals, lm.model, glm.vmat) {

nor.func1a <- function(alpha, model, sig) 1-pt(-qt(1-alpha/2, model$df.residual) * sum(sqrt(diag(sig))) / sqrt(c(1,-1) %*% sig %*%c (1,-1)), model$df.residual) - pt(qt(1-alpha/2, model$df.residual) * sum(sqrt(diag(sig))) / sqrt(c(1,-1) %*% sig %*% c(1,-1)), model$df.residual, lower.tail=F)

nor.func2 <- function(a,model,sigma) nor.func1a(a,model,sigma)-.95

n <- length(rowvals)

xvec1b <- numeric(n*(n-1)/2)

vmat <- glm.vmat[rowvals,rowvals]

ind <- 1

for(i in 1:(n-1)) {

for(j in (i+1):n){

sig <- vmat[c(i,j), c(i,j)]

#solve for alpha

xvec1b[ind] <- uniroot(function(x) nor.func2(x, lm.model, sig), c(.001,.999))$root

ind <- ind+1

}}

1-xvec1b

}

The variances and covariances of the slopes occupy rows 5 through 8 and columns 5 through 8 of the variance-covariance matrix of the model.

vals1 <- ci.func(1:4, out2, vcov(out2)[5:8,5:8])

vals1

[1] 0.8429977 0.8357554 0.8381188 0.8379638 0.8554106 0.8432272

Although the confidence levels don't vary much the little bit of variability that is shown does make a difference because the p-value for testing that the slopes of carnivores and rodents are equal is very close to 0.05. The confidence level for this comparison is the third element of vals1. I begin with the highest reported confidence level and obtain the corresponding confidence intervals using the confint function.

confint(out2, level=vals1[5])[5:8,]

                                     7.23 %   92.77 %
log(body):sp.orderCarnivora       0.5699234 0.6369098
log(body):sp.orderCetartiodactyla 0.5657697 0.6670196
log(body):sp.orderPrimates        0.7261653 0.8043201
log(body):sp.orderRodentia        0.6352340 0.6866371

With this choice the interval for Primates doesn't overlap any of the others. But notice that the intervals for carnivores and rodents overlap. We know from the summary table of the original model that these two orders are just barely significantly different. Of course we're using the wrong difference-adjusted level for this comparison. I redo the calculations using the correct confidence level for this comparison.

confint(out2, level=vals1[3])[5:8,]

                                     8.09 %   91.91 %
log(body):sp.orderCarnivora       0.5713061 0.6355271
log(body):sp.orderCetartiodactyla 0.5678596 0.6649296
log(body):sp.orderPrimates        0.7277786 0.8027068
log(body):sp.orderRodentia        0.6362950 0.6855761

Now the two intervals don't overlap. Because the conclusions for the rest of the comparisons remain the same, I use this level to calculate the difference-adjusted confidence intervals. I assemble everything in a data frame.

out838 <- confint(out2,level=vals1[3])[5:8,]

out95 <- confint(out2,level=.95)[5:8,]

newdata <- data.frame(var.labels=levels(brainbody$sp.order), ests=coef(out2)[5:8], out95, out838)

names(newdata)[3:6] <- c('low95', 'up95', 'low83', 'up83')

rownames(newdata)<-NULL

newdata

       var.labels      ests     low95      up95     low83      up83
1       Carnivora 0.6034166 0.5583916 0.6484416 0.5713061 0.6355271
2 Cetartiodactyla 0.6163946 0.5483393 0.6844499 0.5678596 0.6649296
3        Primates 0.7652427 0.7127109 0.8177745 0.7277786 0.8027068
4        Rodentia 0.6609356 0.6263849 0.6954863 0.6362950 0.6855761

I modify the code from lecture 4 (for plotting effect estimates from regression models) for use here.

prepanel.ci <- function(x, y, lx, ux, subscripts, ...){

list(xlim=range(x, ux[subscripts], lx[subscripts], finite=T))}

dotplot(var.labels~ests, data=newdata, lx=newdata$low95, ux=newdata$up95, xlab='Allometric exponent', prepanel=prepanel.ci, panel=function(x,y){

panel.dotplot(x, y, col='white', cex=.01)

panel.arrows(newdata$low95, as.numeric(y), newdata$up95, as.numeric(y), col="dodgerblue4", angle=90,length=.05, code=3)

panel.segments(newdata$low83, as.numeric(y), newdata$up83, as.numeric(y), col="dodgerblue1", lwd=5, lineend=1)

panel.xyplot(x, y, pch=16, cex=1, col='white')

panel.xyplot(x, y, pch=1, cex=1.1, col='dodgerblue')

panel.abline(v=seq(0.54,0.84,.02), col='grey70', lty=3)

})

Fig. 2 95% confidence intervals and difference-adjusted (83.8%) confidence intervals for the allometric exponents of four mammalian orders. The difference-adjusted intervals can be used to make pairwise comparisons among the orders.

Question 6

I use tapply and the range function to obtain the ranges of the log body mass for each of the four mammalian orders.

out.r <- tapply(log(brainbody$body), brainbody$sp.order, range)

out.r

$Carnivora
[1] 4.838924 14.511902

$Cetartiodactyla
[1] 7.599401 17.876970

$Primates
[1] 3.988984 11.656124

$Rodentia
[1] 1.856298 9.806426

The output is a list. To extract the minimum value for Primates, e.g., I can write: out.r[[3]][1] because Primates are the third element in the list and the minimum is the first element of the Primate vector. To generate the plot of the regression lines restricting the ranges to the data for the separate orders I use the curve function. The coefficient vector of the model is the following.

coef(out1)

                   (Intercept)                         log(body)
                   -1.34806072                        0.60341658
       sp.orderCetartiodactyla                  sp.orderPrimates
                   0.17175561                       -0.65232139
              sp.orderRodentia log(body):sp.orderCetartiodactyla
                   -1.19774436                        0.01297802
    log(body):sp.orderPrimates        log(body):sp.orderRodentia
                    0.16182611                        0.05751898

The equation of the regression line for carnivores is

coef(out1)[1] + coef(out1)[2] * x.

For the three remaining orders the equation of the regression line takes the following form.

coef(out1)[1]+coef(out1)[y]+(coef(out1)[2]+coef(out1)[y+3])*x

where y is either 3, 4, or 5. Based on this I can write a function that draws the curve for Cetartiodactyla, Primates, and Rodentia.

mycurve <- function(y) curve(coef(out1)[1] + coef(out1)[y] + (coef(out1)[2] + coef(out1)[y+3])*x, from=out.r[[y-1]][1], to=out.r[[y-1]][2], col=mycol[y-1], lty=2, add=T)

plot(log(brain)~log(body), data=brainbody, xlab=expression(log('body mass')), ylab=expression(log('brain mass')), col=c('grey50', 'pink2', 'lightgreen', 'dodgerblue1')[as.numeric(sp.order)], cex=.5)

# colors for lines

mycol <- c(1,2,'seagreen',4)

# draw Carnivora line

curve(coef(out1)[1] + coef(out1)[2]*x, from=out.r[[1]][1], to=out.r[[1]][2], col=mycol[1], lty=2, add=T)

# draw remaining lines: assign output of sapply to an object to suppress printing

junk <- sapply(3:5, mycurve)

legend('topleft', levels(brainbody$order), pch=1, col=c('grey50', 'pink2', 'lightgreen', 'dodgerblue1'), lty=2, bty='n')

identify(log(brainbody$body), log(brainbody$brain), brainbody$species, cex=.8)

Fig. 3 Plot of the allometric relationship between brain mass and body mass on a log-log scale.

Question 7

I used the identify function to add the names of the outlier species to the graph of Question 6. They are Acinonyx jubatus and Capricornis crispus.

hw5 scores

Course Home Page

Jack Weiss
Phone: (919) 962-5930
E-Mail: jack_weiss@unc.edu
Address: Curriculum for Ecology and the Environment, Box 3275, University of North Carolina, Chapel Hill, 27599
Copyright © 2012
Last Revised--October 6, 2012
URL: https://sakai.unc.edu/access/content/group/3d1eb92e-7848-4f55-90c3-7c72a54e7e43/public/docs/solutions/assign5.htm