Assignment 1— Solution

Question 1

I read in the data, convert Density to a factor, and fit a two-factor analysis of variance model with Eggs as the response and Season, Density, and their interaction as predictors.

quinn <- read.csv( 'ecol 563/quinn1.csv')
quinn$Density <- factor(quinn$Density)
out1 <- lm(Eggs ~ Density*Season, data=quinn)
anova(out1)
Analysis of Variance Table

Response: Eggs
               Df  Sum Sq Mean Sq F value    Pr(>F)   
Density         2  4.0019  2.0010  13.984 0.0007325 ***
Season          1 17.1483 17.1483 119.845 1.336e-07 ***
Density:Season  2  1.6907  0.8454   5.908
0.0163632
Residuals      12  1.7170  0.1431                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Based on the output we see that there is a significant interaction between Density and Season. This means that the effect of Density on egg count varies with Season (and vice versa). We can examine the summary table to try to understand the nature of this interaction.

summary(out1)
Call:
lm(formula = Eggs ~ Density * Season, data = quinn)

Residuals:
     Min       1Q   Median       3Q      Max
-0.61133 -0.16167 -0.04217  0.09892  0.55567

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)   
(Intercept)             1.1113333  0.2183934   5.089 0.000267 ***
Density12              -0.0003333  0.3088549  -0.001 0.999157   
Density24              -0.4166667  0.3088549  -1.349 0.202221   
Seasonsummer            2.7753333  0.3088549   8.986 1.12e-06 ***
Density12:Seasonsummer -0.9996667  0.4367868  -2.289 0.041029 * 
Density24:Seasonsummer -1.4700000  0.4367868  -3.365 0.005617 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3783 on 12 degrees of freedom
Multiple R-squared: 0.9301,            Adjusted R-squared: 0.9009
F-statistic: 31.93 on 5 and 12 DF,  p-value: 1.561e-06

From the output it appears that there is no Density effect in spring (the Density12 and Density24 coefficients are not significantly different from zero) but there is a Density effect in summer (the two interaction terms are significant). We'll explore this issue further in Question 3.

Some people examined the issue of variance homogeneity. For that we need to create a variable that identifies the individual treatments. This can be accomplished with the paste function that concatenates the character values of two variables. I then plot the Egg counts against the new treatment variable in a dot plot.

quinn$trt <- factor(paste(quinn$Season, quinn$Density, sep='.'))
plot(as.numeric(trt)~Eggs, data=quinn, ylim=c(.5,6.5), type='n', axes=F, ylab='')
axis(1)
axis(2, at=1:6, rep(c(6,12,24),2))
box()
mtext('Spring', at=2, side=2, line=2.5, cex=1.1)
mtext('Summer', at=5, side=2, line=2.5, cex=1.1)
abline(h=1:6, lty=3, col='grey70')
points(quinn$Eggs, jitter(as.numeric(quinn$trt), a=.15), pch=16, col='dodgerblue1')

fig. 1

Fig. 1  Dot plot of the response variable separately by treatment

While there are some differences in variability across treatment we only have three observations per treatment, not enough to obtain a reliable estimate of variability. We can test for variance heterogeneity but we won't have much power. Given how unreliable our variance estimates are I just report the results from the robust tests, Fligner's and Levene's tests, below. Neither test statistic is close to being statistically significant.

leveneTest(quinn$Eggs~quinn$trt)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  5  0.8504 0.5403
      12              
fligner.test(quinn$Eggs~quinn$trt)
            Fligner-Killeen test of homogeneity of variances

data:  quinn$Eggs by quinn$trt
Fligner-Killeen:med chi-squared = 4.1871, df = 5, p-value = 0.5228

Question 2

To display the two-factor interaction we have a choice. We can plot Density on the x-axis and show separate profiles for each Season, or we can plot Season on the x-axis and shows separate profiles for each Density. Generally it's useful to do both, but because Density is an ordinal variable it makes more sense to plot Density on the x-axis so we take advantage of the natural order of the three Density categories.

library(effects)
plot(effect('Density:Season', out1), multiline=T)

fig. 2

Fig. 2  Plot of the significant two-factor interaction between Season and Density

The interaction apparently derives from the fact that there is a much stronger negative effect of Density on egg count in the summer than there is in the spring.

Question 3

Because Density has three levels it was entered into the analysis of variance model as two dummy variables. Season with two levels was entered as a single variable. The default definitions used by R are the following.

x1x2z

The regression model corresponding to the two-factor interaction model is the following.

regression eqn

The coefficient summary table for the model generated by R is shown below.

round(summary(out1)$coefficients,4)
                       Estimate Std. Error t value Pr(>|t|)
(Intercept)              1.1113     0.2184  5.0887   0.0003
Density12               -0.0003     0.3089 -0.0011   0.9992
Density24               -0.4167     0.3089 -1.3491   0.2022
Seasonsummer             2.7753     0.3089  8.9859   0.0000
Density12:Seasonsummer  -0.9997     0.4368 -2.2887   0.0410
Density24:Seasonsummer  -1.4700     0.4368 -3.3655   0.0056

Using the regression equation and the estimates reported in the summary table we can draw the following conclusions.

  1. When the Density = 6 and Season = 'Spring', x1 = x2 = z = 0. So, our regression model is μ = β0. Thus the intercept, 1.11, corresponds to the mean egg count in Spring when the Density is 6.
  2. When Season = 'Spring', z = 0, and the regression equation is the following: Spring model. This yields the following estimates of the mean egg count at the three different densities
Table 1   Regression model when Season = 'Spring'
Season Density x1 x2 Spring model
Spring 6 0 0 muA
Spring 12 1 0 muB
Spring 24 0 1 regdummy3

From the summary table β1 = –0.0003 and β2 = –0.4167. Both of these effects are negative so mean egg count decreases as Density increases. On the other hand the summary table reports that neither of these effects are statistically significant. Thus the mean egg count in Spring when density is 12 is not significantly different from what it was when Density is 6. Similarly the mean egg count when density is 24 is not significantly different from what it was when Density = 6.

  1. When Density = 6 and Season = 'Spring', x1 = x2 = 0 and z = 1. Thus the regression model is density 6 model. We have the following estimates of the mean egg count in Spring and Summer when Density = 6.
Table 2   Regression model when Density = 6
Season Density z density 6 model
Spring 6 0 muA
Summer 6 1 summer

The parameter β3 represents the difference in mean egg counts, Summer – Spring. From the summary table we see that β3 = 2.7753 and it is significantly different from zero. Thus we conclude that the mean egg count is significantly higher in Summer than in Spring when Density = 6.

  1. The interaction terms are a bit more involved to explain. Table 3 compares the coefficients from an additive model (a model without the interaction terms) with those from the interaction model when Season = 'Summer'.
Table 3   Regression model when Season = 'Summer'
Season Density x1 x2 z Additive model Interaction model
Summer 6 0 0 1 summer summer
Summer 12 1 0 1
Summer 24 0 1 1

Fig. 3 graphically contrasts the predictions of the additive model (one with only the main effects of Season and Density) to the interaction model. Because the reported estimates of the parameters β4 and β5 are negative, the interaction model summer profile lies below the additive model summer profile.

fig. 3

Fig. 3  Contrasting the additive and interaction models for Season and Density

In the additive model the mean profiles for Spring and Summer are parallel. The regression coefficients β4 and β5 in the interaction model are the deviations from additivity (and hence parallelism). The fact that these effects are estimated to be negative (–1.00 and –1.47) and statistically significant indicates that the Season effect is significantly reduced when Density is 12 or 24 than what it is when Density is 6. When density = 6 there is a very large season effect of β3 = 2.775. When density = 12 the season effect is not as large (2.7753 – 0.9997 = 1.776). When density = 24 the season effect is even smaller (2.7753 – 1.47 = 1.31). These effects of density on the season effect are statistically significant.

So we have two interpretations of the interaction. One interpretation is that Density has a marked negative effect on mean egg count in the Summer, but not in Spring. The other interpretation is that the increase in egg count from spring to summer that we see when density is 6 gets significantly reduced when density = 12 and even more so when density = 24.

Question 4

I obtain 1000 simulations from the two-factor interaction model.

set.seed(20)
junk <- simulate(out1, n=1000)

Following the hint I take the minimum of the simulations.

min(junk)
[1] -0.4771207

The simulations returned a negative value. Given that the response variable is egg count, a positive whole number, a negative value is an unreasonable prediction. The simulations produced by the simulate function are organized as columns. I count up the number of simulations that produced one or more negative values. I use the apply function to sum up the number of negative values in each column. I then tabulate the results.

table(apply(junk, 2, function(x) sum(x<0)))
  0   1   2
890 107   3

So 107 simulations yielded a single negative value (out of 18) while three yielded two negative values. Thus 11% of the simulations returned one or more silly values. We can count up the number of negative values and divide by the total number of simulated observations to obtain the overall percentage of silly values.

sum(junk<0)/(nrow(junk)*ncol(junk))
[1] 0.006277778

So, about 0.6% of the simulated values are negative, not a large number, but still indicating that perhaps the model is formulated incorrectly. We'll investigate this further later in the course after we have added more statistical tools to our arsenal.

R code used to produce Fig. 3

out1a <- lm(Eggs ~ Density:Season-1, data=quinn)
plot(c(.5,3.4),range(coef(out1a))+c(-.1,.1), type='n',axes=F, xlab='Density', ylab='Egg count')
axis(2)
axis(1, at=1:3, labels=c(6,12,24))
box()
points(1:3, coef(out1a)[1:3], pch=16, col='dodgerblue1')
lines(1:3, coef(out1a)[1:3], col='dodgerblue1')
lines(1:3, coef(out1a)[4:6], col='pink')
points(1:3, coef(out1a)[4:6], col='red', pch=16)
points(1:3, coef(out1a)[1:3]+coef(out1)[4], col=2)
lines(1:3, coef(out1a)[1:3]+coef(out1)[4], col=2, lty=2)
arrows(2, coef(out1a)[5]+.1, 2, coef(out1a)[2]+coef(out1)[4]-.1, code=3, angle=45, length=.08)
arrows(3, coef(out1a)[6]+.1, 3, coef(out1a)[3]+coef(out1)[4]-.1, code=3, angle=45, length=.08)
text(2, coef(out1a)[5]-coef(out1)[5]/2, pos=4, expression(beta[4]))
text(3, coef(out1a)[6]-coef(out1)[6]/2, pos=4, expression(beta[5]))
text(1, coef(out1)[1]+coef(out1)[4]/2, pos=2, expression(beta[3]))
arrows(1, coef(out1)[1]+.1, 1, coef(out1)[1]+coef(out1)[4]-.1, code=3, angle=45, length=.08)
text(1, coef(out1a)[4], 'Summer', pos=2, col='red')
text(1, coef(out1a)[1], 'Spring', pos=2, col='dodgerblue1')
text(3, coef(out1a)[3]+coef(out1)[4], 'additive\n model', pos=4, cex=.8, col='seagreen4')
text(3, coef(out1a)[6], 'interaction\n model', pos=4, cex=.8, col='seagreen4')

hw1 scores

Course Home Page

Jack Weiss
Phone: (919) 962-5930
E-Mail: jack_weiss@unc.edu
Address: Curriculum for Ecology and the Environment, Box 3275, University of North Carolina, Chapel Hill, 27599
Copyright © 2012
Last Revised--September 9, 2012
URL: https://sakai.unc.edu/access/content/group/3d1eb92e-7848-4f55-90c3-7c72a54e7e43/public/docs/solutions/assign1.htm