Assignment 5—Solution

insec <- read.table( 'http://www.bio.ic.ac.uk/research/mjcraw/statcomp/data/logistic.txt', header=T)
names(insec)
[1] "logdose" "n"       "dead"    "product"

Question 1

These are binomial data where we are given the number of successes (dead) out of a total (n). To fit this model in R the response variable needs to be a matrix in which the first column is the number of successes and the second column is the number of failures. Hence we use cbind(dead, n-dead) as the response. To address the question, "Does the dose-response relationship vary by product?" we need to fit a model in which logdose and product interact.

out1 <- glm(cbind(dead, n-dead)~logdose*product, data=insec, family=binomial)
anova(out1, test='Chisq')
Analysis of Deviance Table

Model: binomial, link: logit

Response: cbind(dead, n - dead)

Terms added sequentially (first to last)

                Df Deviance Resid. Df Resid. Dev  Pr(>Chi)   
NULL                               50    311.241             
logdose          1  238.854        49     72.387 < 2.2e-16 ***
product          2    0.206        47     72.181    0.9022   
logdose:product  2   29.876        45     42.305
3.255e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The analysis of deviance table reveals a statistically significant interaction between log dose and product. So we conclude that the dose-response relationship varies by product.

Question 2

To determine if the residual deviance is suitable here as a goodness of fit statistic we need to examine the expected counts. If no more than 20% of the expected counts are less than 5, the chi-squared distribution of the deviance is reasonable. I use the fitted function to obtain the estimated probabilities and use them to calculate the expected number of successes and failures.

pred.counts <- cbind(fitted(out1)*insec$n, insec$n - fitted(out1) * insec$n)
sum(pred.counts<5)/length(as.vector(pred.counts))
[1] 0.245098

So 24.5% of the counts are less than 5. The use of the residual deviance as as a goodness of fit statistic is suspect here.

Question 3

To determine if the model can be simplified I examine the summary table for the model.

summary(out1)
Call:
glm(formula = cbind(dead, n - dead) ~ logdose * product, family = binomial,
    data = insec)

Deviance Residuals:
     Min        1Q    Median        3Q       Max 
-2.39800  -0.60734   0.09695   0.65920   1.69300 

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)   
(Intercept)       -2.0410     0.3034  -6.728 1.72e-11 ***
logdose            2.8768     0.3432   8.383  < 2e-16 ***
productB           1.2920     0.3845   3.360  0.00078 ***
productC          -0.3079     0.4308  -0.715  0.47469   
logdose:productB  -1.6457     0.4220  -3.899 9.65e-05 ***
logdose:productC   0.4318     0.4898   0.882 
0.37793   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 311.241  on 50  degrees of freedom
Residual deviance:  42.305  on 45  degrees of freedom
AIC: 201.31

Number of Fisher Scoring iterations: 4

From the output we see that the dose response relationship for product C is not different than it is for product A , the reference group (p = .038). On the other hand the dose-response relationship of products A and B are clearly different (p < .001). This suggests we might collapse products A and C together and refit the model.

insec$newprod <- factor(ifelse(insec$product=='B', 'B', 'AC'))

There are two new models we can fit. We can force products A and C to have the same slopes (on a logit scale) but different intercepts. If this model holds, the dose effect is the same but one of the products consistently kills more bugs than the other. We can also force them to have both the same slopes and intercepts so that the products are not different at all.

#same slopes, different intercepts
out2 <- glm(cbind(dead,n-dead) ~ logdose+product+logdose:newprod, data=insec, family=binomial)
#same slopes, same intercepts
out3 <- glm(cbind(dead,n-dead) ~ logdose*newprod, data=insec, family=binomial)

Question 4

The three-product model fits the following regression model.

logit

To see if A and C should have the same slopes (same dose-response relationship) we carry out the test

H0: β5 = 0
H1: β5 ≠ 0

I compare out2 with out1 with a likelihood ratio test.

anova(out2, out1, test='Chisq')
Analysis of Deviance Table

Model 1: cbind(dead, n - dead) ~ logdose + product + logdose:newprod
Model 2: cbind(dead, n - dead) ~ logdose * product
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1        46     43.083                    
2        45     42.305  1  0.77817  
0.3777

The models are not significantly different so we can use the simpler model in which the slopes of A and C are the same. Said differently we fail to reject that β5 = 0. Hence we don't need to include this term in the model. Our model is now the following.

logit 2

I next test whether the intercepts are also the same (meaning that the two products are identical). Formally we carry out the following hypothesis test.

H0: β3 = 0
H1: β3 ≠ 0

anova(out3, out2, test='Chisq')
Analysis of Deviance Table

Model 1: cbind(dead, n - dead) ~ logdose * newprod
Model 2: cbind(dead, n - dead) ~ logdose + product + logdose:newprod
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1        47     43.111                    
2        46     43.083  1   0.0273  
0.8688

So we fail to reject the null hypothesis. Therefore the intercepts of A and C are not different. We conclude that products A and C are not different in their killing power.  

summary(out3)
Call:
glm(formula = cbind(dead, n - dead) ~ logdose * newprod, family = binomial,
    data = insec)

Deviance Residuals:
    Min       1Q   Median       3Q      Max 
-2.2507  -0.6634   0.1382   0.7080   1.6930 

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)   
(Intercept)       -2.1997     0.2152 -10.223  < 2e-16 ***
logdose            3.0988     0.2444  12.679  < 2e-16 ***
newprodB           1.4507     0.3196   4.539 5.64e-06 ***
logdose:newprodB  -1.8677     0.3465  -5.390
7.06e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 311.241  on 50  degrees of freedom
Residual deviance:  43.111  on 47  degrees of freedom
AIC: 198.11

Number of Fisher Scoring iterations: 4

From the summary table of the final model we see that product B has a significantly lower slope than do products A and C together.

Alternatively we could jump over the intermediate model and test whether we need do distinguish product C from product A at all.

H0: β3 = β5 = 0
H1: at least one of β3 and β5 are not zero

We carry out analysis of deviance on out1 and out3.

anova(out3, out1, test='Chisq')
Analysis of Deviance Table

Model 1: cbind(dead, n - dead) ~ logdose * newprod
Model 2: cbind(dead, n - dead) ~ logdose * product
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1        47     43.111                    
2        45     42.305  2  0.80547   0.6685

So we fail to reject the null hypothesis and conclude that we do not need to distinguish products A and C.

Question 5

The intercepts and slopes of the two products are given in the table below

Table 1   Slopes and intercepts for individual products
Product Intercept Slope
A and C coef(out3)[1] coef(out3)[2]
B coef(out3)[1] + coef(out3)[3] coef(out3)[2] + coef(out3)[4]

I use different symbols to indicate the three original products, but different colors that distinguish B from A and C. The graph is shown below.

plot(dead/n ~ logdose, pch=c(21,16,22)[as.numeric(product)], data=insec, col=as.numeric(newprod), xlab='Log dose', ylab='Proportion dying')
inv.logit <- function(b0,b1,x) exp(b0 + b1*x)/(1+exp(b0+b1*x))
curve(inv.logit(coef(out3)[1],coef(out3)[2],x), add=T)
curve(inv.logit(coef(out3)[1]+coef(out3)[3], coef(out3)[2]+coef(out3)[4], x), add=T, col=2, lty=2)
legend('topleft', c(levels(insec$product), 'A and C', 'B'), pch=c(21,16,22,NA,NA), col=c(1,2,1,1,2), lty=c(NA,NA,NA,1,2), bty='n')

fig 1

Fig. 1 Dose response models for product B and products A and C together

Question 6

To find the odds ratios for the effect of dosage we need to exponentiate the slopes that are shown in Table 1.

#odds ratio for dose products A and C
exp(coef(out3)[2])
 logdose
22.17092
#odds ratio for dose product B
exp(coef(out3)[2] + coef(out3)[4])
 logdose
3.425076

So for every one unit increase in log dose, the odds of dying increases by a factor of 22.17 for products A and C, but only by a factor of 3.43 for product B.

Question 7

To obtain a confidence interval for an odds ratio we first calculate the confidence interval for the log odds ratio and then exponentiate the endpoints. To obtain this for product B we need the variance of a sum, coef(out3)[2] + coef(out3)[4]. The method for doing this was explained in lecture 5 and involves extracting the variance-covariance matrix of the parameter estimates and then constructing a quadratic form.

sqrt(c(0,1,0,1)%*%vcov(out3)%*%c(0,1,0,1))
          [,1]
[1,] 0.2456689

A shortcut method for doing this was also given in lecture 5. We fit the model without the logdose main effect and remove the intercept. When there is a factor in the model, R reports the actual estimates at each level of the factor rather than reporting the estimates of the effects as it normally does.

#fit model without logdose and remove intercept explicitly
out3a <- glm(cbind(dead,n-dead) ~ newprod + logdose:newprod - 1, data=insec, family=binomial)
printCoefmat(summary(out3a)$coefficients)
                  Estimate Std. Error  z value  Pr(>|z|)   
newprodAC         -2.19970    0.21518 -10.2228 < 2.2e-16 ***
newprodB          -0.74900    0.23628  -3.1699  0.001525 **
newprodAC:logdose  3.09878   
0.24440  12.6793 < 2.2e-16 ***
newprodB:logdose   1.23112   
0.24567   5.0113 5.406e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Using this output the Wald-based confidence intervals are obtained as follows.

logOR.AC <- summary(out3a)$coefficients[3,1] + c(qnorm(.025), qnorm(.975)) * summary(out3a)$coefficients[3,2]
logOR.B <- summary(out3a)$coefficients[4,1] + c(qnorm(.025), qnorm(.975)) * summary(out3a)$coefficients[4,2]
Wald.results <- rbind(exp(logOR.AC),exp(logOR.B))
colnames(Wald.results) <- c('2.5%','97.5%')
rownames(Wald.results) <- c('Wald: AC','Wald: B')
Wald.results
              2.5%     97.5%
Wald: AC 13.732618 35.794305
Wald: B   2.116199  5.543498

The likelihood-based confidence intervals can be obtained by using the confint function on this last model.

out.loglik <- confint(out3a)
exp(out.loglik)[3:4,]
                      2.5 %    97.5 %
newprodAC:logdose 13.945322 36.399553
newprodB:logdose   2.134199  5.601245

The two types of intervals are quite similar.


hw scores

Course Home Page

Jack Weiss
Phone: (919) 962-5930
E-Mail: jack_weiss@unc.edu
Address: Curriculum for the Environment and Ecology, Box 3275, University of North Carolina, Chapel Hill, 27599
Copyright © 2012
Last Revised--February 23, 2012
URL: https://sakai.unc.edu/access/content/group/2842013b-58f5-4453-aa8d-3e01bacbfc3d/public/Ecol562_Spring2012/docs/solutions/assign5.htm