Assignment 5—Solution

insec <- read.table( 'http://www.bio.ic.ac.uk/research/mjcraw/statcomp/data/logistic.txt', header=T)

names(insec)

[1] "logdose" "n" "dead" "product"

Question 1

These are binomial data where we are given the number of successes (dead) out of a total (n). To fit this model in R the response variable needs to be a matrix in which the first column is the number of successes and the second column is the number of failures. Hence we use cbind(dead, n-dead) as the response. To address the question, "Does the dose-response relationship vary by product?" we need to fit a model in which logdose and product interact.

out1 <- glm(cbind(dead, n-dead)~logdose*product, data=insec, family=binomial)

anova(out1, test='Chisq')

Analysis of Deviance Table

Model: binomial, link: logit

Response: cbind(dead, n - dead)

Terms added sequentially (first to last)

                Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL                               50    311.241
logdose          1 238.854        49     72.387 < 2.2e-16 ***
product          2    0.206        47     72.181    0.9022
logdose:product 2   29.876        45     42.305 3.255e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The analysis of deviance table reveals a statistically significant interaction between log dose and product. So we conclude that the dose-response relationship varies by product.

Question 2

To determine if the residual deviance is suitable here as a goodness of fit statistic we need to examine the expected counts. If no more than 20% of the expected counts are less than 5, the chi-squared distribution of the deviance is reasonable. I use the fitted function to obtain the estimated probabilities and use them to calculate the expected number of successes and failures.

pred.counts <- cbind(fitted(out1)*insec$n, insec$n - fitted(out1) * insec$n)

sum(pred.counts<5)/length(as.vector(pred.counts))

[1] 0.245098

So 24.5% of the counts are less than 5. The use of the residual deviance as as a goodness of fit statistic is suspect here.

Question 3

To determine if the model can be simplified I examine the summary table for the model.

summary(out1)

Call:
glm(formula = cbind(dead, n - dead) ~ logdose * product, family = binomial,
data = insec)

Deviance Residuals:
Min 1Q Median 3Q Max
-2.39800 -0.60734 0.09695 0.65920 1.69300

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)
(Intercept)       -2.0410     0.3034 -6.728 1.72e-11 ***
logdose            2.8768     0.3432   8.383 < 2e-16 ***
productB           1.2920     0.3845   3.360 0.00078 ***
productC          -0.3079     0.4308 -0.715 0.47469
logdose:productB -1.6457     0.4220 -3.899 9.65e-05 ***
logdose:productC   0.4318     0.4898   0.882 0.37793
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 311.241 on 50 degrees of freedom
Residual deviance: 42.305 on 45 degrees of freedom
AIC: 201.31

Number of Fisher Scoring iterations: 4

From the output we see that the dose response relationship for product C is not different than it is for product A , the reference group (p = .038). On the other hand the dose-response relationship of products A and B are clearly different (p < .001). This suggests we might collapse products A and C together and refit the model.

insec$newprod <- factor(ifelse(insec$product=='B', 'B', 'AC'))

There are two new models we can fit. We can force products A and C to have the same slopes (on a logit scale) but different intercepts. If this model holds, the dose effect is the same but one of the products consistently kills more bugs than the other. We can also force them to have both the same slopes and intercepts so that the products are not different at all.

#same slopes, different intercepts

out2 <- glm(cbind(dead,n-dead) ~ logdose+product+logdose:newprod, data=insec, family=binomial)

#same slopes, same intercepts

out3 <- glm(cbind(dead,n-dead) ~ logdose*newprod, data=insec, family=binomial)

Question 4

The three-product model fits the following regression model.

logit

To see if A and C should have the same slopes (same dose-response relationship) we carry out the test

H₀: β₅ = 0
H₁: β₅ ≠ 0

I compare out2 with out1 with a likelihood ratio test.

anova(out2, out1, test='Chisq')

Analysis of Deviance Table

Model 1: cbind(dead, n - dead) ~ logdose + product + logdose:newprod
Model 2: cbind(dead, n - dead) ~ logdose * product
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 46 43.083
2 45 42.305 1 0.77817 0.3777

The models are not significantly different so we can use the simpler model in which the slopes of A and C are the same. Said differently we fail to reject that β₅ = 0. Hence we don't need to include this term in the model. Our model is now the following.

I next test whether the intercepts are also the same (meaning that the two products are identical). Formally we carry out the following hypothesis test.

H₀: β₃ = 0
H₁: β₃ ≠ 0

anova(out3, out2, test='Chisq')

Analysis of Deviance Table

Model 1: cbind(dead, n - dead) ~ logdose * newprod
Model 2: cbind(dead, n - dead) ~ logdose + product + logdose:newprod
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 47 43.111
2 46 43.083 1 0.0273 0.8688

So we fail to reject the null hypothesis. Therefore the intercepts of A and C are not different. We conclude that products A and C are not different in their killing power.

summary(out3)

Call:
glm(formula = cbind(dead, n - dead) ~ logdose * newprod, family = binomial,
data = insec)

Deviance Residuals:
Min 1Q Median 3Q Max
-2.2507 -0.6634 0.1382 0.7080 1.6930

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)
(Intercept)       -2.1997     0.2152 -10.223 < 2e-16 ***
logdose            3.0988     0.2444 12.679 < 2e-16 ***
newprodB           1.4507     0.3196   4.539 5.64e-06 ***
logdose:newprodB -1.8677     0.3465 -5.390 7.06e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 311.241 on 50 degrees of freedom
Residual deviance: 43.111 on 47 degrees of freedom
AIC: 198.11

Number of Fisher Scoring iterations: 4

From the summary table of the final model we see that product B has a significantly lower slope than do products A and C together.

Alternatively we could jump over the intermediate model and test whether we need do distinguish product C from product A at all.

H₀: β₃ = β₅ = 0
H₁: at least one of β₃ and β₅ are not zero

We carry out analysis of deviance on out1 and out3.

anova(out3, out1, test='Chisq')

Analysis of Deviance Table

Model 1: cbind(dead, n - dead) ~ logdose * newprod
Model 2: cbind(dead, n - dead) ~ logdose * product
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 47 43.111
2 45 42.305 2 0.80547 0.6685

So we fail to reject the null hypothesis and conclude that we do not need to distinguish products A and C.

Question 5

The intercepts and slopes of the two products are given in the table below

Table 1 Slopes and intercepts for individual products

Product	Intercept	Slope
A and C	coef(out3)[1]	coef(out3)[2]
B	coef(out3)[1] + coef(out3)[3]	coef(out3)[2] + coef(out3)[4]

I use different symbols to indicate the three original products, but different colors that distinguish B from A and C. The graph is shown below.

plot(dead/n ~ logdose, pch=c(21,16,22)[as.numeric(product)], data=insec, col=as.numeric(newprod), xlab='Log dose', ylab='Proportion dying')

inv.logit <- function(b0,b1,x) exp(b0 + b1*x)/(1+exp(b0+b1*x))

curve(inv.logit(coef(out3)[1],coef(out3)[2],x), add=T)

curve(inv.logit(coef(out3)[1]+coef(out3)[3], coef(out3)[2]+coef(out3)[4], x), add=T, col=2, lty=2)

legend('topleft', c(levels(insec$product), 'A and C', 'B'), pch=c(21,16,22,NA,NA), col=c(1,2,1,1,2), lty=c(NA,NA,NA,1,2), bty='n')

fig 1

Fig. 1 Dose response models for product B and products A and C together

Question 6

To find the odds ratios for the effect of dosage we need to exponentiate the slopes that are shown in Table 1.

#odds ratio for dose products A and C

exp(coef(out3)[2])

logdose
22.17092

#odds ratio for dose product B

exp(coef(out3)[2] + coef(out3)[4])

logdose
3.425076

So for every one unit increase in log dose, the odds of dying increases by a factor of 22.17 for products A and C, but only by a factor of 3.43 for product B.

Question 7

To obtain a confidence interval for an odds ratio we first calculate the confidence interval for the log odds ratio and then exponentiate the endpoints. To obtain this for product B we need the variance of a sum, coef(out3)[2] + coef(out3)[4]. The method for doing this was explained in lecture 5 and involves extracting the variance-covariance matrix of the parameter estimates and then constructing a quadratic form.

sqrt(c(0,1,0,1)%*%vcov(out3)%*%c(0,1,0,1))

[,1]
[1,] 0.2456689

A shortcut method for doing this was also given in lecture 5. We fit the model without the logdose main effect and remove the intercept. When there is a factor in the model, R reports the actual estimates at each level of the factor rather than reporting the estimates of the effects as it normally does.

#fit model without logdose and remove intercept explicitly

out3a <- glm(cbind(dead,n-dead) ~ newprod + logdose:newprod - 1, data=insec, family=binomial)

printCoefmat(summary(out3a)$coefficients)

                  Estimate Std. Error z value Pr(>|z|)
newprodAC         -2.19970    0.21518 -10.2228 < 2.2e-16 ***
newprodB          -0.74900    0.23628 -3.1699 0.001525 **
newprodAC:logdose 3.09878    0.24440 12.6793 < 2.2e-16 ***
newprodB:logdose   1.23112    0.24567   5.0113 5.406e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Using this output the Wald-based confidence intervals are obtained as follows.

logOR.AC <- summary(out3a)$coefficients[3,1] + c(qnorm(.025), qnorm(.975)) * summary(out3a)$coefficients[3,2]

logOR.B <- summary(out3a)$coefficients[4,1] + c(qnorm(.025), qnorm(.975)) * summary(out3a)$coefficients[4,2]

Wald.results <- rbind(exp(logOR.AC),exp(logOR.B))

colnames(Wald.results) <- c('2.5%','97.5%')

rownames(Wald.results) <- c('Wald: AC','Wald: B')

Wald.results

2.5% 97.5%
Wald: AC 13.732618 35.794305
Wald: B 2.116199 5.543498

The likelihood-based confidence intervals can be obtained by using the confint function on this last model.

out.loglik <- confint(out3a)

exp(out.loglik)[3:4,]

2.5 % 97.5 %
newprodAC:logdose 13.945322 36.399553
newprodB:logdose 2.134199 5.601245

The two types of intervals are quite similar.

hw scores

Course Home Page

Jack Weiss
Phone: (919) 962-5930
E-Mail: jack_weiss@unc.edu
Address: Curriculum for the Environment and Ecology, Box 3275, University of North Carolina, Chapel Hill, 27599
Copyright © 2012
Last Revised--February 23, 2012
URL: https://sakai.unc.edu/access/content/group/2842013b-58f5-4453-aa8d-3e01bacbfc3d/public/Ecol562_Spring2012/docs/solutions/assign5.htm