Assignment 8—Solution

burn <- read.csv('ecol 562/burn.csv')
names(burn)
 [1] "PolyFID"   "Owner"     "X"         "Y"         "burn50"    "rx_cat3"   "zmin_d_a"
 [8] "zmin_di"   "zm_mcov"   "zm_ccov"   "Hectares"  "zmin_drd"  "prop_timb" "Prop_no_b"
[15] "zmin_dr"   "zmin_ddv"  "year" 

Question 1

glm.mod <- glm(burn50 ~ Hectares + factor(rx_cat3) + zmin_drd + prop_timb + Prop_no_b + zmin_dr + zmin_ddv + zmin_ddv:factor(rx_cat3), family = binomial, data = burn)
anova(glm.mod, test='Chisq')
Analysis of Deviance Table

Model: binomial, link: logit

Response: burn50

Terms added sequentially (first to last)

                         Df Deviance Resid. Df Resid. Dev Pr(>Chi)   
NULL                                      2763       2321            
Hectares                  1      8.4      2762       2312   0.0037 **
factor(rx_cat3)           3    107.7      2759       2205  < 2e-16 ***
zmin_drd                  1     23.1      2758       2181  1.5e-06 ***
prop_timb                 1      8.2      2757       2173   0.0042 **
Prop_no_b                 1     31.7      2756       2142  1.8e-08 ***
zmin_dr                   1     32.0      2755       2110  1.5e-08 ***
zmin_ddv                  1      2.8      2754       2107   0.0961 . 
factor(rx_cat3):zmin_ddv  3     25.4      2751       2081  1.3e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Question 2

library(gee)
gee.ind <- gee(burn50 ~ Hectares + factor(rx_cat3) + zmin_drd + prop_timb + Prop_no_b + zmin_dr + zmin_ddv + zmin_ddv:factor(rx_cat3), family = binomial, data = burn, id=PolyFID, corstr="independence", scale.fix=T)
gee.ex <- gee(burn50 ~ Hectares + factor(rx_cat3) + zmin_drd + prop_timb + Prop_no_b + zmin_dr + zmin_ddv + zmin_ddv:factor(rx_cat3), family = binomial, data = burn, id=PolyFID, corstr="exchangeable", scale.fix=T)
gee.un <- gee(burn50 ~ Hectares + factor(rx_cat3) + zmin_drd + prop_timb + Prop_no_b + zmin_dr + zmin_ddv + zmin_ddv:factor(rx_cat3), family = binomial, data = burn, id=PolyFID, corstr="unstructured", scale.fix=T)
gee.ar1 <- gee(burn50 ~ Hectares + factor(rx_cat3) + zmin_drd + prop_timb + Prop_no_b + zmin_dr + zmin_ddv + zmin_ddv:factor(rx_cat3), family = binomial, data = burn, id=PolyFID, corstr="AR-M", Mv=1, scale.fix=T)
gee.non <- gee(burn50 ~ Hectares + factor(rx_cat3) + zmin_drd + prop_timb + Prop_no_b + zmin_dr + zmin_ddv + zmin_ddv:factor(rx_cat3), family = binomial, data = burn, id=PolyFID, corstr="non_stat_M_dep", Mv=1, scale.fix=T)
gee.sta <- gee(burn50 ~ Hectares + factor(rx_cat3) + zmin_drd + prop_timb + Prop_no_b + zmin_dr + zmin_ddv + zmin_ddv:factor(rx_cat3), family = binomial, data = burn, id=PolyFID, corstr="stat_M_dep", Mv=1, scale.fix=T)

Question 3

I obtain the QIC, CIC, sum of the absolute differences between the naive parameter standard errors and the robust parameter standard errors, and the sum of the absolute differences of the naive parameter covariance matrix and the robust parameter covariance matrix. I assemble the results in a matrix and identify the winners.

stats <- sapply(list(gee.ind, gee.ex, gee.un, gee.ar1, gee.non, gee.sta), function(x) QIC.binom.gee(x, gee.ind))
sediff <- sapply(list(gee.ind, gee.ex, gee.un, gee.ar1, gee.non, gee.sta), function(x) sum(abs(summary(x)$coefficients[,2]-summary(x)$coefficients[,4])))
covdiff <- sapply(list(gee.ind, gee.ex, gee.un, gee.ar1, gee.non, gee.sta), function(x) sum(abs(x$robust.variance-x$naive.variance)))
out.comp <- cbind(t(stats), sediff, covdiff)
rownames(out.comp) <- c('ind', 'ex', 'un', 'AR1', 'nonstat', 'stat')
out.comp
         QIC   CIC sediff covdiff
ind     2106 12.52 0.1346  0.2915
ex      2114 11.17 0.1129  0.2666
un      2129 10.85 0.1393  0.2517
AR1     2113 11.21 0.1420  0.3037
nonstat 2115 11.03 0.1391  0.2954
stat    2114 11.12 0.1380  0.2996
winners <- rownames(out.comp)[apply(out.comp, 2, which.min)]
names(winners) <- colnames(out.comp)
winners
    QIC     CIC  sediff covdiff
  "ind"    "un"    "ex"    "un"

The independence model has the smallest QIC, the unstructured model has the smallest CIC, the exchangeable model has the lowest sum of absolute standard error differences, and the unstructured model has the lowest sum of absolute covariance differences. The proponents of CIC note that for models that do not predict the mean well, QIC does a poor job of selecting the correct correlation structure, while CIC performs better. Given that the unstructured matrix is the winner using CIC and one additional metric, at this point I would lean toward using the unstructured correlation matrix.

Question 4

The data span the years 2004 through 2007. A manager who followed the rule of thumb and who burned in 2004 would then not burn in 2005, not burn in 2006, but burn again in 2007. Thus response variable in 2004 is negatively correlated with the response variable in 2005 and 2006, but positively correlated with it in 2007. Thus in the four year span, observations that are one or two years apart should be negatively correlated while observations that are three years apart should be positively correlated. The correlation matrix would be expected to contain terms whose signs exhibit the following pattern.

correlation matrix

If we examine the estimated unstructured correlation matrix it is exactly of this form with all negative values on the off-diagonal except in the (1, 4) position, which is positive.

gee.un$working.correlation
        [,1]    [,2]    [,3]    [,4]
[1,]  1.0000 -0.1603 -0.1824  0.2462
[2,] -0.1603  1.0000 -0.1443 -0.1632
[3,] -0.1824 -0.1443  1.0000 -0.0857
[4,]  0.2462 -0.1632 -0.0857  1.0000

Question 5

library(lme4)
lmer.fit <- lmer(burn50 ~ Hectares + factor(rx_cat3) + zmin_drd + prop_timb + Prop_no_b + zmin_dr + zmin_ddv + zmin_ddv:factor(rx_cat3) + (1|PolyFID), family = binomial, data = burn)

Question 6

out.mods <- cbind(coef(glm.mod), coef(gee.un), fixef(lmer.fit))
colnames(out.mods) <- c('GLM', 'GEE', 'GLMM')
out.mods
                                 GLM        GEE       GLMM
(Intercept)               -1.9896559 -1.5952972 -1.9896559
Hectares                   0.0009879  0.0005944  0.0009879
factor(rx_cat3)2           0.9214607  0.6034236  0.9214608
factor(rx_cat3)3           1.0122814  0.3673825  1.0122815
factor(rx_cat3)4           0.6795490  0.1071735  0.6795490
zmin_drd                   0.0725129  0.0503981  0.0725129
prop_timb                 -0.9587330 -0.6887083 -0.9587331
Prop_no_b                 -0.9963088 -1.0300449 -0.9963088
zmin_dr                   -0.4510979 -0.4725218 -0.4510980
zmin_ddv                  -0.0742413  0.0843247 -0.0742413
factor(rx_cat3)2:zmin_ddv -0.0499653 -0.2330747 -0.0499653
factor(rx_cat3)3:zmin_ddv  1.2818659  1.0374087  1.2818659
factor(rx_cat3)4:zmin_ddv  0.2610036  0.1272098  0.2609800

The fact that the GLM and GLMM solutions are identical suggests that the GLMM has converged to the GLM solution, i.e., a model without random effects. We can verify this by examining the log-likelihoods of these two models as well as the variance components of the mixed effects model. The log-likelihoods are identical and the variance component is essentially zero.

sapply(list(glm.mod, lmer.fit), logLik)
[1] -1040.7 -1040.7
VarCorr(lmer.fit)
$PolyFID
            (Intercept)
(Intercept) 1.42621e-15
attr(,"stddev")
(Intercept)
3.77652e-08
attr(,"correlation")
            (Intercept)
(Intercept)           1

attr(,"sc")
[1] NA

When this happens it means the model has converged to a local solution rather than the global solution. Under normal circumstances we might try to reparameterize the model or fiddle with control settings, but those are unlikely to be successful here. A random intercepts model induces an exchangeable correlation structure within clusters so that all members of the group are equally correlated to each other and positively so. We've already seen from Question 4 that this is probably the wrong correlation structure for these data. Consequently a random intercepts model is probably the wrong model for these data too.

Question 7

I create a matrix in which all the possible pairs of the four different years appear as row entries.

wave.mat <- matrix(NA, nrow=6, ncol=2)
k <- 1
for(i in 2004:2006){
for(j in (i+1):2007){
wave.mat[k,] <- c(i,j)
k <- k+1
}}
wave.mat
     [,1] [,2]
[1,] 2004 2005
[2,] 2004 2006
[3,] 2004 2007
[4,] 2005 2006
[5,] 2005 2007
[6,] 2006 2007

I next use the Prentice function from class to calculate upper and lower bounds for the six entries in the correlation matrix corresponding to these pairs of years.

bounds <- sapply(1:6, function(x) lims.func(fitted(gee.un), burn$burn50, burn$year, wave.mat[x,1], wave.mat[x,2]))
low.mat <- gee.un$working.correlation[ lower.tri(gee.un$working.correlation)]
bounds
          [,1]      [,2]       [,3]       [,4]      [,5]      [,6]
[1,] -0.193763 -0.112229 -0.0495914 -0.0544835 -0.182476 -0.146461
[2,]  0.184067  0.140478  0.3932899  0.3597066  0.340684  0.534700
low.mat
[1] -0.1603141 -0.1824291  0.2461951 -0.1442678 -0.1631727 -0.0857017
low.mat>=bounds[1,] & low.mat<=bounds[2,]
[1]  TRUE FALSE  TRUE FALSE  TRUE  TRUE

The entries in the (1,3) and (2,3) positions exceed the Prentice bounds. The estimated values are far too negative.

Question 8

I replace the entries in the (1,3) and (2,3) positions with –0.11 and –0.05 respectively (as well as the corresponding entries in the lower triangle).

new.mat <- gee.un$working.correlation
new.mat[1,3] <- -0.11
new.mat[3,1] <- new.mat[1,3]
new.mat[2,3] <- -.05
new.mat[3,2] <- new.mat[2,3]
new.mat
          [,1]      [,2]       [,3]       [,4]
[1,]  1.000000 -0.160314 -0.1100000  0.2461951
[2,] -0.160314  1.000000 -0.0500000 -0.1631727
[3,] -0.110000 -0.050000  1.0000000 -0.0857017
[4,]  0.246195 -0.163173 -0.0857017  1.0000000

Question 9

The geepack package requires that we extract the lower triangle of the correlation matrix we wish to use and then replicate it with the rep function as many times as there are different groups in the data set. The number of groups is the number of unique values of the plot indicator PolyFID.

length(unique(burn$PolyFID))
[1] 691
zcor <- rep(new.mat[lower.tri(new.mat)], length(unique(burn$PolyFID)))

To fit the model with geeglm I enter this as the zcor argument and specify corst="fixed".

library(geepack)
geeglm.fix <- geeglm(burn50 ~ Hectares + factor(rx_cat3) + zmin_drd + prop_timb + Prop_no_b + zmin_dr + zmin_ddv + zmin_ddv:factor(rx_cat3), family = binomial, data = burn, id=PolyFID, corstr="fixed", zcor=zcor, scale.fix=T)
summary(geeglm.fix)
Call:
geeglm(formula = burn50 ~ Hectares + factor(rx_cat3) + zmin_drd +
    prop_timb + Prop_no_b + zmin_dr + zmin_ddv + zmin_ddv:factor(rx_cat3),
    family = binomial, data = burn, id = PolyFID, zcor = zcor,
    corstr = "fixed", scale.fix = T)

 Coefficients:
                           Estimate   Std.err   Wald Pr(>|W|)   
(Intercept)               -1.697391  0.138567 150.05  < 2e-16 ***
Hectares                   0.000698  0.000325   4.61   0.0317 * 
factor(rx_cat3)2           0.669453  0.162977  16.87  4.0e-05 ***
factor(rx_cat3)3           0.537899  0.260912   4.25   0.0392 * 
factor(rx_cat3)4           0.250123  0.215030   1.35   0.2447   
zmin_drd                   0.056454  0.025438   4.93   0.0265 * 
prop_timb                 -0.752567  0.277991   7.33   0.0068 **
Prop_no_b                 -0.982047  0.233089  17.75  2.5e-05 ***
zmin_dr                   -0.460432  0.107152  18.46  1.7e-05 ***
zmin_ddv                   0.026709  0.076397   0.12   0.7266   
factor(rx_cat3)2:zmin_ddv -0.200711  0.129358   2.41   0.1208   
factor(rx_cat3)3:zmin_ddv  1.093507  0.343279  10.15   0.0014 **
factor(rx_cat3)4:zmin_ddv  0.179461  0.089204   4.05   0.0442 * 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Scale is fixed.

Correlation: Structure = fixed  Link = identity

Estimated Correlation Parameters:
        Estimate Std.err
alpha:1        1       0
Number of clusters:   691   Maximum cluster size: 4

It turns out this model could also have been fit using the gee function of the gee package. We need to use corst="fixed" with the R= argument in which we specify the correlation matrix for the largest sized group. As usual the group is identifed with the id variable.

gee.fix <- gee(burn50 ~ Hectares + factor(rx_cat3) + zmin_drd + prop_timb + Prop_no_b + zmin_dr + zmin_ddv + zmin_ddv:factor(rx_cat3), family = binomial, data = burn, id=PolyFID, corstr="fixed", R=new.mat, scale.fix=T)
summary(gee.fix)
 GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
 gee S-function, version 4.13 modified 98/01/27 (1998)

Model:
 Link:                      Logit
 Variance to Mean Relation: Binomial
 Correlation Structure:     Fixed

Call:
gee(formula = burn50 ~ Hectares + factor(rx_cat3) + zmin_drd +
    prop_timb + Prop_no_b + zmin_dr + zmin_ddv + zmin_ddv:factor(rx_cat3),
    id = PolyFID, data = burn, R = new.mat, family = binomial,
    corstr = "fixed", scale.fix = T)

Summary of Residuals:
    Min      1Q  Median      3Q     Max
-0.6635 -0.1880 -0.1085 -0.0221  0.9921

Coefficients:
                           Estimate Naive S.E. Naive z Robust S.E. Robust z
(Intercept)               -1.697391   0.165405 -10.262    0.138567   -12.25
Hectares                   0.000698   0.000317   2.203    0.000325     2.15
factor(rx_cat3)2           0.669453   0.179328   3.733    0.162977     4.11
factor(rx_cat3)3           0.537899   0.265638   2.025    0.260912     2.06
factor(rx_cat3)4           0.250123   0.221797   1.128    0.215030     1.16
zmin_drd                   0.056454   0.031749   1.778    0.025438     2.22
prop_timb                 -0.752567   0.293341  -2.565    0.277991    -2.71
Prop_no_b                 -0.982047   0.209656  -4.684    0.233089    -4.21
zmin_dr                   -0.460432   0.091942  -5.008    0.107152    -4.30
zmin_ddv                   0.026709   0.087260   0.306    0.076397     0.35
factor(rx_cat3)2:zmin_ddv -0.200711   0.121673  -1.650    0.129358    -1.55
factor(rx_cat3)3:zmin_ddv  1.093507   0.348452   3.138    0.343279     3.19
factor(rx_cat3)4:zmin_ddv  0.179461   0.098064   1.830    0.089204     2.01

Estimated Scale Parameter:  1
Number of Iterations:  4

Working Correlation
       [,1]   [,2]    [,3]    [,4]
[1,]  1.000 -0.160 -0.1100  0.2462
[2,] -0.160  1.000 -0.0500 -0.1632
[3,] -0.110 -0.050  1.0000 -0.0857
[4,]  0.246 -0.163 -0.0857  1.0000


hw scores

Course Home Page

Jack Weiss
Phone: (919) 962-5930
E-Mail: jack_weiss@unc.edu
Address: Curriculum for the Environment and Ecology, Box 3275, University of North Carolina, Chapel Hill, 27599
Copyright © 2012
Last Revised--March 25, 2012
URL: https://sakai.unc.edu/access/content/group/2842013b-58f5-4453-aa8d-3e01bacbfc3d/public/Ecol562_Spring2012/docs/solutions/assign8.htm