Assignment 7—Solution

Question 1

pyrau <- read.delim('Pyrausta.txt')
pyrau
  eggs X1936 X1937 X1938
1    0    26    37    14
2    1    19    14    21
3    2     7     2    11
4    3     1     3     7
5    4     3     0     2

I begin by converting the tabulated data into raw data. For this I use the rep function once for each year. In each case I rep the eggs column by the frequencies that appear in the year columns.

y.1936 <- rep(pyrau$eggs, pyrau$X1936)
y.1937 <- rep(pyrau$eggs, pyrau$X1937)
y.1938 <- rep(pyrau$eggs, pyrau$X1938)

Next I concatenate these three vectors together and make them a column in a data frame. To create a year variable I rep each year by the total for that year (obtained by taking the length of raw data vector for that year) and turn the result into a factor.

pyrau.raw <- data.frame(count=c(y.1936, y.1937, y.1938), year=factor(rep(1936:1938, c(length(y.1936), length(y.1937), length(y.1938)))))
dim(pyrau.raw)
[1] 167   2

Using glm

To test for a year effect I use glm to fit a Poisson regression model in which year is the only predictor. I use the anova function to carry out a likelihood ratio test for the effect of year.

out1 <- glm(count~year, data=pyrau.raw, family=poisson)
anova(out1, test='Chisq')
Analysis of Deviance Table

Model: poisson, link: log

Response: count

Terms added sequentially (first to last) >

     Df Deviance Resid. Df Resid. Dev  Pr(>Chi)   
NULL                   166     220.91             
year  2   22.095       164     198.82 1.593e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The model being fit is the following.

log mu

The likelihood ratio test tests the following hypothesis.

H0:  β1 = β2 = 0
H1:  at least one of β1, β2 not zero

The reported p-value for the test is small so we reject the null hypothesis and conclude there is a year effect. If we examine the summary table we can determine how the years differ.

summary(out1)
Call:
glm(formula = count ~ year, family = poisson, data = pyrau.raw)

Deviance Residuals:
    Min       1Q   Median       3Q      Max 
-1.6181  -0.9820  -0.2820   0.5599   2.4572 

Coefficients:
            Estimate Std. Error z value Pr(>|z|) 
(Intercept)  -0.1542     0.1443  -1.068   0.2855 
year1937     -0.5754     0.2406  -2.392   0.0168 *
year1938      0.4235     0.1863   2.273   0.0230 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 220.91  on 166  degrees of freedom
Residual deviance: 198.82  on 164  degrees of freedom
AIC: 414.33

Number of Fisher Scoring iterations: 6

From the output we see that the mean in 1937 is significantly lower than the mean in 1936 (the reference group) and the mean in 1938 is significantly higher than the mean in 1936. Given this and the fact that the means are ranked mean rankings, it follows that the means in 1937 and 1938 are also significantly different. Thus all the means differ from each other.

Using nlm

Alternatively we can construct the log-likelihoods ourselves and use nlm to obtain the maximum likelihood estimates and to carry out the likelihood ratio test.

# common mean model
negpois1.LL <- function(p){
LL <- sum(log(dpois(pyrau.raw$count, lambda=p)))
-LL
}
out.pois1 <- nlm(negpois1.LL,.8)
out.pois1
# separate means model
negpois2.LL <- function(p){
mu <- p[1] + p[2]*(pyrau.raw$year==1937) + p[3]*(pyrau.raw$year==1938)
LL <- sum(log(dpois(pyrau.raw$count, lambda=mu)))
-LL
}
out.pois2 <- nlm(negpois2.LL, c(0.8,-0.4,0.4))
LRstat <- 2*(out.pois1$minimum-out.pois2$minimum)
LRstat
[1] 22.09468
length(out.pois2$estimate)-length(out.pois1$estimate)
[1] 2
1-pchisq(LRstat, df=2)
[1] 1.592949e-05

This is the same answer we obtained with the anova function on the glm object.

Question 2

We need the tabulated data for the plot. I rep the count categories three times, concatenate the three columns of frequencies together, and make a third column that records the year.

pyrau.table <- data.frame(count=rep(pyrau$eggs,3), freq=unlist(pyrau[,2:4]), year=factor(rep(1936:1938, each=5)))
rownames(pyrau.table) <- NULL
pyrau.table
   count freq year
1      0   26 1936
2      1   19 1936
3      2    7 1936
4      3    1 1936
5      4    3 1936
6      0   37 1937
7      1   14 1937
8      2    2 1937
9      3    3 1937
10     4    0 1937
11     0   14 1938
12     1   21 1938
13     2   11 1938
14     3    7 1938
15     4    2 1938

I use the predict function with the newdata argument to add the predicted mean for each year to the data frame.

pyrau.table$mu <- predict(out1, type='response', newdata=data.frame(year=pyrau.table$year))
pyrau.table
   count freq year        mu
1      0   26 1936 0.8571429
2      1   19 1936 0.8571429
3      2    7 1936 0.8571429
4      3    1 1936 0.8571429
5      4    3 1936 0.8571429
6      0   37 1937 0.4821429
7      1   14 1937 0.4821429
8      2    2 1937 0.4821429
9      3    3 1937 0.4821429
10     4    0 1937 0.4821429
11     0   14 1938 1.3090909
12     1   21 1938 1.3090909
13     2   11 1938 1.3090909
14     3    7 1938 1.3090909
15     4    2 1938 1.3090909

Next I use the dpois function to calculate the Poisson probabilities of the count categories using the mean just created. I add the tail probabilities to the last reported category, X = 4.

pyrau.table$p <- dpois(pyrau.table$count, pyrau.table$mu)
pyrau.table$p2 <- pyrau.table$p + ppois(4, pyrau.table$mu, lower.tail=F) * (pyrau.table$count==4)
pyrau.table
   count freq year        mu           p          p2
1      0   26 1936 0.8571429 0.424372846 0.424372846
2      1   19 1936 0.8571429 0.363748153 0.363748153
3      2    7 1936 0.8571429 0.155892066 0.155892066
4      3    1 1936 0.8571429 0.044540590 0.044540590
5      4    3 1936 0.8571429 0.009544412 0.011446345
6      0   37 1937 0.4821429 0.617458847 0.617458847
7      1   14 1937 0.4821429 0.297703373 0.297703373
8      2    2 1937 0.4821429 0.071767777 0.071767777
9      3    3 1937 0.4821429 0.011534107 0.011534107
10     4    0 1937 0.4821429 0.001390272 0.001535896
11     0   14 1938 1.3090909 0.270065459 0.270065459
12     1   21 1938 1.3090909 0.353540237 0.353540237
13     2   11 1938 1.3090909 0.231408155 0.231408155
14     3    7 1938 1.3090909 0.100978104 0.100978104
15     4    2 1938 1.3090909 0.033047380 0.044008045

Obtaining the expected counts

Method 1:

To get the expected category counts according to the model we need the count totals for each year.

apply(pyrau[,2:4],2,sum)
X1936 X1937 X1938
   56    56    55

A simple way to obtain the expected counts is with an ifelse in which I multiply the probabilities by 55 for year 1938 and by 56 otherwise.

pyrau.table$exp <- ifelse(pyrau.table$year==1938, pyrau.table$p2*55, pyrau.table$p2*56)
pyrau.table
   count freq year        mu           p          p2         exp
1      0   26 1936 0.8571429 0.424372846 0.424372846 23.76487936
2      1   19 1936 0.8571429 0.363748153 0.363748153 20.36989659
3      2    7 1936 0.8571429 0.155892066 0.155892066  8.72995568
4      3    1 1936 0.8571429 0.044540590 0.044540590  2.49427305
5      4    3 1936 0.8571429 0.009544412 0.011446345  0.64099531
6      0   37 1937 0.4821429 0.617458847 0.617458847 34.57769544
7      1   14 1937 0.4821429 0.297703373 0.297703373 16.67138887
8      2    2 1937 0.4821429 0.071767777 0.071767777  4.01899553
9      3    3 1937 0.4821429 0.011534107 0.011534107  0.64591000
10     4    0 1937 0.4821429 0.001390272 0.001535896  0.08601017
11     0   14 1938 1.3090909 0.270065459 0.270065459 14.85360024
12     1   21 1938 1.3090909 0.353540237 0.353540237 19.44471304
13     2   11 1938 1.3090909 0.231408155 0.231408155 12.72744853
14     3    7 1938 1.3090909 0.100978104 0.100978104  5.55379572
15     4    2 1938 1.3090909 0.033047380 0.044008045  2.42044246

Method 2:

Use the fact that becausde year is a factor when it gets converted to a numeric variable it has the values 1, 2, and 3. These can then be used to select the correct total from a vector of totals.

n <- apply(pyrau[,2:4], 2, sum)
pyrau.table$exp <- pyrau.table$p2 * n[as.numeric(pyrau.table$year)]

Method 3:

A more general method is to tabulate the counts and collect them in a separate data frame with a year variable. Then using the year variable as the key field we canI do a many-to-one merge and add the sample totals as a new column to the data frame. To get the expected counts multiply the sample total column by the probability column.

n <- apply(pyrau[,2:4], 2, sum)
new.dat <- data.frame(year=1936:1938, n=as.numeric(apply(pyrau[,2:4], 2, sum)))
new.dat
  year  n
1 1936 56
2 1937 56
3 1938 55
pyrau.table2 <- merge(pyrau.table, new.dat)
pyrau.table2$exp <- pyrau.table2$p2*pyrau.table2$n
pyrau.table2
   year count freq        mu           p          p2  n         exp
1  1936     0   26 0.8571429 0.424372846 0.424372846 56 23.76487936
2  1936     1   19 0.8571429 0.363748153 0.363748153 56 20.36989659
3  1936     2    7 0.8571429 0.155892066 0.155892066 56  8.72995568
4  1936     3    1 0.8571429 0.044540590 0.044540590 56  2.49427305
5  1936     4    3 0.8571429 0.009544412 0.011446345 56  0.64099531
6  1937     0   37 0.4821429 0.617458847 0.617458847 56 34.57769544
7  1937     1   14 0.4821429 0.297703373 0.297703373 56 16.67138887
8  1937     2    2 0.4821429 0.071767777 0.071767777 56  4.01899553
9  1937     3    3 0.4821429 0.011534107 0.011534107 56  0.64591000
10 1937     4    0 0.4821429 0.001390272 0.001535896 56  0.08601017
11 1938     0   14 1.3090909 0.270065459 0.270065459 55 14.85360024
12 1938     1   21 1.3090909 0.353540237 0.353540237 55 19.44471304
13 1938     2   11 1.3090909 0.231408155 0.231408155 55 12.72744853
14 1938     3    7 1.3090909 0.100978104 0.100978104 55  5.55379572
15 1938     4    2 1.3090909 0.033047380 0.044008045 55  2.42044246

Graph the results

Finally we can produce the graph.

library(lattice)
xyplot(freq~count|year, data=pyrau.table, xlab='Count category', panel=function(x, y, subscripts) {
panel.xyplot(x, y, type='h', lineend=1, col='grey', lwd=10)
panel.points(x, pyrau.table$exp[subscripts], pch=16, cex=.6, col=1, type='o')
}, key=list(x=.82, y=.80, corner=c(0,0), points=list(pch=c(15,16), cex=c(1.2,.7), col=c('grey70', 'black')), text=list(c('observed', 'expected'), cex=.9)))

fig. 1

Fig. 1   Predicted and observed count categories under a Poisson model

Question 3

The simplest way to carry out the parametric bootstrap separately by year is with the chisq.test function. We just need to use it three times each time selecting the appropriate observed counts and predicted probabilities for the corresponding year.

chisq.test(pyrau.table$freq[1:5],p=pyrau.table$p2[1:5],simulate.p.value=T, B=9999)
            Chi-squared test for given probabilities with simulated p-value
            (based on 9999 replicates)

data:  pyrau.table$freq[1:5]
X-squared = 10.222, df = NA, p-value = 0.0478

chisq.test(pyrau.table$freq[6:10],p=pyrau.table$p2[6:10],simulate.p.value=T, B=9999)
            Chi-squared test for given probabilities with simulated p-value
            (based on 9999 replicates)

data:  pyrau.table$freq[6:10]
X-squared = 10.2778, df = NA, p-value = 0.1048

chisq.test(pyrau.table$freq[11:15],p=pyrau.table$p2[11:15],simulate.p.value=T, B=9999)
            Chi-squared test for given probabilities with simulated p-value
            (based on 9999 replicates)

data:  pyrau.table$freq[11:15]
X-squared = 0.8575, df = NA, p-value = 0.9332

Based on the output it is only the fit in 1936 that is suspect. To determine what's driving the lack of fit we can look at the individual contributions to the Pearson statistic. Doing so we see that it's the last term corresponding to X = 4 that is the largest. In fact, its contribution is 85% of the entire Pearson statistic. Under the fitted Poisson model observing three counts of four is apparently highly unlikely, yet that is exactly what was observed in 1936.

sum((pyrau.table$freq[1:5] - pyrau.table$exp[1:5])^2/pyrau.table$exp[1:5])
[1] 10.22201
(pyrau.table$freq[1:5] - pyrau.table$exp[1:5])^2/pyrau.table$exp[1:5]
[1] 0.21021627 0.09212696 0.34281350 0.89519147 8.68165956

Question 4

I repeat the protocol outlined in lecture 17 only modifying it to account for the fact that we have three groups to consider.

pyrau.p <- split(pyrau.table$p2, pyrau.table$year)
n <- apply(pyrau[,2:4],2,sum)
myfunc <- function() {
out.obs <- as.vector(sapply(1:3, function(x) rmultinom(1, size=n[x], prob=pyrau.p[[x]])))
sum((out.obs-pyrau.table$exp)^2/pyrau.table$exp)
}
set.seed(50)
sim.data <- replicate(9999, myfunc())
actual <- sum((pyrau.table$freq-pyrau.table$exp)^2/pyrau.table$exp)
actual
[1] 21.35731
pearson <- c(sim.data, actual)
pval <- sum(pearson>=actual)/length(pearson)
pval
[1] 0.0664

The test is not significant, although it's close. So, we don't have evidence of a lack of fit.


hw 7

Course Home Page

Jack Weiss
Phone: (919) 962-5930
E-Mail: jack_weiss@unc.edu
Address: Curriculum for Ecology and the Environment, Box 3275, University of North Carolina, Chapel Hill, 27599
Copyright © 2012
Last Revised--October 29, 2012
URL: https://sakai.unc.edu/access/content/group/3d1eb92e-7848-4f55-90c3-7c72a54e7e43/public/docs/solutions/assign7.htm