Lecture 8 (lab 3)—Friday, January 27, 2012

Outline of lecture

  1. Likelihood and log-likelihood of a Poisson model
  2. Maximizing the Poisson log-likelihood graphically and numerically
  3. Poisson regression with identity and log links
  4. Fitting Poisson models with the glm function
  5. Comparing the likelihoods of discrete and continuous probability models
  6. Checking the fit of a Poisson regression model

R functions and commands demonstrated

dpois, ppois, dnorm, pnorm, sapply, optimize, optim, nlm, glm, anova, summary, coef, predict, fitted, logLik, confint, scatterplot3d, xyplot

R function options

maximum (optimize); family and start (glm); test (anova); type (points3d, panel.xyplot); scales (xyplot)

R packages used

scatterplot3d, lattice

Overview of today's lab

Today we continue our discussion of maximum likelihood estimation by using it to obtain parameter estimates of a Poisson regression model. We start by constructing the likelihood from first principles, then plot the log-likelihood to determine the maximum likelihood estimate (MLE) graphically, and finally use R's optimization functions to obtain the estimate numerically. Poisson regression models can also be fit using the glm function of R. The summary, anova, coef, and predict functions when used on glm objects produce output similar to what we obtained using lm. To compare Poisson regression models to their normal regression model counterparts we use three-dimensional graphics and analytical methods. Finally a graphical method is used to assess the fit of a Poisson regression model.

Likelihood of a Poisson model

The Rikz data set is a supposedly random sample of m = 45 sites that we've used to construct regression models for species richness at the sites. When we regressed richness against NAP we saw problems with the underlying assumption that the response variable is normally distributed about the regression line with constant variance. In particular the model predicted that at large NAP values, but still within the range of the data, there is a sizable probability for richness values generated by the model to be negative.

The problem with using a normal distribution for these data is that it doesn't respect the natural characteristics of the data. A normally distributed response is a continuous variable but richness can assume only non-negative integer values. An alternative probability model that is often used with count data is the Poisson distribution whose probability mass function is

$$P(X = x) = \frac{e^{-\lambda}\,\lambda^{x}}{x!}, \qquad x = 0, 1, 2, \ldots$$

In R the Poisson probability mass function is given by dpois(x, lambda).
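
As a quick sanity check (not part of the original notes), dpois should agree with the formula evaluated directly.

dpois(3, lambda=5)
[1] 0.1403739
# same calculation from the formula
exp(-5) * 5^3 / factorial(3)
[1] 0.1403739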

In lecture 7 we showed that if we have a random sample then the joint probability of our data can be written as follows.

$$P(X_1 = x_1,\, X_2 = x_2,\, \ldots,\, X_m = x_m) = \prod_{i=1}^{m} P(X_i = x_i)$$

Plugging Poisson probabilities into our expression for the joint probability yields the following.

$$P(X_1 = x_1,\, \ldots,\, X_m = x_m) = \prod_{i=1}^{m} \frac{e^{-\lambda}\,\lambda^{x_i}}{x_i!} = f(x_1, x_2, \ldots, x_m;\, \lambda)$$

where in the last step the joint probability is denoted by f, a function of the data (x1, x2, ... , xm) as well as the Poisson parameter λ. If we knew λ then we could calculate the probability of obtaining any set of observed values x1, x2, ... , xm. Furthermore, for fixed λ if we summed this expression over all possible values of x1, x2, ... , xm we would get 1, because f is a probability function.

In Poisson regression we typically model the rate parameter λ of the Poisson distribution in terms of predictors. For reasons we'll explain later the preferred approach is to model log λ rather than λ.

$$\log \lambda_i = \beta_0 + \beta_1\,\mathrm{NAP}_i$$

So under this scenario the probability of our data would be the following.

$$f(x_1, \ldots, x_m;\, \beta_0, \beta_1) = \prod_{i=1}^{m} \frac{e^{-\lambda_i}\,\lambda_i^{x_i}}{x_i!}, \qquad \lambda_i = e^{\beta_0 + \beta_1\,\mathrm{NAP}_i}$$

where xi is the observed species richness in sample i.

Typically when using a probability distribution such as the Poisson we think of the parameters as fixed and the data as random. Given fixed values of the parameters, the Poisson formula returns the probability of obtaining a specific data value. In likelihood theory we turn these roles around. We view the data as fixed (after all, it's been collected so it's no longer random) and the parameters as unknown quantities that are free to vary. To emphasize this in the Poisson example we could write the probability of our data as follows.

$$L(\lambda;\, x_1, x_2, \ldots, x_m) = \prod_{i=1}^{m} \frac{e^{-\lambda}\,\lambda^{x_i}}{x_i!}$$

where we now use the symbol L instead of f and switch the order of its arguments. Viewed this way L is not a probability: for fixed data, if we sum (or integrate) over all possible values of λ we will not generally get 1. So when we think of the probability function as a function of the parameters rather than of the data we call it a likelihood. If we plug in different values of λ we get different values of the likelihood. Keep in mind that fundamentally it is still the joint probability function of our data under the assumed probability model, only by another name.

From a likelihood perspective a natural question to ask is the following: is there a value of λ at which the likelihood achieves a maximum? Or, put another way, is there a value of λ that makes the data we actually obtained most probable? In terms of our regression problem above we seek the values of β0 and β1 that make it most likely to obtain the data we actually obtained. We call those parameter values the maximum likelihood estimates (MLEs) of the parameters because they maximize the likelihood. One of R. A. Fisher's major contributions to statistics (among many) was to realize that the likelihood function is a vehicle for obtaining parameter estimates.

The log-likelihood

Fig. 1  Monotonicity of the logarithm function

For both practical and theoretical reasons it is preferable to work with the log of the likelihood rather than the likelihood itself. Because the logarithm of a product is the sum of the logs, the log-likelihood has a simpler form than does the likelihood.

$$\log L(\theta;\, x_1, \ldots, x_m) = \log \prod_{i=1}^{m} f(x_i;\, \theta) = \sum_{i=1}^{m} \log f(x_i;\, \theta)$$

or for our Poisson example

$$\log L(\lambda;\, x_1, \ldots, x_m) = \sum_{i=1}^{m} \log\!\left(\frac{e^{-\lambda}\,\lambda^{x_i}}{x_i!}\right) = -m\lambda + \log\lambda \sum_{i=1}^{m} x_i - \sum_{i=1}^{m} \log(x_i!)$$

If our goal is to find the values of the parameters that maximize the likelihood, then nothing is lost if we choose to maximize the log-likelihood instead. This is because the logarithm function is a monotone increasing function. In other words if x1 < x2 it also must be the case that log(x1) < log(x2), and vice versa (Fig. 1). Consequently in our Poisson example the monotonicity of the logarithm function means that the value of λ that maximizes log L is also the value of λ that maximizes L.
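
We can verify this numerically with a small made-up sample (toy counts, not the Rikz data). The grid value of λ that maximizes the likelihood is the same one that maximizes the log-likelihood.

x <- c(4, 7, 5)          # hypothetical counts
lam <- seq(3, 9, .01)    # grid of candidate values for lambda
lik <- sapply(lam, function(l) prod(dpois(x, l)))           # likelihood
ll <- sapply(lam, function(l) sum(dpois(x, l, log=TRUE)))   # log-likelihood
lam[which.max(lik)]
[1] 5.33
lam[which.max(ll)]
[1] 5.33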

Poisson regression model with a fixed λ

Constructing the log-likelihood

We begin by loading the Rikz data set and refitting some of the normal models from before.

rikz <- read.table( 'ecol 562/RIKZ.txt', header=TRUE)
rikz$richness <- apply(rikz[,2:76], 1, function(x) sum(x>0))
mod1 <- lm(richness~NAP, data=rikz)
mod2 <- lm(richness~NAP + factor(week), data=rikz)
mod3 <- lm(richness~NAP * factor(week), data=rikz)

The Poisson distribution is a standard model for non-negative discrete data (counts). The Poisson distribution is defined by a single parameter called the rate of the process that is generally denoted by the Greek letter λ. We start with a simple Poisson model in which we assume a constant value for λ. This is equivalent to a regression model in which we fit only an intercept.

The expression for the Poisson log-likelihood given at the end of the last section can be translated directly into R language as follows.

Poisson.LL <- function(x) sum(log(dpois(rikz$richness, lambda=x)))

The dpois function is vectorized, so for a fixed value of λ we can obtain the probabilities for all of our richness observations at once. Let's examine the behavior of the log-likelihood for different values of λ.

Poisson.LL(2)
[1] -262.4898
Poisson.LL(3)
[1] -203.6908
Poisson.LL(4)
[1] -175.0442
Poisson.LL(5)
[1] -162.9194
Poisson.LL(6)
[1] -161.2451
Poisson.LL(7)
[1] -166.7825

Remember our goal is to maximize the log-likelihood. Because probabilities are at most 1, their logarithms are at most 0, so the log-likelihood is negative and we're looking for its least negative value. Thus if we had to pick a value for λ among the six we've examined so far, we should choose λ = 6, because it produced the largest value of the log-likelihood. If the log-likelihood is well-behaved then it would seem that it is maximized somewhere between λ = 5 and λ = 7.

What happens if we try to evaluate our function on a vector of values?

Poisson.LL(4:5)
[1] -169.5975

It returns a single number that is neither the log-likelihood at λ = 4 nor the log-likelihood at λ = 5. What gives? In the Poisson.LL function dpois is given a vector first argument, rikz$richness, of length 45. With 4:5 we are passing a vector of length 2 to dpois for its lambda argument. When a vectorized R function is given two vector arguments of different lengths it tries to match them up and, if necessary, recycles the values of the shorter one until it is the same length as the longer one. Thus R creates the vector 4, 5, 4, 5, ... and uses this as its lambda argument. As a result we get the log Poisson probability of the first observation using lambda = 4, the log Poisson probability of the second observation using lambda = 5, the log Poisson probability of the third observation using lambda = 4, and so on, with the value of lambda alternating between 4 and 5. The result is a nonsensical expression.
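
Recycling is easy to see with a small example (illustrative values only). The two calls below return identical results.

dpois(c(1,2,3,4), lambda=c(4,5))
[1] 0.07326256 0.08422434 0.19536681 0.17546737
# equivalent: lambda recycled to 4, 5, 4, 5
dpois(c(1,2,3,4), lambda=c(4,5,4,5))
[1] 0.07326256 0.08422434 0.19536681 0.17546737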

The solution is to force R to use each element of the lambda argument once for all the values of rikz$richness before moving on to the next element of lambda. The sapply function does exactly this: it forces the function given as its second argument to be evaluated separately for each element of its first argument.

Poisson.LL(4)
[1] -175.0442
Poisson.LL(5)
[1] -162.9194
sapply(4:5, Poisson.LL)
[1] -175.0442 -162.9194

Now we get what we want: the log-likelihood evaluated once for λ = 4 and a second time for λ = 5.
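
An alternative (a base R convenience not used in the original notes) is to wrap the function with Vectorize, which performs the same element-by-element evaluation for us.

Poisson.LL.vec <- Vectorize(Poisson.LL)
Poisson.LL.vec(4:5)
[1] -175.0442 -162.9194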

Graphing the log-likelihood

Fig. 2  Plot of the log-likelihood function

To graph the log-likelihood we can evaluate our function on a whole list of possible values of λ and then plot the results against λ. For example to plot over the domain 4 ≤ λ ≤ 7, we could first generate λ in increments of 0.01 with seq(4,7,.01) to use as the x-coordinates and then sapply the Poisson log-likelihood function to this vector of numbers to generate the corresponding function values for the y-coordinates.

plot(seq(4, 7, .01), sapply(seq(4, 7, .01), Poisson.LL), type='l', ylab='Log-likelihood', xlab=expression(lambda))

From Fig. 2 we see that the maximum of the log-likelihood occurs somewhere between 5.5 and 6.0.

Maximizing the log-likelihood numerically

Clearly we could repeatedly zoom in on the MLE graphically and obtain an estimate that is as accurate as we please, but such an approach is tedious and unwieldy. It is far more efficient to obtain the estimate using numerical optimization. R provides several numerical optimization functions that are useful here: nlm, nlminb, and optim, plus optimize for one-dimensional problems. Since the current problem is one-dimensional we use optimize. The help screen for optimize indicates that we need to specify three arguments: the function to optimize, an interval over which to search for the optimum, and the argument maximum=T to indicate that we want a maximum rather than a minimum. Since we've already seen that the interval (4, 7) brackets the maximum we use it as the search interval.

optimize(Poisson.LL, c(4,7), maximum=T)
$maximum
[1] 5.688883

$objective
[1] -160.8757

The reported maximum value of the log-likelihood is –160.88 and it occurs at λ = 5.69.
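
In fact, for this intercept-only model the MLE is available in closed form. Setting the derivative of the log-likelihood above, –m + (Σ xi)/λ, equal to zero gives λ̂ = x̄, the sample mean. As a check, the sample mean should agree with the optimize result up to its convergence tolerance.

# analytically, the MLE of lambda is the sample mean of richness
mean(rikz$richness)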

Poisson regression model in which λ varies with NAP

Fitting a Poisson regression model with an identity link

One change we can make to the current model is to allow each site to have its own rate parameter. We begin by letting the Poisson parameter λ vary with a site's value of NAP. Because the rate parameter is also the mean of the Poisson distribution this is equivalent to constructing a regression model for the mean.

$$\mu_i = \lambda_i = \beta_0 + \beta_1\,\mathrm{NAP}_i$$

The function for constructing the log-likelihood under this scenario is displayed below. It's a multi-line function in which the first line defines the Poisson mean and the second line uses it as the lambda argument of dpois.

Poisson.LL2 <- function(x) {
mu <- x[1] + x[2]*rikz$NAP
sum(log(dpois(rikz$richness, lambda=mu)))
}

R's optimization functions accept only a single argument for the parameters being estimated. In order to fit models with multiple parameters we must make the parameters the components of a vector. Here the vector is denoted by x and has length 2: the first component x[1] is the intercept and the second component x[2] is the slope of the regression equation. As a result when we use this function we have to supply it with a vector of length 2.

In order to use the optimization functions of R to obtain the MLE we first need initial estimates of the components of x.

Poisson.LL2(c(1,-2))
[1] NaN
Warning message:
In dpois(x, lambda, log) : NaNs produced

This choice apparently produced some illegal values for the Poisson parameter λ. This happened because we are using an identity link, which offers no protection against the parameter going negative. A log link would be a better choice here, as we'll see below.

Poisson.LL2(c(7,-2))
[1] -132.2094
Poisson.LL2(c(7,-3))
[1] -122.5027
Poisson.LL2(c(6,-3))
[1] NaN
Warning message:
In dpois(x, lambda, log) : NaNs produced

Because (7, –3) yields a larger log-likelihood than (7, –2) we use its components for the starting values.

By default, R's general-purpose optimization functions carry out minimization rather than maximization. Thus in order to use them we need to change the way we've been formulating the problem. Since maximizing a function f is equivalent to minimizing –f, we re-express our objective function so that it returns the negative log-likelihood rather than the log-likelihood.

neg.LL <- function(x) -Poisson.LL2(x)
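
As an aside, two base R variations (not used in the original notes) are worth knowing. The dpois function has a log argument that returns log-probabilities directly, which avoids taking the log of a probability that may have underflowed to zero, and optim's control argument accepts an fnscale component that, when negative, turns the problem into a maximization.

# numerically safer version of the negative log-likelihood
neg.LL2 <- function(x) -sum(dpois(rikz$richness, lambda=x[1] + x[2]*rikz$NAP, log=TRUE))
# or skip the sign flip entirely and maximize the log-likelihood directly:
# optim(c(7,-3), Poisson.LL2, control=list(fnscale=-1))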

The optim function expects a vector of initial guesses as the first argument and the function to be minimized as the second argument.

optim(c(7,-3), neg.LL)
$par
[1]  6.693066 -2.888178

$value
[1] 122.2139

$counts
function gradient
      49       NA

$convergence
[1] 0

$message
NULL

Warning messages:
1: In dpois(x, lambda, log) : NaNs produced
2: In dpois(x, lambda, log) : NaNs produced
3: In dpois(x, lambda, log) : NaNs produced
4: In dpois(x, lambda, log) : NaNs produced
5: In dpois(x, lambda, log) : NaNs produced

The output tells us the following.

  1. The MLEs of the regression parameters are β0 = 6.693, β1 = –2.888
  2. The value of the negative log-likelihood at the minimum is 122.2139
  3. 49 calls were made to the neg.LL function. No calls were made to a gradient function because the default Nelder-Mead method of optim is derivative-free.
  4. The help screen tells us that convergence=0 means a "successful completion."
  5. The warning messages that appear at the end of the output will be explained below.

A second optimization function in R is the nlm function. It also does minimization but it expects the arguments to be in the reverse order from optim: first the function to be minimized and second the initial guesses for the parameters in the form of a vector.

nlm(neg.LL, c(7,-3))
$minimum
[1] 122.2139

$estimate
[1]  6.693129 -2.888344

$gradient
[1] -7.237990e-06 -5.018472e-06

$code
[1] 1

$iterations
[1] 8

Warning messages:
1: In dpois(x, lambda, log) : NaNs produced
2: In nlm(neg.LL, c(7, -3)) : NA/Inf replaced by maximum positive value
3: In dpois(x, lambda, log) : NaNs produced
4: In nlm(neg.LL, c(7, -3)) : NA/Inf replaced by maximum positive value

The output tells us the following.

  1. The value of the negative log-likelihood at the minimum is 122.2139
  2. The MLEs of the regression parameters are β0 = 6.693, β1 = –2.888
  3. The value of the gradient at the MLE is very small. This is what we want. The gradient is the vector of derivatives of the objective function (here the negative log-likelihood), which from calculus should be zero at a maximum or a minimum. Thus the fact that the reported gradient is very close to zero encourages us to believe that nlm has found a solution (although there's no guarantee that it's the global minimum that we seek).
  4. The help screen tells us that code=1 means "relative gradient is close to zero, current iterate is probably solution."
  5. It took 8 iterations of the numerical algorithm to converge.
  6. The warning messages tell us that in the process of searching for the minimum, the function encountered illegal values that caused it to return a value of NA, or missing. Apparently the algorithm then recovered from this.

The warning messages indicate that problems arose during the iterations but that both optim and nlm managed to recover from them. These problems occurred because we fit a model using an identity link, which failed to prevent the Poisson mean from becoming negative. To correct this we refit the same model using a log link.

Fitting a Poisson model using a log link

To improve numerical stability and prevent the iterations from generating illegal values for the Poisson mean, we rewrite the log-likelihood function using a log link. With a log link the regression model is

$$\log \lambda_i = \beta_0 + \beta_1\,\mathrm{NAP}_i, \qquad \text{so that} \qquad \lambda_i = e^{\beta_0 + \beta_1\,\mathrm{NAP}_i}$$

The corresponding R function is the following.

Pois.LL3 <- function(x) {
loglambda <- x[1] + x[2]*rikz$NAP
sum(log(dpois(rikz$richness, exp(loglambda))))
}

Notice that the linear predictor, which models the log of the mean, is exponentiated when it's entered as the lambda argument of dpois. As before we need to experiment with the function to obtain good starting values for the parameters.

Pois.LL3(c(2,.5))
[1] -303.0112
Pois.LL3(c(2,.25))
[1] -220.8275
Pois.LL3(c(3,.25))
[1] -607.407

The largest value of the log-likelihood is obtained with β0 = 2, β1 = 0.25 so we use these as our initial guesses. This time we construct the negative log-likelihood function on the fly by passing an anonymous function as the second argument to optim.

optim(c(2,.25), function(x) -Pois.LL3(x))
$par
[1]  1.7910256 -0.5561581

$value
[1] 127.5904

$counts
function gradient
      57       NA

$convergence
[1] 0

$message
NULL

Now we get numerical convergence without any warning messages. The solution this time is β0 = 1.791, β1 = –0.556.

Fitting Poisson models with the glm function

Regression models in which the response has a Poisson distribution are easily fit with the glm function of R. The glm function supports additional probability distributions for the response besides the Poisson. These distributions are connected in that they are all members of what's called the exponential family of probability distributions. We'll discuss some of these other distributions later.

Poisson regression with a log link

The syntax of glm is identical to that of lm except that glm accepts an optional family argument that can be used to specify the desired probability distribution. If the family argument is not specified then glm uses a normal distribution. To fit a Poisson regression model with richness as the response and NAP as the predictor we would proceed as follows. A log link for the mean is the default in Poisson regression and doesn't have to be specifically requested.

# log link is the default
mod1p <- glm(richness~NAP, data=rikz, family=poisson)

The summary and anova functions can be used on a glm object to obtain statistical tests. For anova we need to specify an additional argument, test='Chisq', in order to get p-values for the reported statistical tests.

anova(mod1p, test='Chisq')
Analysis of Deviance Table

Model: poisson, link: log

Response: richness

Terms added sequentially (first to last)

     Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                    44     179.75              
NAP   1   66.571        43     113.18 3.376e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

summary(mod1p)
Call:
glm(formula = richness ~ NAP, family = poisson, data = rikz)

Deviance Residuals:
    Min       1Q   Median       3Q      Max 
-2.2029  -1.2432  -0.9199   0.3943   4.3256 

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.79100    0.06329  28.297  < 2e-16 ***
NAP         -0.55597    0.07163  -7.762 8.39e-15 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 179.75  on 44  degrees of freedom
Residual deviance: 113.18  on 43  degrees of freedom
AIC: 259.18

Number of Fisher Scoring iterations: 5

From the output we learn that the estimated regression equation is log μ = 1.791 – 0.556*NAP. This matches what we obtained with the optim function in the previous section.

Although some of the language used in the output will appear strange, e.g., use of the word 'deviance', the output itself should seem familiar. The anova function carries out a sequence of tests on nested models whereas the summary table reports variables-added-last tests. Formally, the anova output reports the results from likelihood ratio tests (analogous to the F-tests of ordinary regression) whereas the summary table reports the results from Wald tests (analogous to the t-tests of ordinary regression). In ordinary regression the t-tests and F-tests are mathematically related and when they are testing the same hypothesis they give the same results. In the likelihood framework, the likelihood ratio tests and Wald tests are constructed from different assumptions and will typically give different results even when they are testing the same hypothesis. This is the case in the above tables where both are testing the hypothesis H0: β1 = 0, but the reported p-values are slightly different.
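
We can see the difference concretely by computing both statistics for NAP from the two tables (a back-of-the-envelope check; each is referred to a chi-squared distribution with 1 degree of freedom).

# Wald chi-square: the square of the z value from summary
(-0.55597/0.07163)^2    # approximately 60.2
# likelihood ratio chi-square: the drop in deviance from anova
179.75 - 113.18         # 66.57, the value in the Deviance column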

Poisson regression with an identity link

To fit a Poisson regression model with an identity link we need to specify the link as part of the family argument.

mod1p1 <- glm(richness~NAP, data=rikz, family=poisson(link=identity))
Error: no valid set of coefficients has been found: please supply starting values
In addition: Warning message:
In log(ifelse(y == 0, 1, y/mu)) : NaNs produced

The model returns an error message because the built-in algorithm generated starting values that caused problems for the optimization routine. This situation is not unusual when using link functions that are different from the default. We need to supply our own initial values just as we did with the optim function. For glm these are supplied in a start argument.

mod1p1 <- glm(richness~NAP, data=rikz, family=poisson(link=identity), start=c(6,-3))
Error: cannot find valid starting values: please specify some

The values supplied didn't work so we try again.

mod1p1 <- glm(richness~NAP, data=rikz, family=poisson(link=identity), start=c(2,-.1))

Curiously these starting values worked even though they are much farther from the final solution than our first choice of 6 and –3.

coef(mod1p1)
(Intercept)         NAP 
   6.693128   -2.888328 

If we give glm starting values very close to the final solution it also converges.

mod1p1 <- glm(richness~NAP, data=rikz, family=poisson(link=identity), start=c(6.7,-2.9))

From the output of the coef function, the estimated regression model with an identity link is the following: μ = 6.693 – 2.888*NAP.

We can compare the log-likelihoods of the two models we've fit, one using a log link and the other using an identity link, and see which model makes the data more probable. The logLik function of R extracts log-likelihoods from glm objects.

logLik(mod1p)
'log Lik.' -127.5904 (df=2)
logLik(mod1p1)
'log Lik.' -122.2139 (df=2)

The log-likelihood of the Poisson model with an identity link is larger, suggesting it provides a better fit to the data. To extract the log-likelihoods of both models with a single function call we can use the sapply function, with the list function grouping the two models together.

sapply(list(mod1p, mod1p1), logLik)

[1] -127.5904 -122.2139

The predict function when applied to a glm object returns predictions on the scale of the link function. So when we apply it to the glm model object mod1p in which we used a log link, it returns estimates of log μ for each observation.

# predict returns predictions on scale of link function
predict(mod1p)
        1         2         3         4         5         6
1.7659782 2.3669856 2.5337777 1.4485173 2.1712829 1.1293885
        7         8         9        10        11        12
1.3350987 1.4379538 1.7570827 2.5326658 2.3336272 0.9603726
       13        14        15        16        17        18
1.9027477 2.0589763 1.6981495 0.8080358 1.8076763 1.5352492
       19        20        21        22        23        24
1.0309812 2.2418916 1.1699746 2.0706517 1.3856923 0.8864281
       25        26        27        28        29        30
1.7609745 2.1123498 1.9844758 0.5556238 2.2874814 1.3651213
       31        32        33        34        35        36
1.3000724 0.7980283 1.0265334 1.8243555 1.5869548 0.8619653
       37        38        39        40        41        42
1.9994871 2.3497505 1.6964815 0.6501393 1.9889236 1.7387355
       43        44        45
1.7921090 0.5372767 1.3100799

To obtain estimates of μ we can just exponentiate these values, or alternatively, use the fitted function. The fitted function returns predictions on the scale of the response, i.e., it also inverts the link function.

# fitted returns predictions on scale of original variable
fitted(mod1p)
        1         2         3         4         5         6
 5.847290 10.665195 12.601019  4.256798  8.769528  3.093764
        7         8         9        10        11        12
 3.800371  4.212068  5.795505 12.587015 10.315290  2.612670
       13        14        15        16        17        18
 6.704291  7.837942  5.463827  2.243497  6.096265  4.642482
       19        20        21        22        23        24
 2.803816  9.411116  3.221911  7.929990  3.997593  2.426447
       25        26        27        28        29        30
 5.818104  8.267645  7.275233  1.743028  9.850098  3.916198
       31        32        33        34        35        36
 3.669562  2.221157  2.791373  6.198798  4.888839  2.367809
       37        38        39        40        41        42
 7.385267 10.482953  5.454721  1.915808  7.307664  5.690144
       43        44        45
 6.002098  1.711340  3.706470
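
As a quick consistency check, the fitted values should be exactly the exponentiated link-scale predictions; predict also accepts a type='response' argument that returns the means directly.

all.equal(exp(predict(mod1p)), fitted(mod1p))
[1] TRUE
all.equal(predict(mod1p, type='response'), fitted(mod1p))
[1] TRUE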

The Poisson regression model as a data-generating mechanism

In lecture 7 we generated a three-dimensional picture of the normal regression model and saw that it had some problems. It would be useful to produce the same graph for the two Poisson models we've considered, one with a log link and one with an identity link, and see how they compare. Only trivial modifications to the code from lecture 7 are needed: dpois replaces dnorm, and the means come from exponentiating the link-scale predictions of mod1p.

mu <- exp(predict(mod1p, newdata=data.frame(NAP=c(-1.5,0,1,2))))
y1 <- seq(0,20,1)
x1a <- rep(-1.5, length(y1))
x1b <- rep(0, length(y1))
x1c <- rep(1, length(y1))
x1d <- rep(2, length(y1))
z1a <- dpois(y1, mu[1])
z1b <- dpois(y1, mu[2])
z1c <- dpois(y1, mu[3])
z1d <- dpois(y1, mu[4])
x <- c(x1a, x1b, x1c, x1d)
y <- c(y1, y1, y1, y1)
z <- c(z1a, z1b, z1c, z1d)
library(scatterplot3d)
q1 <- scatterplot3d(x, y, z, xlim=c(-1.5, 2), ylim=c(0, 20), type="n", xlab="NAP", zlab="Density", box=FALSE, ylab='Richness')
q1$points3d(x1a, y1, z1a, type='h', cex=.8)
q1$points3d(x1b, y1, z1b, type='h', cex=.8)
q1$points3d(x1c, y1, z1c, type='h', cex=.8)
q1$points3d(x1d, y1, z1d, type='h', cex=.8)
mu2 <- exp(predict(mod1p, newdata=data.frame(NAP=seq(-1.5,2,.1))))
q1$points3d(seq(-1.5,2,.1), mu2, rep(0,length(mu2)), type='l', lwd=2)

Fig. 3  Poisson regression model using a log link

The distributions are plotted as bar graphs rather than curves because the Poisson distribution is defined only at non-negative integer values. Observe how the regression "line" curves and approaches the NAP-axis asymptotically. Notice also that when the Poisson mean is far from zero the Poisson distribution is approximately symmetric (e.g., at NAP = –1.5), but as the mean gets closer to zero the distribution becomes positively skewed.

To produce this graph for the Poisson model with an identity link requires only changing the expressions for mu and mu2 in the above code.

# redo graph for identity link model
# open up new window
windows() # quartz() on Mac
mu <- predict(mod1p1, newdata=data.frame(NAP=c(-1.5,0,1,2)))
y1 <- seq(0,20,1)
x1a <- rep(-1.5, length(y1))
x1b <- rep(0, length(y1))
x1c <- rep(1, length(y1))
x1d <- rep(2, length(y1))
z1a <- dpois(y1, mu[1])
z1b <- dpois(y1, mu[2])
z1c <- dpois(y1, mu[3])
z1d <- dpois(y1, mu[4])
x <- c(x1a, x1b, x1c, x1d)
y <- c(y1, y1, y1, y1)
z <- c(z1a, z1b, z1c, z1d)
library(scatterplot3d)
q1 <- scatterplot3d(x,y,z, xlim=c(-1.5, 2), ylim=c(0, 20), type="n", xlab="NAP", zlab="Density", box=FALSE, ylab='Richness')
q1$points3d(x1a, y1, z1a, type='h', cex=.8)
q1$points3d(x1b, y1, z1b, type='h', cex=.8)
q1$points3d(x1c, y1, z1c, type='h', cex=.8)
q1$points3d(x1d, y1, z1d, type='h', cex=.8)
mu2 <- predict(mod1p1, newdata=data.frame(NAP=seq(-1.5,2,.1)))
#add regression line
q1$points3d(seq(-1.5,2,.1), mu2, rep(0,length(mu2)), type='l', lwd=2)

Fig. 4  Poisson regression model using an identity link

Although the model with an identity link yielded a larger log-likelihood than the Poisson model with a log link, it's clear from Fig. 4 that extending the regression line to larger NAP values would eventually cause it to cross zero and predict negative values for mean richness. Thus we might prefer the log link model here because it generalizes to new data.

Poisson regression models

We can also fit Poisson regression model versions of the ordinary regression models we fit to these data previously. The next two models add week as a categorical variable.

# add week to the regression model
mod2p <- glm(richness~NAP+factor(week), data=rikz, family=poisson)
mod3p <- glm(richness~NAP*factor(week), data=rikz, family=poisson)

To test these models we can use the anova function on the interaction model.

anova(mod3p, test='Chisq')
Analysis of Deviance Table

Model: poisson, link: log

Response: richness

Terms added sequentially (first to last)

                 Df Deviance Resid. Df Resid. Dev  Pr(>Chi)   
NULL                                44    179.753             
NAP               1   66.571        43    113.182 3.376e-16 ***
factor(week)      3   59.716        40     53.466 6.759e-13 ***
NAP:factor(week)  3   22.396        37     31.070 5.396e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The output is interpreted the same way as the ANOVA table for ordinary regression. The NULL model that appears first is one that contains only an intercept. The second line tells us that adding NAP is a significant improvement over the intercept-only model. Line 3 tells us that adding week to the NAP model yields a significant improvement. Line 4 tells us that adding the NAP × week interaction to the NAP + week model yields a significant improvement. Thus our final Poisson regression model should be the interaction model where the effect of NAP varies by week. Using the summary table we can then determine the nature of the interaction just as we did with the normal model.
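
Because all three models were fit by maximum likelihood we can also compare them informally with AIC (smaller is better). A sketch: given the significant interaction, the interaction model should have the smallest AIC.

# compare the three Poisson models by AIC
sapply(list(mod1p, mod2p, mod3p), AIC)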

Comparing the likelihoods of discrete and continuous probability models

We can use the magnitudes of the log-likelihoods to compare the final Poisson model with the final normal model.

# compare log-likelihood of normal and Poisson model
logLik(mod3)
'log Lik.' -99.62243 (df=9)
logLik(mod3p)
'log Lik.' -86.53437 (df=8)

Because the log-likelihood of the Poisson model exceeds that of the normal model, we would like to argue that the Poisson model is preferable—the data are more likely under the Poisson model than under the normal model. This remark needs further justification. Because the normal likelihood is a product of densities (which don't have an immediate probability interpretation) whereas the Poisson likelihood is a product of probability mass functions (which do have a probability interpretation), it's not at all clear that these two likelihoods are comparable.

Because of the central limit theorem and the fact that the normal distribution provides a decent approximation to many probability models, it is perfectly legitimate to use the normal distribution, a model that assumes an underlying continuum, even when a variable is discrete. Even more common is the practice of log-transforming or square-root transforming a discrete response variable and then assuming the result is normally distributed. As a result it would be nice to have a direct way to compare the likelihoods derived from discrete and continuous probability models when applied to the same data set.

At first glance this would not seem to be possible because likelihoods based on continuous probability models are densities while those based on discrete probability models are probabilities. To obtain a probability from a density requires integrating that density over an interval. In particular if X is a continuous random variable, then P(X = k) = 0 for all values of k. It is only expressions of the form P(a < X < b), for some interval (a, b), that can have nonzero values.

Obviously we do use continuous probability models for discrete random variables, so what does an expression of the form P(X = k) mean when k can only take non-negative integer values? The standard interpretation is the following.

$$P(X = k) \equiv P\!\left(k - \tfrac{1}{2} < X < k + \tfrac{1}{2}\right) = \int_{k - 1/2}^{k + 1/2} f(x)\,dx$$

and so we have an implicit interval over which we can integrate the density. (Whether the interval is defined to be open, closed, or half-open is immaterial here.) Notice that the interval in question has length 1 and that the value X = k is its midpoint. The midpoint rule for approximating an integral tells us that

$$\int_{k - 1/2}^{k + 1/2} f(x)\,dx \approx f(k) \cdot \left[\left(k + \tfrac{1}{2}\right) - \left(k - \tfrac{1}{2}\right)\right] = f(k)$$

which is just the density. Thus we see that any density used as a model for discrete data can be interpreted as a probability in the approximating sense of the midpoint rule.

Fig. 5  Using a continuous density to approximate a discrete probability (R code)

Fig. 5 displays the predicted normal distribution for observation #4 using the NAP × week interaction model. The observed richness for observation #4 is 11. Because the normal distribution is continuous, calculating P(X = 11) requires evaluating the following integral.

$$P(X = 11) = \int_{10.5}^{11.5} \mathrm{normal}_4(x)\,dx$$

where normal4 is the normal density predicted for observation #4. The graph on the left in Fig. 5 displays the midpoint approximation for this integral, the area of a rectangle whose height is the density evaluated at X = 11 and whose width is 1. So, an approximate value of P(X = 11) is just the density evaluated at X = 11. The normal density function in R is dnorm.

The graph on the right in Fig. 5 displays the exact value of P(X = 11), the true area underneath the density curve from x = 10.5 to x = 11.5. The pnorm function in R is the normal cumulative distribution function. It returns the following value.

$$\mathrm{pnorm}(a, \mu, \sigma) = P(X \le a) = \int_{-\infty}^{a} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / 2\sigma^2}\,dx$$

This is the area under the graph from x = –∞ to x = a. Thus we can find the area under the graph from x = a to x = b by subtracting pnorm(a,μ,σ) from pnorm(b,μ,σ). For our example b = 11.5 and a = 10.5 as is shown at the top of the figure.

To carry out these calculations in R the predict function is used to extract the estimated mean of the normal distribution for observation #4.

#fit normal model
mod3 <- lm(richness~NAP*factor(week), data=rikz)
#calculate mean of fourth observation
mu <- predict(mod3)[4]
#unbiased estimate of std dev
sd <- summary(mod3)$sigma
#MLE of std dev (overwrites the unbiased estimate above)
sd <- sqrt(sum(residuals(mod3)^2)/nrow(rikz))

The estimated mean and standard deviation of the normal distribution can be used to calculate P(X = 11), first using the midpoint approximation and then using the exact formula.

#midpoint approximation
dnorm(11,mu,sd)
[1] 0.1697383
#exact answer
pnorm(11.5,mu,sd) - pnorm(10.5,mu,sd)
[1] 0.1684763

The answers agree to the second decimal place. Next we compare the exact and approximate probabilities for all 45 observations.

y <- pnorm(rikz$richness+.5, predict(mod3), summary(mod3)$sigma) - pnorm(rikz$richness-.5, predict(mod3), summary(mod3)$sigma)
x <- dnorm(rikz$richness, predict(mod3), summary(mod3)$sigma)
plot(y~x, ylab='Exact', xlab='Approximate')
abline(lm(y~x), col='grey70', lwd=2)
abline(0, 1, col=2, lty=2)

Fig. 6  Comparing exact and approximate normal probabilities

The approximate and exact probabilities shown in Fig. 6 appear to be exceptionally close. In reality the approximate probability consistently overestimates the true probability by a small amount. If we construct a 95% confidence interval for the slope we see that it doesn't include 1, so we would reject the hypothesis that the two probabilities are equal.

confint(lm(y~x))
                   2.5 %       97.5 %
(Intercept) 0.0004370469 0.0007912393
x           0.9887148715 0.9912073907

Checking the fit of the Poisson model

The Poisson regression model estimates the mean of a Poisson distribution for each observation. Using it we can then construct the corresponding probability distribution of a Poisson random variable with that mean. To check the fit of the model we can assess whether the observed richness values look as if they could have arisen from their predicted distributions.

We begin by calculating the Poisson probabilities of X = 0 up to and including X = 30 for each observation in the Rikz data set. Because we need to carry out this calculation separately for each observation we use the sapply function.

out <- sapply(0:30, function(x) dpois(x,exp(predict(mod3p))))
dim(out)
[1] 45 31

From the dimensions of the output we can tell that the columns correspond to the different probabilities and the rows correspond to the different observations.

out[1:4,1:8]
          [,1]         [,2]         [,3]         [,4]        [,5]
1 1.358353e-05 1.522259e-04 8.529715e-04 0.0031863185 0.008926991
2 1.569700e-06 2.097845e-05 1.401846e-04 0.0006245048 0.002086568
3 8.037218e-07 1.127944e-05 7.914792e-05 0.0003702543 0.001299038
4 3.675566e-05 3.753201e-04 1.916238e-03 0.0065223738 0.016650346
         [,6]        [,7]       [,8]
1 0.020008337 0.037371079 0.05982924
2 0.005577241 0.012422957 0.02371831
3 0.003646144 0.008528339 0.01709812
4 0.034004063 0.057870484 0.08441831

We need to unstack this matrix into a single vector so it can be incorporated into a data frame. The as.vector function converts a matrix into a vector by stacking the columns of the matrix one on top of the other. To keep probabilities from individual observations together we first transpose the matrix using the R t function, which swaps rows and columns, before using the as.vector function.

as.vector(t(out))[1:5]
[1] 1.358353e-05 1.522259e-04 8.529715e-04 3.186319e-03 8.926991e-03
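
The effect of transposing first is easy to see with a small example (illustrative values only).

m <- matrix(1:6, nrow=2)   # 2 rows (observations) by 3 columns
as.vector(m)               # stacks columns: 1 2 3 4 5 6
as.vector(t(m))            # keeps each row intact: 1 3 5 2 4 6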

In addition to the probabilities we need the possible count values (0 through 30, repeated for each site), the observed richness of each site (each value repeated 31 times), and a site identifier (also repeated 31 times). We assemble all of these into a data frame.

my.dat <- data.frame(p=as.vector(t(out)), x=rep(0:30,45), rich=rep(rikz$richness,rep(31,45)), id=rep(1:45,rep(31,45)))
my.dat[1:8,]
             p x rich id
1 1.358353e-05 0   11  1
2 1.522259e-04 1   11  1
3 8.529715e-04 2   11  1
4 3.186319e-03 3   11  1
5 8.926991e-03 4   11  1
6 2.000834e-02 5   11  1
7 3.737108e-02 6   11  1
8 5.982924e-02 7   11  1

Next we construct a panel graph showing the 45 sites in separate panels. Each panel displays the predicted Poisson distribution as a bar plot (using panel.xyplot and type='h' to drop vertical lines to the x-axis) with the observed count superimposed on this distribution to see if it looks typical.

library(lattice)
xyplot(p~x|factor(id), data=my.dat, xlab='Count', ylab= 'Probability', panel=function(x, y, subscripts){
panel.xyplot(x, y, type='h', lwd=2, col='grey70')
panel.points(my.dat$rich[subscripts][1], 0, col=2, pch=8, cex=.8)})

Fig. 7  Panel graph of the predicted Poisson distributions at each site. The observed richness (*) is shown relative to its predicted distribution.

Most of the distributions are compressed because we're displaying more probabilities than are necessary. The smallest probability of an actual observation is 0.003,

min(dpois(rikz$richness, exp(predict(mod3p))))
[1] 0.003184755

so we can remove all probabilities less than 0.001 and redo the plot. The x-axis is made 'free' so that the x-limits in each panel are adjusted according to what is being displayed.

my.dat2 <- my.dat[my.dat$p>.001,]
xyplot(p~x|factor(id), data=my.dat2, xlab='Count', ylab='Probability', panel=function(x, y, subscripts){
panel.xyplot(x, y, type='h', lwd=2, col='grey70')
panel.points(my.dat2$rich[subscripts][1], 0, col=2, pch=8, cex=.8)
}, scales=list(x='free'))

Fig. 8  A repetition of Fig. 7 in which negligible tail probabilities have been removed.

The only sites in which the observed richness count appears to be an outlier are sites 9 and 42. We can quantify this further by calculating a p-value for the observed richness in each site. The p-value is a tail probability: the probability of observing a value as extreme as or more extreme than what was actually observed. We focus on the upper-tail p-values because it's clear that there are no lower-tail outliers. If xi is the observed richness value in site i then the p-value can be calculated as follows.

$$p_i = P(X_i \ge x_i) = P(X_i = x_i) + \left[1 - P(X_i \le x_i)\right], \qquad X_i \sim \mathrm{Poisson}(\hat\lambda_i)$$

dpois(rikz$richness, exp(predict(mod3p))) + 1-ppois(rikz$richness, exp(predict(mod3p)))
 [1] 0.564536595 0.856945440 0.644880896 0.443366343 0.807582741
 [6] 0.710012388 0.653212824 0.795528853 0.020409511 0.246635951
[11] 0.466768559 0.721901128 0.456851258 0.775319935 0.529175529
[16] 0.663517086 0.606943095 0.417628894 0.748191310 0.730492661
[21] 0.365635600 0.407623265 0.170288173 1.000000000 0.910574728
[26] 0.753415700 0.801883813 0.724110843 0.762425848 0.339517692
[31] 0.547059256 0.659633303 0.746555060 0.618755905 0.239011476
[36] 0.258738473 0.660451614 0.690243444 0.395297297 1.000000000
[41] 0.322942714 0.004867473 0.831547249 1.000000000 0.765540804

The only p-values less than 0.05 correspond to site 9 (p = 0.02) and site 42 (p = 0.005). With 45 sites we'd expect to obtain .05*45 = 2.25 significant p-values by chance alone, so what we obtained is not an unusual result. We conclude that the observed richness counts could very well have arisen from the proposed model.
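
We can make the "by chance alone" argument slightly more precise with a binomial calculation (a rough check that treats the 45 tests as independent, which they are not quite).

# probability of 2 or more of 45 p-values falling below 0.05
# when the model is correct
1 - pbinom(1, size=45, prob=.05)    # roughly 0.67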
