Final Exam—Part 2

Due Date

Monday, December 10, 2012

Instructions

The same rules and policies apply to this "exam" as to ordinary assignments.

Data Source

The data for this assignment are contained in moths.csv, a comma-delimited text file. They appear as Table 12 in Bishop (1971).

Background

The phenomenon of industrial melanism is one of the textbook examples of natural selection in action. The relative prevalence of two natural color morphs of the moth Biston betularia in England has changed over time, apparently in response to industrial air pollution. The dark morph flourishes in polluted areas, where tree trunks are darkened by soot, while the light morph flourishes in less polluted areas, where tree trunks are lighter. Bishop (1971) investigated a naturally occurring cline of Biston betularia extending from industrial Liverpool (most polluted) to the rural countryside of North Wales (least polluted). One of the analyses he carried out is the subject of this exam question.

Bishop selected seven woodlands (Location) at varying distances (Distance, in km) from Liverpool. He describes his experimental protocol as follows (pp. 224–225).

In June and early July 1966 and 1967, eight trees (occasionally sixteen) were selected at random at each of several localities every day. Equal numbers of frozen typical and carbonaria moths were glued to these in life-like positions. It was assumed that these moths would be subject to predation by birds in the manner observed by Kettlewell (1955). The moths were placed on a different aspect of the tree-trunks each day at heights of from 0.5 to 2. m. The position of each moth was noted and after 24 hrs a record was made as to whether or not it had been removed (or preyed upon). Remaining moths were then detached and the process was repeated with fresh moths on a different random series of trees. The long duration of the experiment meant that predation occurred over a range of weather conditions.

Unfortunately, the raw data are unavailable. Instead we have the results summarized by site for each Morph. The variable Num_moths records the total number of moths of a particular Morph that were exposed to predation at a particular site, and the variable Num_removed records the number of these individuals that were removed, presumably by predators. The question of interest is whether the predation rate varies with the distance from Liverpool and, more importantly, whether this relationship is different for the two color morphs.

Questions

  1. A single regression model involving the predictors Distance and Morph can answer the researcher's question of interest. Write down that regression model in generic form. What I want here is the expression that would appear on the right side of a regression equation, written out as a sum of parameters times predictors.

Hint 1: I'm looking for an expression of the form β0 + β1x + …, where you should include as many terms as are needed to describe the basic outlines of the experiment and answer the researcher's question. Be sure to identify what the variables in your expression represent.
Hint 2: I'm not asking you to include the complications discussed in Question 6 at this point.

  2. Using the expression you've written as your answer to Question 1, state a null hypothesis in terms of model parameters that directly tests whether the relationship between predation rate and distance is the same for the two morphs.
  3. Given the nature of the response variable, fit an appropriate regression model that addresses the researcher's primary question.
  4. Test the overall fit of the model of Question 3 using an appropriate goodness of fit test. Verify that the test is appropriate.
  5. There's a structural characteristic of these data that we've been ignoring that may be acting to make the data heterogeneous. The structure is represented by a variable in the data set. What am I talking about?
  6. Refit your model from Question 3 but this time also account for the structure of the data. In reference to this structure, which variable in your model is a level-1 variable and which variable is a level-2 variable?
  7. The statistical evidence for this structure turns out to be very weak. Demonstrate this either by carrying out a formal statistical test or by citing relevant statistics. (Note: We'll continue to use the model with structure in the remaining questions anyway.)

Hint 1: Unfortunately, the log-likelihood reported by the lmer function for non-normal models is not comparable to the log-likelihood reported by the glm function. The deviances reported by lmer and glm, however, are comparable. So, to compare these two models you can use their deviances, which you can extract with the deviance function.
Hint 2: If you elect to carry out a statistical test using the deviances you need to be aware that in the hypothesis test you're carrying out, H0: τ² = 0, zero is a boundary value for τ². The usual distribution of the likelihood ratio statistic is incorrect for boundary values. The p-value adjustment that we used for testing H0: θ = 0 in a negative binomial model (lecture 18) is the same adjustment you need to carry out here.
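
The boundary adjustment in Hint 2 can be sketched as follows. This is a minimal sketch, assuming the adjustment meant is the standard one in which the likelihood ratio statistic is referred to a 50:50 mixture of a point mass at zero and a chi-squared distribution with 1 degree of freedom, so that the usual p-value is halved. The function name and the deviance values are hypothetical and are not taken from the moth data. The sketch is written in Python; the arithmetic carries over directly to R.

```python
import math

def boundary_lr_pvalue(deviance_null, deviance_alt):
    """Adjusted p-value for H0: variance component = 0 (a boundary value).

    Assumption: the reference distribution is a 50:50 mixture of a point
    mass at zero and a chi-squared distribution with 1 degree of freedom.
    """
    # Likelihood ratio statistic = drop in deviance (never negative)
    lr = max(deviance_null - deviance_alt, 0.0)
    if lr == 0.0:
        # Under the mixture, a statistic of zero is always matched or exceeded
        return 1.0
    # Upper tail of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_chisq1 = math.erfc(math.sqrt(lr / 2.0))
    return 0.5 * p_chisq1

# Hypothetical deviances (NOT from the moth data)
p = boundary_lr_pvalue(deviance_null=15.0, deviance_alt=13.5)
print(round(p, 4))
```

Here deviance_null would be the deviance of the model without the random effect and deviance_alt the deviance of the model with it.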
Hint 3: If you prefer to use AIC to compare the models then you'll need to compute the correct log-likelihood of the lmer model. This is not hard to do using the definition of the deviance. Here is an outline of the necessary steps.

  1. Use the glm model to find the log-likelihood of the saturated model.
    1. You can do this by using the reported deviance and log-likelihood of the glm model along with the equation that defines the deviance. Just solve for the log-likelihood of the saturated model in this equation.
    2. Alternatively use glm to fit the saturated model and extract the log-likelihood from the saturated model.
  2. Having obtained the log-likelihood of the saturated model and knowing the deviance of the lmer model, calculate the log-likelihood of the lmer model using the equation that defines the deviance.
  3. Use this log-likelihood in the formula for AIC to calculate the AIC of the lmer model.
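
The steps outlined in Hint 3 can be sketched numerically. This is a sketch of the arithmetic only, assuming the deviance is defined as D = 2 × (log-likelihood of the saturated model − log-likelihood of the fitted model) and that AIC = −2 × log-likelihood + 2k, where k is the number of estimated parameters. The function names, the numeric values, and the parameter counts are all hypothetical placeholders; substitute the values reported for your own glm and lmer fits.

```python
def saturated_loglik(glm_loglik, glm_deviance):
    """Step 1: solve D = 2 * (logL_sat - logL_model) for logL_sat."""
    return glm_loglik + glm_deviance / 2.0

def loglik_from_deviance(sat_loglik, deviance):
    """Step 2: solve the same equation for logL_model."""
    return sat_loglik - deviance / 2.0

def aic(loglik, n_params):
    """Step 3: AIC = -2 * logL + 2 * k."""
    return -2.0 * loglik + 2.0 * n_params

# Hypothetical placeholder values (NOT from the moth data)
glm_ll, glm_dev = -40.0, 12.0   # log-likelihood and deviance of the glm fit
lmer_dev = 11.0                 # deviance of the mixed model fit
k_glm, k_lmer = 4, 5            # hypothetical parameter counts

ll_sat = saturated_loglik(glm_ll, glm_dev)
ll_lmer = loglik_from_deviance(ll_sat, lmer_dev)
print(aic(glm_ll, k_glm), aic(ll_lmer, k_lmer))
```

Because both AIC values are then built from log-likelihoods on the same scale, they can be compared directly.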
  8. Using the model from Question 6, compute a statistic that compares the odds of being removed 50 km away from Liverpool to the odds of being removed in Liverpool (0 km away). Calculate this statistic separately for the dark and light morphs and interpret your results.
  9. Using the model from Question 6, produce a graph that summarizes the results of the analysis as follows.
    a. Plot the empirical proportions (the observed proportions of moths eaten) as a function of distance. Distinguish the plotted values by their Morph type.
    b. Plot the predicted probability of being eaten as a function of distance using only the fixed effect estimates from your model. Display these as curves superimposed on your scatter plot of empirical proportions.
    c. Plot the predicted probability of being eaten as a function of distance using both the fixed effect estimates and the random effect predictions. Plot these as points, being sure to distinguish them from the points you plotted in (a).
    d. Label your diagram appropriately using a coherent set of colors and symbol types.
  10. It was mentioned in the background section above that the data we are using are tabulated versions of the raw data from each location for each morph. The actual observations are the number of moths removed out of a fixed total on separate trees at a given location. The results for individual trees were then combined to yield the data we used for this analysis. Which of the four basic assumptions of the probability model that we've been using is most likely (though not necessarily) violated by combining data in this fashion?
  11. Alternate analysis 1. Offer an argument that treating the proportion of moths removed (Num_removed/Num_moths) as being normally distributed is not an unreasonable assumption for these data. (Carry out a calculation that justifies using the normal distribution to approximate the binomial distribution for these data.) Redo the analysis of Question 6 but this time analyze the proportions rather than the counts and assume that the proportions have a normal distribution. What do you conclude from the analysis?
  12. Alternate analysis 2. At each distance calculate the difference in the proportion of moths removed, dark morph proportion minus light morph proportion. Regress these differences in proportions against distance from Liverpool. Compare your results to what you obtained in Question 11. (Look especially at the summary tables.) What do you conclude?
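
For the question that compares the odds of removal 50 km from Liverpool with the odds in Liverpool, the calculation can be sketched as follows. The sketch assumes a logit-link model in which distance enters linearly and one morph serves as the reference level, so that the odds ratio across a distance difference d is exp(d × slope), and the slope for the other morph is the reference slope plus a distance-by-morph interaction coefficient. The coefficient names and values below are hypothetical and are not estimates from the moth data. The sketch is written in Python; the arithmetic carries over directly to R.

```python
import math

def odds_ratio(slope, distance_change):
    """Odds ratio for a change in distance, given a slope on the logit scale."""
    return math.exp(slope * distance_change)

# Hypothetical coefficients on the logit scale (NOT from the moth data)
b_distance = 0.02       # slope of distance for the reference morph
b_interaction = -0.03   # change in that slope for the other morph

or_reference = odds_ratio(b_distance, 50)
or_other = odds_ratio(b_distance + b_interaction, 50)
print(or_reference, or_other)
```

An odds ratio above 1 means the odds of removal are higher at 50 km than in Liverpool; a value below 1 means they are lower.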

Cited references


Jack Weiss
Phone: (919) 962-5930
E-Mail: jack_weiss@unc.edu
Address: Curriculum in Ecology, Box 3275, University of North Carolina, Chapel Hill, 27599
Copyright © 2012
Last Revised--December 3, 2012
URL: https://sakai.unc.edu/access/content/group/3d1eb92e-7848-4f55-90c3-7c72a54e7e43/public/docs/assignments/finalpart2.htm