Final Exam—Part 1

Due Date

Friday, April 13, 2012

Instructions

The same rules and policies apply to this "exam" as to ordinary assignments. Part 2 of the final will be assigned around the last day of class and will be due on the day of the scheduled final exam.

Data Source

The file lakes.txt contains the data shown in Table 1.1, pp. 8–9 of Manly (2001): sulfate (SO4) concentrations over several years for a number of lakes in Norway. It is a tab-delimited text file. Missing values are blank.
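A minimal sketch of reading a file laid out this way in Python, assuming the first row holds the column names. The function name `read_lakes`, the column name "lake", and the sample values are hypothetical; only the tab delimiter and the blank-means-missing convention come from the description above.

```python
import csv
import io


def read_lakes(handle):
    """Read tab-delimited rows into dicts, mapping blank fields to None."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip completely empty lines
        # Pad short rows so trailing blanks still count as missing values.
        fields = fields + [""] * (len(header) - len(fields))
        rows.append({name: (field.strip() or None)
                     for name, field in zip(header, fields)})
    return rows


# Made-up two-row sample standing in for lakes.txt; the column name
# "lake" and every value here are illustrative, not the real data.
sample = "lake\t1976\t1977\n1\t6.5\t\n2\t\t4.8\n"
rows = read_lakes(io.StringIO(sample))
```

For the real file, the same function could be called as `read_lakes(open("lakes.txt", newline=""))`. Values are left as strings so that the conversion to numbers stays an explicit, separate step.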

Overview

Manly (2001), p. 7, describes the data as follows.

A Norwegian research programme was started in 1972 in response to widespread concern in Scandinavian countries about the effects of acid precipitation (Overrein et al., 1980). As part of this study, regional surveys of small lakes were carried out in 1974 to 1978, with some extra sampling done in 1981. Data were recorded for pH, sulphate (SO4) concentration, nitrate (NO3) concentration, and calcium (Ca) at each sampled lake.

Only the SO4 data are contained in the file lakes.txt. The columns labeled 1976, 1977, 1978, and 1981 contain the recorded sulfate concentrations in those years for the various lakes.
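Because year is the predictor, it is usually convenient to rearrange the four year columns into one row per lake-year observation. The sketch below shows one way to do that in Python; the function name `to_long`, the identifier column "lake", and all values are hypothetical, and missing concentrations are assumed to be stored as `None`.

```python
YEAR_COLUMNS = ("1976", "1977", "1978", "1981")


def to_long(rows, id_column):
    """Return (id, year, so4) tuples, skipping missing concentrations."""
    long_rows = []
    for row in rows:
        for year in YEAR_COLUMNS:
            value = row.get(year)
            if value is None:
                continue  # lake was not measured in this year
            long_rows.append((row[id_column], int(year), float(value)))
    return long_rows


# Made-up rows for illustration only; these are not the real data.
wide = [
    {"lake": "1", "1976": "6.5", "1977": None, "1978": "6.0", "1981": "5.5"},
    {"lake": "2", "1976": None, "1977": "4.8", "1978": "4.6", "1981": None},
]
long_rows = to_long(wide, "lake")
```

Keeping the lake identifier on every row preserves the link between repeated measurements of the same lake.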

Questions

  1. In this first part, ignore the structure of the data set and find the best model you can that relates SO4 concentration to year. List all the models you considered and provide a statistical argument for why your best model took the form it did.
  2. Produce a graph that accurately displays the structure of the data set. Your graphical display should include for each lake both a scatter plot of the data and a superimposed regression line (curve).
  3. Find the best model you can that relates SO4 concentration to year but this time also correctly accounts for the structure in the data set.
  4. Interpret the best model you found in Question 3. What does the model actually say? Be as specific as possible. Express your answer in terms of concentration units.
  5. Produce suitable graph(s) to demonstrate that, in addition to being structurally invalid, your best model in Question 1 failed to adequately account for the spatial correlation in the data set.
  6. Produce suitable graph(s) to demonstrate that your best model in Question 3 has probably accounted for the spatial correlation in the data set.
  7. Use formal statistical tests to check your answer to Question 6.
  8. If your answer to Question 7 indicates there is a problem, try to "fix" your model to account for it.

Hints

Question 1

There are two parts to this question.

  1. Find an appropriate probability model for the response.
  2. Determine the best way to include the predictor in the model.

There are three common probability distributions that you might consider here. To get full credit you must consider all three. No credit will be given for using probability models that can be dismissed as obviously bad choices without even looking at the data. The three probability models I'm thinking of could be reasonable choices for modeling SO4 concentration at least for some data sets, if not for this one.

According to the assumptions of simple linear regression, a probability model must hold separately at each level of the predictor, not for the data set as a whole. You might examine histograms or kernel densities of the distribution of the response by year to suggest possible distributions. Also keep in mind that it is possible to obtain a different probability distribution for the response in regression without changing the probability distribution of the errors.
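As a rough sketch of examining the response year by year, the Python below tabulates histogram counts separately for each year using a common set of equal-width bins, so the shapes can be compared across years. The function name `histogram_by_year`, the bin limits, and the observations are all hypothetical; in practice a plotting library would draw these as histograms or kernel density curves.

```python
from collections import defaultdict


def histogram_by_year(observations, n_bins, low, high):
    """Count values per equal-width bin on [low, high], separately by year."""
    width = (high - low) / n_bins
    counts = defaultdict(lambda: [0] * n_bins)
    for year, value in observations:
        # Clamp so that values at or beyond the limits fall in the end bins.
        index = min(max(int((value - low) / width), 0), n_bins - 1)
        counts[year][index] += 1
    return dict(counts)


# Made-up (year, concentration) pairs for illustration only.
observations = [(1976, 1.0), (1976, 2.0), (1976, 6.0), (1977, 2.0), (1977, 4.0)]
counts = histogram_by_year(observations, n_bins=4, low=0.0, high=8.0)

for year in sorted(counts):
    print(year, counts[year])
```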

There are two, and perhaps three, reasonable ways to include the predictor year in the model. Together, then, you should consider at least (at most) nine different models relating sulfate concentration to year.

Questions 2 & 3

There are two forms of structure in these data: a deliberate structure imposed by the way the data were collected, and a second structure that I would call incidental, whose importance will need to be investigated. The problem sets out to account for the deliberate structure directly and then, at the very end, checks whether the incidental structure has also been inadvertently accounted for. To understand what the deliberate structure is, ask yourself the following question: was a random sample of lakes obtained at each time period? Since it appears that a random sample was taken only at the beginning of the study, what is the structure?

Question 3

It is not necessary to redo Question 1 here. You may start with your best model from Question 1 and incorporate the data set's structure.

Questions 5 & 6

I'm not asking you to fit any formal models in Questions 5 and 6. The empirical estimate should be enough to answer the question. You will need to answer these questions separately by year. Also keep in mind that the residuals from your regression models will have the missing observations removed. You will need to account for this when you match up the residuals with their geographic coordinates.
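The matching step can be sketched as follows in Python, assuming the fitting routine dropped the rows with missing responses and returned the residuals in the original row order. The function name `align_residuals`, the coordinates, and the residual values are hypothetical.

```python
def align_residuals(responses, coordinates, residuals):
    """Pair each residual with the coordinates of the row it came from.

    responses:   one entry per lake for a single year, None where missing.
    coordinates: one (x, y) pair per lake, in the same order as responses.
    residuals:   residuals in row order, with the missing rows already dropped.
    """
    kept = [i for i, value in enumerate(responses) if value is not None]
    if len(kept) != len(residuals):
        raise ValueError("residual count does not match non-missing rows")
    return [(coordinates[i], resid) for i, resid in zip(kept, residuals)]


# Made-up values for illustration: the second lake has no measurement,
# so the model returns only two residuals for three lakes.
responses = [6.5, None, 4.8]
coordinates = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
residuals = [0.2, -0.2]
matched = align_residuals(responses, coordinates, residuals)
```

The length check guards against silently pairing residuals with the wrong lakes if the assumption about dropped rows does not hold.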

Cited references



Jack Weiss
Phone: (919) 962-5930
E-Mail: jack_weiss@unc.edu
Address: Curriculum in Ecology and the Environment, Box 3275, University of North Carolina, Chapel Hill, 27599
Copyright © 2012
Last Revised: March 31, 2012
URL: https://sakai.unc.edu/access/content/group/2842013b-58f5-4453-aa8d-3e01bacbfc3d/public/Ecol562_Spring2012/docs/assignments/finalpart1.htm