Data Analysis: Working with SPSS

What Kind of Data

Organizing the Data

Loading a Text File

Defining the Variables

Analysis of Selected Data

Descriptive Statistics

Means for Between-Group Conditions

ANOVA: Simple One-Way

Between-Subjects Factorial ANOVA

Repeated Measures ANOVA

Mixed Design ANOVA

Advanced Topics

Chi Square Contingency Test

Analysis of Covariance (ANCOVA)

Multivariate Analysis of Variance

Log-Linear Models

"Dynamism of a Soccer Player" (Umberto Boccioni) scanned by Mark Harden, at Artchive.

What Kind of Data?

There are generally two kinds of experimental data, and they require quite different kinds of analysis.

In most cases your dependent variable will be a measurement of some sort: a rating, a test score, an error frequency, a reaction time, etc. In all of these cases you will use ANOVA for the data analysis. You will need to specify what your variables (or factors) are, and whether they are manipulated between-groups or within-subjects.

Most of this tutorial is devoted to ANOVA and related topics.

Sometimes your data will be counts of independent events. For example, you may have two groups of subjects, and count the number of subjects who meet some criterion. This is a rather inefficient form of data - you gather only one nominal value per person - but it is sometimes inevitable. With these data you typically use chi square for data analysis rather than ANOVA.

Note that chi square requires that each data point be independent of every other. For example, you cannot use it to analyze multiple observations obtained from one person, since the events would not be independent of each other.

Organizing the Data

For ANOVA, think of your data as a matrix, with each row being a subject and each column being a characteristic of the subject - an identifier, group membership, an observation, etc.

Generally, all data should be numerical, even for nominal level measurement. E.g., use 1=female, 2=male, for gender. You can create identifying labels later.

It's helpful to keep to a consistent order of the columns in all of your data files: I suggest,

Subject ID, grouping variables, dependent (measured) variables.

For chi square, the data need to be entered as frequencies for a contingency matrix - see below.

Loading a Text File

Data in the form of a text file are easy to import into SPSS, especially if you use "tab delimited" format: Each number is separated by a tab character. BCR generates data in this form as aggregate data files.

Open SPSS and choose "Open an existing data source". Click on "OK", choose "Files of type" = "Text", and navigate to the file you want. Open the file, and let the import wizard do its thing - keep clicking on "Next" until you get to click on "Finish". The data should appear in front of you, in the recommended format of one row per subject.
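If you want to check a tab-delimited file before importing it, a few lines of Python (which is not part of SPSS) can read it into the same one-row-per-subject layout. This is only a minimal sketch: the column names and values are hypothetical, and it assumes every column is numerical, as recommended above.

```python
import csv
import io

# Hypothetical tab-delimited data: subject ID, grouping variable, measured variable.
raw_text = "subject\tgender\tscore\n1\t1\t12.5\n2\t2\t9.0\n3\t1\t11.0\n"


def load_tab_delimited(text):
    """Parse tab-delimited text into a list of rows, one dict per subject."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    # Assumption: all columns hold numbers, so every value is converted to float.
    return [{name: float(value) for name, value in row.items()} for row in reader]


rows = load_tab_delimited(raw_text)
print(len(rows), "subjects; first row:", rows[0])
```

For a real file you would pass the file's contents instead of `raw_text`.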

If your data have been collected on paper, it may be easier to enter data directly into SPSS without creating the text file first. Just type in the data matrix.

Defining the Variables

Click on "Variable View" to define the variables. Here each row is one variable. First you might want to delete any variables that will not be relevant for any of your analyses. To delete a variable, click on the left-most cell in the irrelevant row and press [Delete].

To name a variable, click on the relevant cell in the "Name" column and type in a new name. Naming the variables is very helpful for further analyses.

For nominal level variables (e.g., gender) you may also want to define the values. Click on the relevant cell in the "Values" column, then click on the "…" button. In the resulting dialog box, type a number in the "Value" box and a label in the "Value Label" box. For example, enter "1" and "female". Click on "Add", and repeat for all other values.

Click on "Data View" to return to the data. At this point you should save what you have created as an SPSS "sav" file. You can reload the file later when you need to.

Analysis of Selected Data

You may from time to time want to analyze the data for only a subset of your subjects (e.g., for males only). To do this, choose the menu options "Data", "Select Cases…". Select a variable from the box on the left, then click on the "If condition is satisfied" radio button. Click on "If" to define the selection rule.

Use the dialog box to create the rule. You can select a variable (e.g., "gender") from the list on the left and click the right arrow button to enter it into the rule. Use the other buttons to define the rule, or just type it in directly, e.g., "gender=1". Then click on "Continue" and "OK". Unselected rows will be marked with a "\" symbol.
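The idea behind a selection rule such as "gender=1" can be sketched in Python. This is an illustration only, not what SPSS runs internally; the data and the helper name `select_cases` are hypothetical. As in SPSS, the unselected rows are not deleted: the original list is left untouched.

```python
# Hypothetical data matrix: one dict per subject, with 1=female, 2=male.
rows = [
    {"subject": 1, "gender": 1, "score": 12.5},
    {"subject": 2, "gender": 2, "score": 9.0},
    {"subject": 3, "gender": 1, "score": 11.0},
    {"subject": 4, "gender": 2, "score": 10.5},
]


def select_cases(rows, rule):
    """Return only the rows for which the rule is true; the input is not modified."""
    return [row for row in rows if rule(row)]


# The rule "gender=1" written as a Python function of one row.
females = select_cases(rows, lambda row: row["gender"] == 1)
print([row["subject"] for row in females])
```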

To restore all of the data, choose the menu options "Data", "Select Cases…", click on "All cases", and click "OK".

There's a quicker way to select cases. On the SPSS tool bar there is a speed button that takes you directly to the selection dialog box.

Descriptive Statistics

To obtain basic descriptive statistics for variables, choose the menu options "Analyze", "Descriptive Statistics", "Descriptives…". Select variables from the box on the left and click on the right arrow button to enter them in the "Variable(s)" box. Notice that the right arrow button now becomes a left arrow button. If you have made an error you can put the variable back where it came from. You can click on "Options" to select the statistics you want. Then click on "OK", and Voila!

Correlations are calculated as easily. From the menu choose options "Analyze", "Correlate", "Bivariate…". Select variables in the box on the left and click on the right arrow button to enter them in the "Variable(s)" box. Choose the kind of correlation you want (probably Pearson or Spearman), and click on "OK". Note that the resulting matrix of correlations is symmetrical - the upper right part duplicates the lower left.
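If you want to check a mean, standard deviation, or Pearson correlation by hand, the calculations are short. The sketch below uses Python's standard library with made-up numbers; it illustrates the formulas and is not a copy of SPSS's procedure. It also shows why the correlation matrix is symmetrical: the correlation of x with y is the same as that of y with x.

```python
import math
import statistics

# Hypothetical scores on two variables for four subjects.
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 5.0]


def pearson_r(a, b):
    """Pearson correlation: cross-products over the root of the sums of squares."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cross = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    ss_a = sum((ai - mean_a) ** 2 for ai in a)
    ss_b = sum((bi - mean_b) ** 2 for bi in b)
    return cross / math.sqrt(ss_a * ss_b)


mean_x = statistics.mean(x)
sd_x = statistics.stdev(x)  # sample standard deviation (n - 1 in the denominator)
r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)
print(mean_x, round(sd_x, 3), round(r_xy, 3), round(r_yx, 3))
```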

Means for Between-Group Conditions

The procedure described above works if you want to obtain the means for variables across all of the subjects. However, if you have one or more between-groups factors you will want to obtain means separately for every condition. Here's how to do it.

First choose the menu options "Analyze", "Compare Means", "Means".

To simplify the output, click on "Options". You see a list of statistics to be reported. The defaults are Mean, Number of cases, and Standard deviation. Select each of the latter two and click on the left-pointing arrow. This leaves the Mean as the only statistic. Click on "Continue". (You might want to keep Number of cases if you think the group sizes might be unequal.)

In the main dialog choose the columns that represent the dependent variables you would like means for, and click on the right arrow next to "Dependent list".

Choose the first of the between-group factors and click on the right arrow next to "Independent list" in the section labelled "Layer". If you only have one between-group factor, that's all you need to do. Just click on "OK" to see the results.

If you have more than one between-group factor, the next step is not obvious. Click on "Next". This moves you to the next "layer". Notice that the box is now labelled "Layer 2 of 2". Now click on the second between-group factor and click on the right arrow next to "Independent list". You have created two "layers" for the analysis of means. Now click "OK" and you will see means for all of the factor combinations.
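What the two "layers" produce is a mean for every combination of the between-group factors. Here is a minimal Python sketch of that idea; the factor names ("gender", "dose"), the scores, and the helper `cell_means` are all hypothetical. It keeps the number of cases next to each mean, which is useful when group sizes are unequal.

```python
from collections import defaultdict

# Hypothetical data: two between-group factors and one dependent variable.
rows = [
    {"gender": 1, "dose": 1, "score": 10.0},
    {"gender": 1, "dose": 1, "score": 12.0},
    {"gender": 1, "dose": 2, "score": 15.0},
    {"gender": 2, "dose": 1, "score": 8.0},
    {"gender": 2, "dose": 2, "score": 9.0},
    {"gender": 2, "dose": 2, "score": 11.0},
]


def cell_means(rows, factors, dependent):
    """Return {factor-level combination: (mean, number of cases)}."""
    cells = defaultdict(list)
    for row in rows:
        key = tuple(row[factor] for factor in factors)
        cells[key].append(row[dependent])
    return {key: (sum(v) / len(v), len(v)) for key, v in sorted(cells.items())}


means = cell_means(rows, ["gender", "dose"], "score")
for combination, (mean, n) in means.items():
    print(combination, mean, n)
```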

ANOVA: Simple One-Way

Use one-way ANOVA for a comparison of independent groups that vary on a single factor. For example, you might have three groups, and no repeated measures variables of any interest.

Choose the menu options "Analyze", "Compare Means", "One-Way ANOVA…".

You can perform the ANOVA on more than one dependent variable if you wish. Select dependent variables from the box on the left and click on the upper right arrow button to enter them in the "Dependent List" box.

Select the grouping variable (the between-subjects variable) from the box on the left and click on the lower right arrow button to enter it into the "Factor" box. Click on "OK". You should have no trouble interpreting the results.

Report the between-groups mean square and degrees of freedom, the within-groups mean square and degrees of freedom (the error term), and the F ratio. It's best to put all of this in a table.
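To see where the numbers in the one-way ANOVA table come from, here is a minimal Python sketch of the standard calculation, using three small made-up groups. It returns the quantities you are asked to report: the between-groups and within-groups mean squares, their degrees of freedom, and the F ratio. The function name and data are hypothetical.

```python
def one_way_anova(groups):
    """One-way between-groups ANOVA on a list of lists of scores."""
    all_scores = [x for group in groups for x in group]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(group) / len(group) for group in groups]

    # Between-groups: squared distance of each group mean from the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Within-groups (error): squared distance of each score from its group mean.
    ss_within = sum(
        (x - mean) ** 2 for group, mean in zip(groups, group_means) for x in group
    )

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ms_between": ms_between,
        "df_between": df_between,
        "ms_within": ms_within,
        "df_within": df_within,
        "F": ms_between / ms_within,
    }


# Hypothetical scores for three independent groups.
result = one_way_anova([[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]])
print(result)
```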

Between-Subjects Factorial ANOVA

The one-way ANOVA procedure only works when you have a single between-subjects factor. With a factorial design, you must use the General Linear Model, a widely used procedure for analysis that has many different uses.

Choose the menu options "Analyze", "General Linear Model", "Univariate…". Select a dependent variable from the box on the left and click on the upper right arrow button to enter it in the "Dependent Variable" box. (If you want to use more than one dependent variable, you should use the multivariate procedure - see below.)

The independent variables should be entered into the "Fixed Factor(s)" box.

Click on "OK" and examine the results.

In the "Between-Subjects Effects" table, ignore the "Corrected model" and "Intercept" tests. The former is a test of the hypothesis that all effects (main effects and interactions combined) are zero. The latter is a test of the hypothesis that the overall mean for the dependent variable is zero. Usually, neither of these tests is of any interest.

Report the results in a table: the mean square, degrees of freedom, and F ratio for each main effect and interaction, and the mean square and degrees of freedom for the error term.
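The entries in that table can be illustrated with a small hand calculation. The Python sketch below handles a two-factor between-subjects design under one loud assumption: every cell has the same number of subjects. With unequal cell sizes the sums of squares no longer partition this simply, and the sketch should not be used. The factor labels "A" and "B" and the scores are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def factorial_anova_2way(cells):
    """Balanced two-way ANOVA. cells maps (a_level, b_level) -> list of scores.

    Returns {effect: (mean square, df, F)} plus the error term (mean square, df).
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if any(len(scores) != n for scores in cells.values()):
        raise ValueError("this sketch assumes equal cell sizes")

    grand = mean([x for scores in cells.values() for x in scores])
    a_means = {a: mean([x for b in b_levels for x in cells[(a, b)]]) for a in a_levels}
    b_means = {b: mean([x for a in a_levels for x in cells[(a, b)]]) for b in b_levels}

    ss_a = n * len(b_levels) * sum((m - grand) ** 2 for m in a_means.values())
    ss_b = n * len(a_levels) * sum((m - grand) ** 2 for m in b_means.values())
    ss_cells = n * sum((mean(s) - grand) ** 2 for s in cells.values())
    ss_ab = ss_cells - ss_a - ss_b  # interaction: what the main effects leave over
    ss_error = sum((x - mean(s)) ** 2 for s in cells.values() for x in s)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_ab = df_a * df_b
    df_error = len(cells) * (n - 1)
    ms_error = ss_error / df_error

    table = {}
    for name, ss, df in (("A", ss_a, df_a), ("B", ss_b, df_b), ("A x B", ss_ab, df_ab)):
        table[name] = (ss / df, df, (ss / df) / ms_error)
    table["Error"] = (ms_error, df_error)
    return table


# Hypothetical 2 x 2 design with two subjects per cell.
table = factorial_anova_2way(
    {(1, 1): [1.0, 3.0], (1, 2): [3.0, 5.0], (2, 1): [5.0, 7.0], (2, 2): [11.0, 13.0]}
)
print(table)
```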

Repeated Measures ANOVA

Suppose that all of your independent variables are manipulated within-subjects. For each subject (row) there will be two or more entries that represent the dependent variable, and the independent variable levels will correspond to different columns. If it is a one factor design, the columns will represent levels of a single factor. If it is a factorial design, the columns will represent all possible combinations of the factor levels.

Choose the menu options "Analyze", "General Linear Model", "Repeated Measures…". You first need to define the relationship between the column variables and the levels of your independent variables.

In the dialog box, enter a name for the first factor (e.g., "task" or "treatment"). Enter the number of levels for this factor, and click on "Add". If it's a one factor design you can move on. If it's a multi-factor design, repeat the process for the other factors. Give each one a name, and indicate the number of levels.

The factors box will contain entries such as "task(2)" or "treatment(3)". Now click on "Define" to specify where these factors may be found. The next dialog shows the usual list of column headings (variables) on the left, and the "Within-Subjects Variables" box contains entries such as "__?__ (1,1)", etc. The blank spaces need to be filled in.

If you have a one factor design the Variables box will contain entries "__?__(1)", "__?__(2)", etc. These are the levels of the one factor. Select a column heading from the box on the left and click the upper right arrow. The first entry will be filled in. Repeat for other levels of this variable.

If you have a two factor design the Variables box will contain entries "__?__(1,1)", "__?__(1,2)", etc. These are the combined levels of the two factors. Note the title at the top of the dialog box. It will say something like "Within Subject variables (X, Y)", where X and Y are your variable names. Notice the order of X and Y carefully.

Select a column heading from the box on the left that corresponds to the first combination. Click the upper right arrow. The first entry will be filled in.

Repeat for other combinations of the two variables, but be very careful. Watch the ordering of the two variables: the level of the second factor changes fastest. Your second entry will be the second level for the second factor, with the first level of the first factor (1,2). If the second factor has only two levels, your third entry will be the second level for the first factor, with the first level of the second factor (2,1).
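The ordering of the slots can be listed mechanically, which helps when a design has many combinations. In this Python sketch the last factor changes fastest, matching the (1,1), (1,2), (2,1) sequence described above. The helper name and the column names are hypothetical.

```python
from itertools import product


def within_subject_slots(levels_per_factor):
    """List level combinations in dialog order: the last factor changes fastest."""
    ranges = [range(1, n_levels + 1) for n_levels in levels_per_factor]
    return list(product(*ranges))


# A 2 x 2 design: factor X has 2 levels, factor Y has 2 levels.
slots = within_subject_slots([2, 2])

# Hypothetical column headings, listed in the order they must be entered.
columns = ["task1_easy", "task1_hard", "task2_easy", "task2_hard"]
assignment = dict(zip(slots, columns))
for slot, column in assignment.items():
    print(slot, column)
```

If the columns in your data file are in a different order, pick them from the list in slot order rather than file order.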

For a pure repeated measures design, that's all you will need. Click on OK.

You may skip most of the output. Go to the "Tests of Within-Subjects Effects". Here will be the tests for your main effects and interactions. To keep things simple, use the "Sphericity assumed" values.

Note that each main effect and each interaction has its own error term. In your tables of results include all of these.
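For a one-factor repeated measures design, the "Sphericity assumed" test can be reproduced by hand. The Python sketch below is an illustration with made-up scores: variance due to subjects is removed first, and what remains after the condition effect is the error term for that effect. It covers only the single-factor case, and the function name is hypothetical.

```python
def repeated_measures_anova(scores):
    """One-factor repeated measures ANOVA.

    scores: one row per subject, one column per level of the within-subjects factor.
    """
    n_subjects = len(scores)
    n_levels = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_subjects * n_levels)

    level_means = [sum(row[j] for row in scores) / n_subjects for j in range(n_levels)]
    subject_means = [sum(row) / n_levels for row in scores]

    ss_levels = n_subjects * sum((m - grand) ** 2 for m in level_means)
    ss_subjects = n_levels * sum((m - grand) ** 2 for m in subject_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_levels - ss_subjects  # level-by-subject residual

    df_levels = n_levels - 1
    df_error = (n_subjects - 1) * (n_levels - 1)
    ms_levels = ss_levels / df_levels
    ms_error = ss_error / df_error
    return {
        "ms_effect": ms_levels,
        "df_effect": df_levels,
        "ms_error": ms_error,
        "df_error": df_error,
        "F": ms_levels / ms_error,
    }


# Hypothetical data: three subjects, each measured under three conditions.
rm_result = repeated_measures_anova([[1.0, 3.0, 5.0], [2.0, 3.0, 7.0], [3.0, 6.0, 6.0]])
print(rm_result)
```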

Mixed Design ANOVA

If your design involves a mix of within-subjects (repeated measures) factors and between-subjects (or between-groups) factors, you use the GLM analysis that was used for repeated measures, with just a couple of extensions.

Choose the menu options "Analyze", "General Linear Model", "Repeated Measures…". Define the relationship between the column variables and the levels of your within-subjects variables just as you did before.

When the definition of within-subjects factors is complete, and before clicking on "OK" to perform the analysis, select from the list of variables your between-subjects factor or factors. Click on the middle right arrow button to enter the name in the "Between-Subjects Factor(s)" box. Then click on "OK".

In the output window you should go to the Tests of Within-Subjects Effects. The tests will include interactions of within-subjects and between-subjects factors.

The last table contains tests of the between-subjects factors themselves.

Advanced Topics

Chi Square Contingency Test

Use a chi square test to find out if there is a contingency (correlation) between two nominal level classification variables (A and B), when the data consist of frequencies of independent events.

The data should be entered so that each row is one cell of the contingency table. One column should be the level for variable A, another should be the level for variable B, and a third column should contain the cell frequencies.

You can use the Variable View to define the category labels for the A and B variables if you wish.

The frequencies will be defined as "weights" within SPSS. Choose menu options "Data", "Weight Cases…". In the dialog box select the heading of the column that contains the frequencies. Select the "Weight cases by" radio button, and click on the right arrow button, so that the appropriate variable appears in the "Frequency Variable" box. Click on "OK".

There's a quicker way to define weights. On the SPSS tool bar there is a speed button that takes you directly to the relevant dialog box.

To perform the analysis, use what SPSS refers to as the "Crosstabs" procedure, which can be hard to find. From the menu choose options "Analyze", "Descriptive Statistics", "Crosstabs…". Click on the "Statistics…" button and select "Chi-square". Click on "Continue". This sets up the chi square analysis.

Back in the Crosstabs dialog select the row variable and click on the upper right arrow button to enter it into the "Row(s)" box, and select the column variable and click on the middle right arrow button to enter it into the "Column(s)" box. Click on "OK".

Look at the "Chi Square Tests" section of the output tables. A number of different statistics will be reported. The Pearson chi square is the one usually reported. For a 2 by 2 contingency table, report the value with the Yates correction for continuity, which SPSS lists as "Continuity Correction".
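The calculation behind the Pearson chi square is simple enough to check by hand. The Python sketch below takes the data in the layout described above (one row per cell: level of A, level of B, frequency) and computes the statistic, with the Yates correction as an option. The frequencies are made up, the function name is hypothetical, and the sketch assumes every cell of the table is listed, including cells with a frequency of zero.

```python
def chi_square(cell_rows, yates=False):
    """Pearson chi square for a contingency table.

    cell_rows: list of (a_level, b_level, frequency), one entry per cell.
    Returns (statistic, degrees of freedom).
    """
    row_totals, col_totals, total = {}, {}, 0
    for a, b, freq in cell_rows:
        row_totals[a] = row_totals.get(a, 0) + freq
        col_totals[b] = col_totals.get(b, 0) + freq
        total += freq

    statistic = 0.0
    for a, b, freq in cell_rows:
        expected = row_totals[a] * col_totals[b] / total
        difference = abs(freq - expected)
        if yates:
            # Yates correction: shrink each difference by 0.5 (2 by 2 tables only).
            difference = max(difference - 0.5, 0.0)
        statistic += difference ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Hypothetical 2 by 2 table of frequencies.
table_rows = [(1, 1, 30), (1, 2, 10), (2, 1, 20), (2, 2, 40)]
pearson, df = chi_square(table_rows)
corrected, _ = chi_square(table_rows, yates=True)
print(round(pearson, 3), round(corrected, 3), df)
```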

Analysis of Covariance (ANCOVA)

Analysis of covariance is an extension of ANOVA in which one or more covariates are introduced into the design. The usual purpose of ANCOVA is to find out if the results of an ANOVA are changed when variance in the dependent variable due to the covariates has been removed. That is, it is a way to control statistically for extraneous variables.

If you have no within-subjects (repeated measures) variables, use the GLM command and choose Univariate. After you identify the between-groups variables (the fixed factors), you can select one or more variables and add them to the "Covariates" box. Do this in the usual way: select a variable and click on the lower right arrow button.

If you do have within-subjects variables, use the repeated measures version of GLM. In the dialog where the within-subjects variables are assigned and the between-subjects factors are entered, you can also select the covariates and add them to the "Covariates" box.

In the table of results, the covariate is treated like a between-subjects factor, which in effect it is, except that it is a continuous variable rather than a classification variable. Thus, the output can be interpreted in the usual way.

Multivariate Analysis of Variance (MANOVA)

As noted above, an analysis of variance can be carried out on more than one dependent variable at once. However, each ANOVA is then treated independently, and in many cases it is more parsimonious to consider the whole set of dependent variables as a single multivariate measure.

Again the GLM procedure is used, but from the menu choose "Analyze", "General Linear Model", "Multivariate…". Select all of the dependent variables. The box labeled "Fixed Factor(s)" is used for between-subjects classification variables. You can add in covariates if you wish. Click on "OK".

In the output you can examine the tests of the between-subjects factors. These are tests to find out if the groups differ at all in the multidimensional space defined by the dependent variables.

Log-Linear Models

Log-linear models are an important extension of chi square tests of contingency. The routine chi square is a test of contingencies between two classification variables, but sometimes there are three or more classification variables that need to be entered into the analysis. Log-linear analysis is conceptually very similar to an ANOVA. You can look at main effects for each classification variable and at any interactions among them. The data are assumed to be frequencies of independent events.

The data should be entered in the same way as for a chi square test. Each row represents one combination of the variables. One column is used for each variable, and should contain levels for the variable. Define category labels for the variables if you wish. A final column should contain the cell frequencies.

As in the case of chi square, you must create a weighting variable that uses the cell frequencies. Choose menu options "Data", "Weight Cases…". In the dialog box select the heading of the column that contains the frequencies. Select the "Weight cases by" radio button, and click on the right arrow button, so that the appropriate variable appears in the "Frequency Variable" box. Click on "OK".

For the data analysis, choose "Analyze" and "Loglinear", and then "Model selection…". In the model selection box you enter the classification variables into the "Factors" box. As you enter each variable you must define its range. Click on the "Define Range" button, and enter the minimum and maximum values for the variable.

Ignore the "Cell Weights" box - this is not the way to specify frequencies.

Leave the settings unchanged, but click on the "Options" button. Deselect "Frequencies" and "Residuals"; these will just clutter the output. Click on "Continue", and click "OK" on the model dialog.

The output is complex, but it will conclude with the best fitting model, which will include factors and their interactions that appear to be necessary in accounting for the observed frequencies. The goodness of fit of this model is best assessed by examining the likelihood ratio chi square.
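To get a feel for the likelihood ratio chi square, you can compute it by hand for one simple candidate model. The Python sketch below is not the model selection procedure that SPSS runs. It only evaluates the model with main effects and no interactions (all classification variables independent) and reports how badly that model fits. A large value means that some interaction is needed. The function name and frequencies are hypothetical.

```python
import math
from collections import defaultdict


def independence_g2(cells):
    """Likelihood ratio chi square for the main-effects-only model.

    cells: list of (levels, frequency), where levels is a tuple holding one
    level per classification variable. Returns (G squared, degrees of freedom).
    """
    total = sum(freq for _, freq in cells)
    n_factors = len(cells[0][0])

    # Marginal totals for each level of each classification variable.
    margins = [defaultdict(float) for _ in range(n_factors)]
    for levels, freq in cells:
        for i, level in enumerate(levels):
            margins[i][level] += freq

    g2 = 0.0
    for levels, freq in cells:
        expected = total
        for i, level in enumerate(levels):
            expected *= margins[i][level] / total
        if freq > 0:  # empty cells contribute nothing to the sum
            g2 += 2.0 * freq * math.log(freq / expected)

    n_cells = 1
    for margin in margins:
        n_cells *= len(margin)
    df = (n_cells - 1) - sum(len(margin) - 1 for margin in margins)
    return g2, df


# Hypothetical two-variable table (the same layout works for three or more).
g2, g2_df = independence_g2([((1, 1), 30), ((1, 2), 10), ((2, 1), 20), ((2, 2), 40)])
print(round(g2, 3), g2_df)
```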