Coding for Missing Values
There is a considerable literature on the treatment of missing data (see, for example, Allison (2001) for references), and it is not my intention to deal with the methodological issues here. Rather, I want to explain the missing values dialog in a little more detail. Here is the dialog:
Notice that you can give up to three discrete integer or string values, or a range, to stand for missing data. If you want to analyse the missing values in your data, you may wish to use more than one indicator (for example, to distinguish values missing at random, missing completely at random, and missing not at random), but if you are only interested in eliminating these cases (data points), a single indicator is enough.
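To show what several indicators buy you, here is a minimal Python sketch, written outside SPSS and purely as an illustration. The three codes, their labels, and the sample ages are hypothetical assumptions, not values from the dialog; the point is only that distinct codes let you count the missing cases by reason before you discard them.

```python
from collections import Counter

# Hypothetical codes (assumptions for illustration): one code per reason.
MISSING_LABELS = {-1: "refused", -2: "not applicable", -3: "lost"}

# Hypothetical age data containing the three codes.
ages = [34, -1, 27, -2, -1, 46, -3]

def tally_missing(values, labels):
    """Count how often each missing code occurs, keyed by its label."""
    return Counter(labels[v] for v in values if v in labels)

def observed_only(values, labels):
    """Return the values that are not missing codes."""
    return [v for v in values if v not in labels]

counts = tally_missing(ages, MISSING_LABELS)
observed = observed_only(ages, MISSING_LABELS)
print(counts)
print(observed)
```

With a single indicator, `tally_missing` could report only one total, which is all you need if the cases are simply being dropped.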
Note that if your variable is numeric, then your code indicating missing data must also be numeric.
Wherever possible, choose a missing value code that is logically impossible as a measured value of this variable; for example, use -1 for age. You might consider using 999 as the missing code for age, as it is an improbable age, but it is better to designate the impossible rather than the improbable. For string variables, you can use any string, e.g., "NA" for not applicable, or simply "missing".
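The reason a code has to be declared as missing, and not merely typed into the data, can be sketched in Python. This is an illustration outside SPSS under assumed data: the ages and the helper name `valid_values` are hypothetical. It shows that a code left among the real values distorts a statistic, while filtering it out first does not.

```python
from statistics import mean

# Hypothetical ages; -1 is the chosen missing code (an impossible age).
MISSING_CODE = -1
ages = [34, 51, -1, 27, -1, 46]

def valid_values(values, missing_code):
    """Return only the values that are not the missing code."""
    return [v for v in values if v != missing_code]

# Mean computed as if the code were a real measurement (wrong).
naive_mean = mean(ages)

# Mean computed after excluding the missing code (intended).
clean_mean = mean(valid_values(ages, MISSING_CODE))

print(naive_mean, clean_mean)
```

An impossible code such as -1 also makes the mistake easier to spot, since a negative age in a minimum or a frequency table is obviously not a measurement.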
Completing the missing values dialog in Variable View only declares your code; you must still enter it in the data, because empty cells do not take on your missing value automatically. Instead, an empty cell is system-missing in the case of numeric variables (indicated by a period) and a space in the case of string variables.
If you have only a few missing instances, you can enter the code by hand. First, sort the data on the variable concerned so that the empty cells appear first, which makes them easier to find.
If you have a large number of missing data points, however, it is easier to recode the system-missing values using the Transform → Recode into Same Variables dialog. For numeric variables, choose system-missing as the old value on the left-hand side of the dialog and enter your missing code as the new value on the right-hand side. For string data, enter a space as the old value instead of choosing system-missing.
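The logic of this recode can be sketched in Python. This is only an illustration of the old-value to new-value mapping, not SPSS syntax: `None` stands in for system-missing, and the function name and sample data are hypothetical.

```python
def recode_missing(values, old_value, new_value):
    """Replace every occurrence of old_value with new_value.

    All other values are passed through unchanged, which mirrors
    recoding a variable into itself.
    """
    return [new_value if v == old_value else v for v in values]

# Numeric variable: None stands in for system-missing (an assumption).
ages = [34, None, 27, None, 46]
recoded_ages = recode_missing(ages, None, -1)

# String variable: a single space marks an empty cell.
regions = ["north", " ", "south", " "]
recoded_regions = recode_missing(regions, " ", "NA")

print(recoded_ages)
print(recoded_regions)
```

As in the dialog, only the cells that match the old value change; every observed value is left as it was.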
- Allison, P. D. (2001). Missing data. Thousand Oaks, CA: Sage Publications.