Identifying and Correcting Data Entry Errors in SPSS

This tutorial discusses in detail how researchers can identify data entry errors in SPSS. It guides the researcher through screening the data for entry errors, locating those errors in the SPSS data file, and correcting them.

Data Entry Errors in SPSS

Mistakes are common when data is entered into SPSS, so once the data has been entered, it is recommended that every variable be checked for mistakes. Before entering the data, however, if the questionnaires are in printed form, each questionnaire should be numbered. This makes it possible to refer back to a particular questionnaire when making corrections in the data. After the questionnaires have been entered into SPSS, the researcher should check the minimum and maximum values of each variable to assess whether the data has been entered correctly.

Screening and Cleansing the Data in SPSS

In order to get accurate results, the data must be free of errors. Unfortunately, data entry is a boring and cumbersome process, and this can lead to errors. For instance, when entering a person's salary, the researcher may type 4000 instead of 40000. This would seriously distort the results of a study that examines the average income of the population. This makes screening and cleaning the data an important part of the research.
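The Python sketch below makes the salary example concrete. It is an illustration, not SPSS code. It computes the average of a handful of hypothetical salaries twice: once with the correct value of 40000 and once with the mistyped 4000. The salary figures are invented purely for illustration.

```python
from statistics import mean

# Hypothetical salaries for five respondents (invented for illustration).
correct = [35000, 40000, 42000, 38000, 45000]

# The same data with one entry mistyped: 4000 instead of 40000.
mistyped = [35000, 4000, 42000, 38000, 45000]

print(f"Mean with correct entry:  {mean(correct)}")
print(f"Mean with mistyped entry: {mean(mistyped)}")
```

With these made-up figures, the single missing zero pulls the mean down from 40000 to 32800. That is an 18% drop, even though only one of the five values is wrong.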

 

The data screening process involves a number of steps:

Step 1: Checking for errors. First, you need to check each of your variables for scores that are out of range (i.e. not within the range of possible scores).

Step 2: Finding and correcting the error in the data file. Second, you need to find where in the data file each error occurred (i.e. which case is involved) and correct or delete the value.

Step 1: Checking for Errors

When looking for errors in the data, the researcher should look for any values that fall outside the range of possible values for a variable. For example, if gender is coded 1=male, 2=female, you should not find any scores other than 1 or 2 for this variable. Scores outside the possible range can distort your statistical analyses. To check for errors, inspect the frequencies for each of your variables, including all of the individual items that make up the scales. Errors must be corrected before total scores for these scales are calculated.
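The same range check can be expressed outside SPSS. The following Python sketch is an illustration only; SPSS does not use this code. It scans a hypothetical Gender column coded 1 = male and 2 = female and reports every value that is not one of the legitimate codes. The function name `find_out_of_range` and the sample data are invented for illustration.

```python
def find_out_of_range(values, valid_codes):
    """Return (row_number, value) pairs whose value is not a legitimate code.

    Row numbers are 1-based, like the case numbers in the SPSS Data Editor.
    None stands for a missing (blank) value and is not flagged here.
    """
    problems = []
    for row, value in enumerate(values, start=1):
        if value is not None and value not in valid_codes:
            problems.append((row, value))
    return problems


# Hypothetical Gender column coded 1 = male, 2 = female; None = missing.
gender = [1, 2, 2, 1, 12, 2, None, 1]
print(find_out_of_range(gender, valid_codes={1, 2}))  # → [(5, 12)]
```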

There are a number of different ways to check for errors using SPSS. I will illustrate two of them: one that is more suitable for categorical variables (e.g. Gender, Occupation) and another for continuous variables (e.g. Age).

Checking categorical variables for data entry mistakes

Here we look at the steps to check categorical variables for errors.

Process

  • From the main menu at the top of the screen, click on Analyze, and then click on Descriptive Statistics, then Frequencies (See Figure).
Access Frequency Menu in SPSS
  • Choose the variables that you wish to check for errors (e.g. Gender, Job Rank, and Occupation).
  • Click on the arrow button to move these into the Variable(s) box.
  • Click on the Statistics button, then check Minimum and Maximum in the Dispersion section (See Figure).

Minimum and Maximum Option in Frequency Menu

  • Click on Continue and then on OK.
  • The results are displayed in a new window, called the Output window.

Frequency Output

There are two parts to the output. The first table (Statistics) provides a summary of each of the variables you requested. The remaining tables give you a breakdown, for each variable, of the range of responses.
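The following Python sketch shows what the Frequencies procedure computes. It is an illustration, not SPSS code, and it produces the same kinds of figures for one variable:

- valid and missing counts, plus the minimum and maximum, as in the Statistics table;
- a count for each value, as in the per-variable tables.

The function name `frequencies` and the CSR1 responses are hypothetical.

```python
from collections import Counter


def frequencies(values):
    """Rough analogue of SPSS Frequencies with Minimum and Maximum ticked.

    Returns the number of valid and missing cases, the minimum and maximum
    of the valid values, and a count for each distinct value.
    None is treated as a missing value.
    """
    valid = [v for v in values if v is not None]
    return {
        "valid": len(valid),
        "missing": len(values) - len(valid),
        "minimum": min(valid) if valid else None,
        "maximum": max(valid) if valid else None,
        "counts": dict(sorted(Counter(valid).items())),
    }


# Hypothetical responses to a 1-5 item such as CSR1, with one typo (55).
csr1 = [4, 5, 3, 55, 4, 2, None, 5, 4]
summary = frequencies(csr1)
print(summary["minimum"], summary["maximum"])  # → 2 55
print(summary["counts"])  # → {2: 1, 3: 1, 4: 3, 5: 2, 55: 1}
```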

Checking for Errors

  • To check for errors, start with the first table (Statistics) and look at the Minimum and Maximum values for each variable. For instance, the possible values for the first variable, CSR1, are 1 to 5, but its maximum is 55. This indicates an error in data entry.
  • Check the number of Valid and Missing cases. If there are a lot of missing cases, you need to ask why. Have you made errors in entering the data (e.g. put the data in the wrong columns)?
  • The output also includes a table for each of the variables investigated. These tables show how many cases fell into each of the legitimate categories, and how many cases have out-of-range values. Here, one case has the out-of-range value 55 for CSR1.
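The missing-case and out-of-range checks can also be run over several variables at once. The Python sketch below is a hypothetical illustration, not SPSS output. The function `screen_variables`, the allowed ranges, and the data are invented. CSR1 is assumed to be a 1 to 5 item, as in the example.

```python
def screen_variables(data, valid_ranges):
    """For each variable, count the missing cases and the valid cases whose
    value lies outside the allowed range.

    data maps a variable name to its list of values (None = missing);
    valid_ranges maps the same names to (lowest, highest) legal values.
    """
    report = {}
    for name, values in data.items():
        low, high = valid_ranges[name]
        valid = [v for v in values if v is not None]
        out_of_range = [v for v in valid if not low <= v <= high]
        report[name] = {
            "missing": len(values) - len(valid),
            "out_of_range": len(out_of_range),
        }
    return report


# Hypothetical data: CSR1 is a 1-5 item, Gender is coded 1-2.
data = {
    "CSR1": [4, 5, 3, 55, 4, 2, None, 5, 4],
    "Gender": [1, 2, 2, 1, 2, 1, 1, None, None],
}
ranges = {"CSR1": (1, 5), "Gender": (1, 2)}
for name, result in screen_variables(data, ranges).items():
    print(name, result)
```

A variable with a non-zero "out_of_range" count needs correcting. A large "missing" count is a prompt to ask why the data is missing.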

Checking continuous variables for data entry mistakes

The process for checking continuous variables for errors is described below.

  • From the menu at the top of the screen, click on Analyze, then click on Descriptive Statistics, then Descriptives.
  • Select the variables that you wish to check (e.g. Age) and click on the arrow button to move them into the Variables box.
  • Click on the Options button. You can ask for a range of statistics; the main ones at this stage are mean, standard deviation, minimum and maximum. Select the statistics you wish to generate.
  • Click on Continue, and then on OK.

The output generated from this procedure is shown as follows.

Output Descriptive Statistics
  • The Minimum value for Age suggests that there was an error in recording the data. The data was collected from working professionals, so a minimum age this low is not plausible.
  • Does the Mean score make sense? If there is an out-of-range value in the data file, this will distort the mean value.
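The Descriptives check can likewise be sketched in Python. The function below is a hypothetical illustration, not SPSS code. It reports the statistics suggested above: N, minimum, maximum, mean and sample standard deviation. The ages are invented, with one implausible value of 2 standing in for a mistyped age.

```python
from statistics import mean, stdev


def descriptives(values):
    """Rough analogue of SPSS Descriptives: N, minimum, maximum, mean and
    sample standard deviation of the non-missing values (None = missing)."""
    valid = [v for v in values if v is not None]
    return {
        "N": len(valid),
        "minimum": min(valid),
        "maximum": max(valid),
        "mean": round(mean(valid), 2),
        "std_dev": round(stdev(valid), 2),
    }


# Hypothetical ages of working professionals; 2 is almost certainly a typo.
ages = [34, 41, 29, 2, 38, 45, 31, None]
print(descriptives(ages))
```

The minimum of 2 flags the problem immediately, and the same value also pulls the mean down. Without it, the mean of the remaining six ages would be about 36.3 rather than about 31.4.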

Step 2: Finding and Correcting the Errors in SPSS

Now that we have found errors in the data file (e.g. a value of 55 for CSR1), we need to locate and correct them. Two methods for cleaning the dataset are illustrated here. This is where numbering the questionnaires, as I suggested earlier, is of great use: you can go back to the original questionnaire and look up the correct value.

Method 1

  • Click on the Data menu and choose Sort Cases.
  • In the dialogue box that pops up, click on the variable that you know has an error (e.g. sex) and then on the arrow to move it into the Sort By box. Click on either Ascending or Descending, depending on whether you want the higher values at the bottom or at the top. For CSR1, we want to find the person with the value of 55, so we would choose Descending.
  • Click on OK. The case with the highest value (here, 55) now appears at the top of the data file, where you can note its row number and correct it.
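In Python terms, Method 1 amounts to sorting the cases by the suspect variable in descending order, so that the largest value appears first. The sketch below assumes each case is stored as a hypothetical (questionnaire ID, CSR1 value) pair. The IDs stand in for the questionnaire numbers recommended at the start of this tutorial.

```python
# Hypothetical cases: each row is (questionnaire_id, CSR1 value).
cases = [
    (101, 4),
    (102, 5),
    (103, 3),
    (104, 55),  # data entry error
    (105, 2),
]

# Analogue of Sort Cases by CSR1, Descending: the largest value comes first,
# so an out-of-range value such as 55 rises to the top of the list.
sorted_cases = sorted(cases, key=lambda case: case[1], reverse=True)
print(sorted_cases[0])  # → (104, 55)
```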

Method 2

  • Make sure that the Data Editor window is open and on the screen with the data showing.
  • Click on the variable name in which the error has occurred (e.g. Job Rank).
  • Click once to highlight the column.
  • Click on Edit from the menu across the top of the screen. Click on Find.
  • In the Find box, type in the incorrect value that you are looking for (e.g. 5 or 6).
  • Click on Find Next. SPSS will scan through the file and will stop at the first occurrence of the value that you specified. Take note of the row number of this case. You will need this to check your records or questionnaires to find out what the value should be.
  • Click on Find Next again if you need to continue searching for other cases with the same incorrect value. In this example, we know from the Frequencies output that there is only one incorrect value of 5 and one incorrect value of 6.
  • Click on Close when you have finished searching.
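The repeated Find Next search in Method 2 is equivalent to listing every row in which a column holds a given value. The Python sketch below is illustrative only. The function `find_rows` is hypothetical, and the Job Rank column is assumed to have legal codes 1 to 4, so that 5 and 6 are the incorrect values.

```python
def find_rows(values, target):
    """Return the 1-based row numbers whose value equals target, mimicking
    repeated use of Find Next on a single column in the Data Editor."""
    return [row for row, value in enumerate(values, start=1) if value == target]


# Hypothetical Job Rank column, assumed legal codes 1-4; 5 and 6 are typos.
job_rank = [2, 1, 5, 3, 4, 2, 6, 1]
print(find_rows(job_rank, 5))  # → [3]
print(find_rows(job_rank, 6))  # → [7]
```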

After you have corrected your errors, it is essential to run Frequencies again to double-check. Sometimes, in correcting one error, you may accidentally introduce another.
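The Python sketch below shows the correct-then-recheck cycle. It is an illustration, not SPSS code. The data is hypothetical, and so is the corrected value: it assumes that checking questionnaire 104 showed the intended answer to be 5.

```python
from collections import Counter

# Hypothetical CSR1 values keyed by questionnaire number.
csr1 = {101: 4, 102: 5, 103: 3, 104: 55, 105: 2}

# Hypothetical correction: questionnaire 104 is assumed to read 5, not 55.
csr1[104] = 5

# Re-run the frequency check: every value should now lie between 1 and 5.
counts = Counter(csr1.values())
print(sorted(counts.items()))  # → [(2, 1), (3, 1), (4, 1), (5, 2)]
assert all(1 <= v <= 5 for v in csr1.values()), "out-of-range value remains"
```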
