Binary Logistic Regression using SPSS

What is it All About?

Logistic regression analysis is a method to determine the cause-and-effect relationship between one or more independent variables and a dependent variable.

Learn to Analyze Data Using Binary Logistic Regression in SPSS

This tutorial is a step-by-step guide on how to perform binary logistic regression using SPSS.

The Concept of Binary Logistic Regression

"What to do when we have a binary/Dichotomous outcome variable in Research"

Regression analysis is used to assess the strength of the relationship between a dependent variable and one or more independent variables. It helps in predicting the value of the dependent variable from the independent variable(s), and shows how much variance in a single response (dependent) variable is accounted for by a set of independent variables.

Linear regression analysis requires the outcome/criterion variable to be measured as a continuous variable. However, there may be situations when the researcher would like to predict an outcome that is dichotomous/binary.

In such situations, a scholar can use binary logistic regression to assess the impact of one or more predictor variables on the outcome. Logistic regression analysis determines the cause-and-effect relationship between the independent variable(s) and the dependent variable.

Logistic regression predicts group membership.

  • Since logistic regression calculates the probability of success over the probability of failure, the results of the analysis are in the form of an odds ratio.
  • Logistic regression determines the impact of multiple independent variables presented simultaneously to predict membership of one or other of the two dependent variable categories.
  • In logistic regression, the outcome of interest is coded as 1 while the other category is coded as 0; the model form is shown below.
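In symbolic form (writing β for the coefficients and X for the predictors), the model estimates the probability that a case falls into the category coded 1, and the odds follow from that probability:

\[
P(Y = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k)}},
\qquad
\text{odds} = \frac{P(Y = 1)}{1 - P(Y = 1)}.
\]

The odds ratio reported by the analysis compares these odds across values of a predictor.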

Examples of Binary Logistic Regression

A scholar may utilize binary logistic regression in the following situations:

  • A store would like to assess factors that lead to return/no return of the customer.
  • A college would like to assess admission (admit/do not admit) of a student based on Age, Grade, and Aptitude Test Results.
  • Assess whether a particular candidate wins/loses an election based on the time spent in the constituency, whether they were previously elected, and the number of issues resolved.
  • An HR researcher would like to ascertain how factors like experience, years of education, previous salary, and university ranking affect the selection of a candidate in a job interview.
  • A scholar would like to predict the choice of bank (Public or Private) based on independent variables that include Technology, Interest Rates, Value Added Services, Perceived Risks, Reputation, and others.

Assumptions

  • Logistic regression does not assume a linear relationship between the dependent and independent variables.
  • The independent variables need not be interval, nor normally distributed, nor linearly related, nor of equal variance within each group.
  • Homoscedasticity is not required. The error terms (residuals) do not need to be normally distributed.
  • The dependent variable in logistic regression is not measured on an interval or ratio scale; it must be dichotomous (2 categories) for binary logistic regression.
  • The categories (groups) as a dependent variable must be mutually exclusive and exhaustive; a case can only be in one group and every case must be a member of one of the groups.
  • Larger samples are needed than for linear regression because maximum likelihood (ML) coefficient estimates are large-sample estimates. A minimum of 50 cases per predictor is recommended (Field, 2013).
  • Hosmer, Lemeshow, and Sturdivant (2013) suggest a minimum sample of 10 observations per independent variable in the model, but caution that 20 observations per variable should be sought if possible.
  • Leblanc and Fitzgerald (2000) suggest a minimum of 30 observations per independent variable.
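As a worked illustration of these rules of thumb, the example used later in this tutorial has seven predictors, so the three guidelines imply the following minimum sample sizes:

\[
\begin{aligned}
\text{Field (2013):} &\quad 50 \times 7 = 350 \text{ cases} \\
\text{Hosmer et al. (2013):} &\quad 10 \times 7 = 70 \text{ cases (preferably } 20 \times 7 = 140\text{)} \\
\text{Leblanc and Fitzgerald (2000):} &\quad 30 \times 7 = 210 \text{ cases}
\end{aligned}
\]

The example dataset of 341 respondents comfortably satisfies the latter two guidelines, though it falls slightly short of Field's stricter recommendation.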

Example Problem

For the purpose of this tutorial, I am considering the following example. The IVs are on an interval scale while the DV is binary (Public or Private Bank).

  • A scholar would like to predict the choice of bank (Public or Private) based on independent variables that include Technology, Interest Rates, Value Added Services, Perceived Risk, Reputation, Attractiveness, and Perceived Costs.
The scholar is interested in predicting the odds of selecting a Public or Private bank based on the perceptions of respondents in relation to factors like Technology, Interest Rates, Value Added Services, Perceived Risk, Reputation, Attractiveness, and Perceived Costs.
For instance, the scholar hypothesized that improved technology, better interest rates, improved value added services, and the other predictors will lead to choosing Private banks for account opening.

How to Run Binary Logistic Regression

Step 1: In SPSS, Go to Analyze -> Regression -> Binary Logistic

Step 2: Next, the Logistic Regression dialog box will appear

Step 3: Add Preferred Choice of Bank [Choice] to the Dependent box and add the IVs (Technology, Interest Rates, Value Added Services, Perceived Risk, Reputation, Attractiveness, and Perceived Costs) to the Covariates list box. The dialog box should now look as shown.

Step 4: Next, select Options and check Hosmer-Lemeshow goodness-of-fit and CI for exp(B)

Step 5: Press Continue, and then press OK.
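Equivalently, the analysis can be run from an SPSS syntax window. The sketch below assumes the variables in the data file are named Choice, Technology, InterestRates, ValueAddedServices, PerceivedRisk, Reputation, Attractiveness, and PerceivedCosts; substitute the actual names from your dataset.

  * Binary logistic regression: DV = Choice, seven covariates entered together.
  * Variable names here are assumed for illustration; replace with your own.
  LOGISTIC REGRESSION VARIABLES Choice
    /METHOD=ENTER Technology InterestRates ValueAddedServices PerceivedRisk
        Reputation Attractiveness PerceivedCosts
    /PRINT=GOODFIT CI(95)
    /CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).

The /PRINT=GOODFIT keyword requests the Hosmer-Lemeshow test, and CI(95) requests 95% confidence intervals for Exp(B), matching the options checked in Step 4.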

Interpreting Binary Logistic Regression

Case Processing Summary and Encoding

The first section of the output shows the Case Processing Summary, highlighting the cases included in the analysis. In this example, we have a total of 341 respondents.

The Dependent Variable Encoding table shows the coding for the criterion variable; in this case, respondents who choose a Private bank are classified as 1 while those who choose a Public bank are classified as 0.

Block 0

The next section of the output, headed Block 0, presents the results of the analysis without any of our independent variables in the model. This will serve as a baseline for comparison with the model that includes our predictor variables.

Block 1: Method = Enter

Goodness-of-fit statistics help you to determine whether the model adequately describes the data.

Omnibus Test of Model Coefficients

The Omnibus Tests of Model Coefficients table is used to test the model fit. If the model is significant (p < .05), there is a significant improvement in fit compared to the null model; hence, the model shows a good fit.
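Concretely, the omnibus chi-square is a likelihood-ratio test comparing the fitted model with the null (intercept-only) model:

\[
\chi^2 = (-2LL_{\text{null}}) - (-2LL_{\text{model}}), \qquad df = k,
\]

where the −2LL values are the −2 log-likelihoods SPSS reports for each model and k is the number of predictors entered in the block.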

Hosmer and Lemeshow Test

The Hosmer and Lemeshow test is also a test of model fit. The Hosmer-Lemeshow statistic indicates a poor fit if the significance value is less than 0.05. Here, the significance value is above 0.05, so the model adequately fits the data; there is no significant difference between the observed and the model-predicted outcomes.

Contingency Table for Hosmer and Lemeshow Test

The contingency table confirms that the model adequately fits the data: the observed and expected counts in each group are approximately equal.
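For reference, the Hosmer-Lemeshow statistic is computed by dividing cases into groups (typically ten, based on deciles of predicted risk) and comparing observed and expected event counts in each group:

\[
\hat{C} = \sum_{g=1}^{G} \frac{(O_g - n_g \bar{\pi}_g)^2}{n_g \bar{\pi}_g (1 - \bar{\pi}_g)},
\]

where O_g is the observed number of target-category cases in group g, n_g is the group size, and \bar{\pi}_g is the mean predicted probability in that group; the statistic is referred to a chi-square distribution with G − 2 degrees of freedom.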

Model Summary

  • The Model Summary shows the pseudo R-square statistics. "Pseudo" means they do not technically measure explained variation, but they can be used as an approximation of the variation in the criterion variable accounted for by the model.
  • Nagelkerke's R² is normally used; it is an adjusted version of the Cox & Snell R² that rescales the statistic to cover the full range from 0 to 1.
  • In this case, we can say that approximately 70.7% of the variation in the criterion variable can be accounted for by the predictor variables in the model.
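For reference, the two pseudo R-square statistics are computed from the likelihood of the null model (L_0) and of the fitted model (L_M), with sample size n:

\[
R^2_{CS} = 1 - \left(\frac{L_0}{L_M}\right)^{2/n},
\qquad
R^2_{N} = \frac{R^2_{CS}}{1 - L_0^{\,2/n}}.
\]

Because the Cox & Snell statistic cannot reach 1 even for a perfect model, Nagelkerke's version divides it by its maximum attainable value, which is why it can span the full 0 to 1 range.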


Classification Table

  • The next table, the Classification Table, provides an indication of how well the model is able to predict the correct category once the predictors are added. We can compare this with the Classification Table shown for Block 0 to see how much improvement there is when the predictor variables are included in the model. The model correctly classified 75.1 percent of cases overall (sometimes referred to as the percentage accuracy in classification, PAC).
  • In other words, it shows how often the category predicted by the model matches the category the respondent was actually observed to choose. Specifically, it presents information on the degree to which the observed outcomes are predicted by the model.
  • The percentages in the first two rows provide information regarding Specificity and Sensitivity of the model in terms of predicting group membership on the dependent variable.
  • Specificity (also called the true negative rate) refers to the percentage of cases observed to fall into the non-target (or reference) category (e.g., those who will not select a Private bank) that were correctly predicted by the model to fall into that group (e.g., predicted not to select Private).
  • The specificity for this model is 17.8%.
  • Sensitivity (also called the true positive rate) refers to the percentage of cases observed to fall into the target group (Y = 1; e.g., those who will select a Private bank) that were correctly predicted by the model to fall into that group (e.g., predicted to select a Private bank).
  • The sensitivity for the model is 96.4%.
  • Overall, the accuracy rate was very good, at 75.7%. The model exhibits good sensitivity: among those persons who chose Private banks over Public banks, 96.4% were correctly predicted to choose Private banks based on the model. The formulas for these rates are given below.
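In terms of the counts in the classification table, writing TP and TN for correctly predicted target and non-target cases and FP and FN for the corresponding misclassifications:

\[
\text{Sensitivity} = \frac{TP}{TP + FN},
\qquad
\text{Specificity} = \frac{TN}{TN + FP},
\qquad
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.
\]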

Variables in the Equation

  • Odds is the ratio of the probability that the event occurs to the probability that it does not: P(event) / (1 − P(event)).
  • This table shows the relationship between the predictors and the outcome.
  • B is the predicted change in the log odds: for a 1-unit change in a predictor, the odds of the outcome are multiplied by Exp(B).
  • The B coefficients can be negative or positive, and each has a Wald statistic and an associated significance value. If a coefficient is negative, the interpretation is that for every 1-unit increase in the predictor variable, the log odds of the outcome decrease by the coefficient value (equivalently, Exp(B) is less than 1). These relationships are expressed in symbols below.
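In symbols, the fitted model relates the predictors to the log odds, and exponentiating a coefficient gives the odds ratio for a one-unit increase in that predictor, holding the others constant:

\[
\ln\left(\frac{P(Y=1)}{1 - P(Y=1)}\right) = B_0 + B_1 X_1 + \cdots + B_k X_k,
\qquad
\frac{\text{odds}(X_j + 1)}{\text{odds}(X_j)} = e^{B_j} = \text{Exp}(B_j).
\]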

Odds Ratio: 1

The probability of falling into the target group is equal to the probability of falling into the non-target group.

Odds Ratio: > 1 (Probability of Event Occurring)

The probability of falling into the target group is greater than the probability of falling into the non-target group. The event is likely to occur.

Odds Ratio: < 1 (Probability of Event Occurring Decreases)

The probability of falling into the target group is less than the probability of falling into the non-target group. The event is unlikely to occur.

We can say that for each one-unit increase in the perception of Value Added Services, the odds of a customer choosing a Private bank over a Public bank are multiplied by 1.367, with a 95% CI of 1.097 to 1.703.

The important thing about this confidence interval is that it does not cross 1. This matters because values greater than 1 mean that as the predictor increases, so do the odds of (in this case) selecting a Private bank. Values less than 1 mean the opposite: as the predictor increases, the odds of selecting a Private bank decrease.
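As a quick arithmetic check of this interpretation, using the reported estimate:

\[
\text{Exp}(B) = 1.367 \;\Rightarrow\; B = \ln(1.367) \approx 0.313,
\]

so each one-unit increase in the Value Added Services score adds about 0.313 to the log odds and multiplies the odds of choosing a Private bank by 1.367 (a 36.7% increase); the 95% CI of 1.097 to 1.703 means the true increase in odds per unit could plausibly be as small as 9.7% or as large as 70.3%.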

What it is

Logistic regression analysis is a method to determine the cause-and-effect relationship between one or more independent variables and a dependent variable.

Why Use It

Logistic regression analysis determines the cause-and-effect relationship between predictors and an outcome, and predicts group membership: it predicts membership of one or the other of the two dependent variable categories. In logistic regression, the outcome of interest is coded as 1 while the other category is coded as 0.

Assumptions

The independent variables need not be interval, nor normally distributed, nor linearly related, nor of equal variance within each group. The dependent variable is not measured on an interval or ratio scale; it must be dichotomous (2 categories) for binary logistic regression.

Video: Stepwise Guide on How to Run and Interpret Binary Logistic Regression