Ordinal logistic regression using SPSS:
The proportional odds (PO) model
Mike Crowson, Ph.D.
August 17, 2021
Youtube link: https://youtu.be/CdOHB3U5YHk
Ordinal logistic regression can be utilized when you have a dependent variable whose values come in the form of ordered categories. The assumption when treating a variable as ordered categorical is that the response categories simply reflect a relative ordering (in terms of < or >) on that variable; differences between adjacent ranks are not assumed to represent equal amounts of the underlying characteristic, as would be the case had the variable been measured with greater precision (i.e., taking on a metric quality).
There are several types of ordinal logistic regression models (Bürkner & Vuorre, 2019). However, our focus in this presentation is on the Proportional Odds (PO) model (also referred to as the cumulative logit model), since it is the most common form of ordinal logistic regression you will see in the literature (Osborne, 2017). The PO model aims to predict the probability of a case “being at or below a particular level of a response variable, or being beyond a particular level, which is the complementary direction” (Liu, O’Connell, & Koirala, 2011, p. 513) from one or more predictor variables.
The PO model is a generalization of binary logistic regression. Yet, a key difference is that while binary logistic regression centers on modeling the relationship between a set of predictors and the probability of a case being in a particular group with respect to the dependent variable, the PO model is designed to model the cumulative probability of a case being at or below a given level on the ordered categorical variable. Additionally, whereas binary logistic regression uses the language of odds and logits, the PO model uses the language of cumulative odds and cumulative logits.
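To make the “cumulative odds and cumulative logits” language concrete, here is a small Python sketch. The category probabilities are made up for illustration and are not estimates from the GSS example:

```python
import math

# Hypothetical probabilities for a 3-category ordered outcome
# (illustration only; not the GSS estimates).
p = [0.10, 0.45, 0.45]  # P(Y=1), P(Y=2), P(Y=3)

# Cumulative probabilities P(Y <= j) for j = 1, 2 (P(Y <= 3) is always 1)
cum_p = [p[0], p[0] + p[1]]

for j, cp in enumerate(cum_p, start=1):
    cum_odds = cp / (1 - cp)        # cumulative odds of being at or below j
    cum_logit = math.log(cum_odds)  # cumulative logit
    print(f"P(Y<={j}) = {cp:.2f}, odds = {cum_odds:.3f}, logit = {cum_logit:.3f}")
```

The PO model writes each of these cumulative logits as a threshold minus a single linear predictor, which is where the proportional odds restriction comes from.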
‘Wordsum’: Count of the number of words correct on the GSS vocabulary test
‘Polviews’: Self-identified level of political conservatism (coded 1=extremely liberal, 7=extremely conservative)
‘female’: Gender identification coded 0=identifies male, 1=identifies female
‘degree.r’: Highest degree attained.
‘consci.r’: Confidence in the scientific community, with values reflecting one of three ordered categories. Coded 1=Hardly any (confidence), 2=only some (confidence), 3=a great deal (of confidence).
Subset of data from the 2018 General Social Survey
Note: Gender identification is a binary variable. It would be perfectly fine to include it as a factor variable. However, it is also permissible to include it as a covariate (given its binary nature). This choice reflects my personal preference only.
Here, I have switched from the default (‘Including multinomial constant’) to ‘Excluding multinomial constant’. This will be useful later when I demonstrate the computation of McFadden’s pseudo R-square. In everyday practice, however, you can leave the default in place.
We will leave the default setting (logit link) as is.
The -2 Log likelihood values shown in this table are model deviances. The deviance for a model is an indicator of lack of fit and is scaled from 0 (perfect fit in relation to a saturated model) to positive infinity, where values increasingly greater than 0 reflect worsening fit. Eyeballing the model deviances in our table, we see that the deviance for the final model is closer to 0 than the deviance for the intercept-only model, indicating that the full model is a better fit than the intercept-only model.
Now, noting that a model containing the full set of predictors is a better fit than a model with no predictors (i.e., the intercept-only / null model) does not tell us very much. That is why simply comparing model deviances via the eyeball method is not very useful. Logically, we would wish to know (a) whether the full model fits significantly better than the null model and (b) how much better the full model fits the data relative to the null model.
To address (a; previous slide), we use a likelihood ratio chi-square test. The chi-square value shown in this table is the difference in deviance between an intercept-only/null model (-2LL = 549.371) and a model containing the full set of predictors (-2LL = 505.818). After adding in the predictors, the deviance is reduced by 43.553 [computed as 549.371 (null model) – 505.818 (full model)]. This difference serves as our chi-square statistic for testing whether the reduction in deviance from the null to the full model is significantly greater than 0. The degrees of freedom for the test equal the number of predictors in the full model.
If we find statistical significance for this test, then we infer that the full model represents a significantly better fit to the data than the null model. Non-significance is taken as an indicator that the full model does not fit the data substantially better than a model with no predictors.
Interpretation of test result: For our analysis, we see that the full model containing our predictors does represent a significant improvement in fit relative to the null model, LR χ²(4) = 43.553, p<.001.
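The test statistic and its p-value can be verified with a few lines of Python, using the deviances reported above (the closed-form tail probability below holds for even degrees of freedom):

```python
import math

# Deviances (-2LL) from the SPSS 'Model Fitting Information' table
dev_null = 549.371   # intercept-only (null) model
dev_full = 505.818   # model with all four predictors

chi_sq = dev_null - dev_full   # likelihood ratio chi-square = 43.553
df = 4                         # one df per predictor

# For even df = 2m, the chi-square upper-tail probability has the closed
# form exp(-x/2) * sum_{i=0}^{m-1} (x/2)**i / i!.
m = df // 2
half = chi_sq / 2
p_value = math.exp(-half) * sum(half**i / math.factorial(i) for i in range(m))

print(f"LR chi-square({df}) = {chi_sq:.3f}, p = {p_value:.2e}")  # p < .001
```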
Had we left the default of ‘Including multinomial constant’ checked, the deviances for the null and full models would be 382.564 and 339.011, respectively (see below). The likelihood ratio chi-square test results, however, are the same.
The Pearson chi-square and deviance chi-square tests are additional tests of model fit. Non-significant test results indicate a good fit to the data, whereas significant test results indicate poor fit.
Allison (2014) provides a more thorough description of what these tests are testing. He describes them as “testing whether there are any non-linearities or interactions” (p. 4) that are not included in your model. According to Allison (2012), if you choose to re-specify your model to include non-linearities or interactions given significant test results, you should “be selective” and focus “only on those variables in which you have the greatest interest, or the greatest suspicion that something more might be needed” (p. 68).
In the output we see above, both the Pearson chi-square [χ²(290) = 294.912, p=.409] and Deviance chi-square [χ²(290) = 249.103, p=.961] tests are non-significant, suggesting a well-fitting model.
Note: Various authors (e.g., Allison, 2012, 2014; Archer & Lemeshow, 2006; Fagerland & Hosmer, 2017; Pulkstenis & Robinson, 2002) have commented that the Pearson chi-square and deviance chi-square tests may be inaccurate in circumstances where continuous covariates are included in the model and where the number of distinct covariate profiles approaches the sample size.
Interpretation: The model containing our full set of predictors exhibits a 7.9% improvement in fit relative to an intercept-only model.
Pseudo-R-square values such as McFadden’s are typically lower than those computed using linear regression (Pituch & Stevens, 2016). As such, it is problematic to attempt to generalize Cohen’s (1988) conventions for interpreting R-square to situations where you are interpreting pseudo-R-squares. Both Pituch and Stevens (2016) and Tabachnick and Fidell (2013) – citing McFadden (1979) – indicated that McFadden’s pseudo-R-square values between .2 and .4 may be viewed as being consistent with a strong improvement in model fit. That said, you should also consider the context of your research and prior findings before using these values as a guide for interpreting the magnitude of the effect in your study.
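McFadden’s pseudo R-square is computed directly from the two deviances reported earlier (with the multinomial constant excluded, as set at the start of the analysis):

```python
# -2LL values with the multinomial constant excluded
dev_null = 549.371   # intercept-only model
dev_full = 505.818   # full model

# McFadden's pseudo R-square: 1 - (LL_full / LL_null) = 1 - (dev_full / dev_null)
r2_mcfadden = 1 - dev_full / dev_null
print(round(r2_mcfadden, 3))  # ~0.079, i.e., the 7.9% improvement noted above
```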
Assuming categories on the dependent variable are ordered in ascending fashion (i.e., moving from lower to higher values on the dependent variable; the default in SPSS), we can interpret an estimated regression slope as the predicted change in the log odds (or logits) of a case falling above a given category j on the dependent variable, holding the remaining predictors constant. As you can see, this parameterization allows the signs of the regression coefficients to be interpreted as in standard linear regression. Positive coefficients are associated with an increased likelihood of a case falling in a higher (as opposed to lower) category and negative coefficients are associated with a decreased likelihood of falling in a higher (as opposed to lower) category (Heck et al., 2012; Osborne, 2017).
The ‘thresholds’ you see in this table refer to transition points on a latent continuous dependent variable y*, where a case moves from category j to category j + 1. More specifically, Osborne (2017) states that each threshold is the cumulative log odds of a case being in group j or below when the predictors are zero. The first threshold in the table above represents the demarcation between individuals who fall in category 1 and those in categories 2 & 3 on the dependent variable. The second threshold represents the demarcation between individuals belonging to categories 1 and 2 and those in category 3. In general, thresholds are not of substantive interest during interpretation of results.
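The threshold estimates themselves appear in the SPSS output rather than in the text here, so the Python sketch below uses hypothetical thresholds and a hypothetical linear predictor to show how the ordinal parameterization used by SPSS, logit[P(Y ≤ j)] = τj − x′b, turns thresholds plus a linear predictor into category probabilities:

```python
import math

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

# Hypothetical values (not the actual SPSS estimates):
tau1, tau2 = -1.5, 1.0   # thresholds between categories 1|2 and 2|3
eta = 0.8                # x'b for some case, built from the slope coefficients

# Cumulative probabilities under logit[P(Y <= j)] = tau_j - eta
p_le_1 = inv_logit(tau1 - eta)   # P(Y <= 1)
p_le_2 = inv_logit(tau2 - eta)   # P(Y <= 2)

# Individual category probabilities recovered by differencing
p1 = p_le_1
p2 = p_le_2 - p_le_1
p3 = 1 - p_le_2
print(p1, p2, p3)  # the three probabilities sum to 1
```

A larger η (driven by positive slopes) lowers every cumulative probability, shifting mass toward the higher categories.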
Interpretations:
Highest degree is a positive and significant predictor (b=.384, s.e.=.105, p<.001) of the probability of a case falling into a higher as opposed to lower category on confidence in the scientific community. [Given the coding on confidence – where 1=hardly any, 2=only some, 3=a great deal – the positive coefficient indicates that persons with more education are more confident in the scientific community.]
Female (gender identification; coded 0=male, 1=female) is a negative predictor (b=-.461, s.e.=.230, p=.045) of the probability of a case falling into a higher as opposed to lower category on confidence. [In other words, persons identifying as female are less confident in the scientific community than males.]
Interpretations:
Polviews is a negative, but non-significant predictor (b=-.088, s.e.=.109, p=.415) of the probability of falling into a higher as opposed to lower category on confidence. [Given the coding of polviews – where 1=extremely liberal to 7=extremely conservative – the negative slope indicates that persons describing themselves as more conservative have less confidence in the scientific community. Again, this predictor is not a significant contributor in the model.]
Wordsum is a positive and significant predictor (b=.234, s.e.=.070, p<.001) in the model, indicating persons getting more vocabulary words correct on the vocabulary test in the GSS were more likely to fall into a higher as opposed to lower category in terms of confidence in the scientific community. In other words, persons performing better on the test demonstrated a greater level of confidence than persons performing worse on the test.
One of the main assumptions of the Proportional Odds model for ordinal logistic regression is that the effects of the predictors on the odds of falling into a higher (versus lower) category on the dependent variable are the same across categories. Briefly, let us reframe our ordinal logistic regression in terms of two binary logistic regressions (see Allison, 2012; Brant, 1990; Liu, 2016), where the first logistic regression models the odds [Pr(Y=3)/(Pr(Y=1)+Pr(Y=2))] of a case falling into category 3 as a function of the predictors, and the second models the odds [(Pr(Y=2)+Pr(Y=3))/Pr(Y=1)] of a case falling into categories 2 & 3. You can think of the Test of Parallel Lines as a test of whether the regression coefficients would be the same or different across these two regression models; the null hypothesis is that the effects of the predictors are the same across levels of the dependent variable.
A non-significant test result suggests the assumption of proportional odds is met, meaning the effects of the independent variables on the cumulative probability of falling into a higher category do not vary across categories on the dependent variable. In our output, that appears to be the case, as p=.491.
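The two binary splits described above amount to simple recodings of the outcome. A minimal sketch with made-up responses on the 3-category confidence variable:

```python
# A few hypothetical responses on the 3-category confidence variable
y = [1, 2, 3, 3, 2, 1, 3]

# Split 1: category 3 vs. categories 1-2 -> odds Pr(Y=3)/[Pr(Y=1)+Pr(Y=2)]
y_split_high = [1 if v == 3 else 0 for v in y]

# Split 2: categories 2-3 vs. category 1 -> odds [Pr(Y=2)+Pr(Y=3)]/Pr(Y=1)
y_split_low = [1 if v >= 2 else 0 for v in y]

# The parallel-lines test asks whether separate binary logistic regressions
# fit to these two recodings would yield (approximately) equal slopes.
print(y_split_high)  # [0, 0, 1, 1, 0, 0, 1]
print(y_split_low)   # [0, 1, 1, 1, 1, 0, 1]
```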
The odds ratio (OR) indicates the multiplicative change in the odds of a case falling into a higher category per unit increase on predictor k. An OR > 1 indicates increasing odds of being in a higher category per unit increase on predictor k. An OR < 1 indicates decreasing odds of being in a higher category per unit increase. An OR = 1 indicates no change in odds per unit increase on the predictor.
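Each OR is obtained by exponentiating the corresponding regression slope. Using the (rounded) slopes from the parameter estimates table:

```python
import math

# Slope estimates reported in the parameter estimates table (rounded)
slopes = {"degree.r": 0.384, "female": -0.461, "polviews": -0.088, "wordsum": 0.234}

odds_ratios = {name: math.exp(b) for name, b in slopes.items()}
for name, odds_r in odds_ratios.items():
    print(f"{name}: OR = {odds_r:.3f}")
# Close to the ORs SPSS reports (1.469, .630, .915, 1.264); tiny differences
# arise because SPSS exponentiates the unrounded slopes.
```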
Approach B: Run your analysis using the generalized linear model route in SPSS
Leave this set at ‘Ascending’ to generate the same results as before.
[Note: If you change to Descending then the signs of the regression coefficients would be opposite of what we generate with this selection.]
This selection will result in odds ratios being printed in output.
Same odds ratios as we calculated earlier by hand
Interpretation of odds ratios:
‘Highest degree’ (OR = 1.469): For each one unit increase on ‘highest degree’, the odds of a person falling into a higher level of confidence in the scientific community change by a factor of 1.469. Since this number is greater than one, the odds of having greater confidence in the scientific community are greater among those with higher education and lower among those with lower education. [By extension, this indicates that persons with higher education are more likely to express higher confidence in the scientific community as compared to persons with lower education.]
Interpretation of odds ratios:
‘Female gender identification’ (OR=.630): Recall that female gender identification is coded 0=identifies male, 1=identifies female. The odds ratio indicates that the odds of falling into a higher category (i.e., greater confidence) for persons identified as female were .630 times the odds for those identifying as male. Since the odds ratio is less than one, it signals that the odds of females falling into a higher category are less than the odds for males. [By extension, we can say that the probability of having greater confidence is lower among persons identifying as female as compared to those identifying as male.]
Interpretation of odds ratios:
‘Polviews’ (OR=.915): Recall that ‘polviews’ is coded so that 1=extremely liberal and 7=extremely conservative. The odds ratio, therefore, is interpreted as the multiplicative change in odds of falling into a higher category (i.e., greater confidence) for each one unit increase on self-identified conservatism. In other words, for each raw score increase on conservatism, the odds of falling into a higher confidence category (i.e., expressing greater confidence in the scientific community) changed by a factor of .915. Since the odds ratio is less than one, it indicates that persons identifying themselves as more conservative were less likely to indicate greater confidence in the scientific community.
Interpretation of odds ratios:
‘Wordsum’ (OR=1.264): The odds ratio is interpreted as the multiplicative change in odds of falling into a higher category (i.e., greater confidence) for each additional word correct on the GSS vocabulary test. Thus, we can say that for each additional word correct on the test, the odds changed by a factor of 1.264. Since this number is greater than 1, this indicates that persons getting more words correct were more likely to express greater confidence in the scientific community.
What to do if Proportional Odds (PO) assumption is not met?
Earlier, I noted that one of the main assumptions of the Proportional Odds model is that the effects of the predictors on the probability of falling into a higher (as opposed to lower) category j are the same across levels of the dependent variable. When this assumption is not met, one must consider relaxing it and allowing for the possibility that the effects of the predictors are not equivalent across levels of the dependent variable. One way of relaxing this assumption is to use the Partial Proportional Odds (PPO) model (O’Connell & Liu, 2011). This model allows some or all regression slopes to vary across the cutpoints (thresholds) on the dependent variable. According to O’Connell & Liu (2011), “if some of the effects are found to be stable, they may be held constant as in the PO model” (p. 141). Unfortunately, SPSS does not appear to allow one to test a PPO model (it is possible in Stata with the gologit2 package; see Williams, 2006). The only other option I am aware of in SPSS is multinomial logistic regression, which allows all effects to vary across levels of the dependent variable.
Previously, we evaluated whether the PO assumption is met using the Test of Parallel lines and found no evidence of a violation. Had we found a violation of this assumption, then (as noted above) we might have considered using a multinomial logistic regression (if we had no desire to consider something like the PPO model using a different program). It is possible to compare the fit of the PO model against a multinomial logistic regression model to determine which parameterization results in a better fit to the data. This can be done by comparing the Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC) for both models. Using these indices, the better fitting model is the one with the smaller AIC or BIC value.
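For reference, both indices are simple functions of the deviance, the number of estimated parameters k, and (for BIC) the sample size n. The values below are hypothetical placeholders; read the actual -2LL, k, and n from your SPSS output:

```python
import math

def aic(neg2ll, k):
    """AIC = -2LL + 2k, where k = number of estimated parameters."""
    return neg2ll + 2 * k

def bic(neg2ll, k, n):
    """BIC = -2LL + k * ln(n), where n = sample size."""
    return neg2ll + k * math.log(n)

# Hypothetical illustration (not the values from the slides). The PO model
# estimates fewer parameters than the multinomial model, so BIC in
# particular tends to favor it when both fit comparably.
neg2ll, k, n = 350.0, 6, 300   # e.g., 2 thresholds + 4 slopes
print(aic(neg2ll, k), round(bic(neg2ll, k, n), 3))
```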
The AIC and BIC for the ordinal model can be obtained by running the model through the generalized linear models option in SPSS. See slides 18-20 for how to run the analysis through this route.
The AIC and BIC for this model are displayed in the next slide…
AIC and BIC for Proportional odds model run through generalized linear models route in SPSS
To obtain the AIC and BIC for a multinomial model, run the model in the following way…
AIC and BIC from multinomial model
Comparison of AIC and BIC to identify the preferred model: We see that the AIC and BIC for the proportional odds model (left) are smaller than those associated with the multinomial model (right). As such, the preferred model appears to be the proportional odds model.
One last thing: Inclusion of factor variables in the model
Earlier I mentioned including gender identification as a covariate in the model, but that it also works to include this variable as a factor variable. [This option is also available for categorical predictors with > 2 categories]. When this option is selected, SPSS automatically recodes the factor variable into dummy coded variables to reflect group membership. By default, the value associated with the highest category is treated as the reference category.
The fit of the model is the same as before. However, you will notice that the regression slope of .461 is the opposite of the -.461 we saw earlier. Because the group coded 1 (identifies female) is now the reference category, the slope of .461 indicates that persons identified as male are more likely to fall into a higher category than those identified as female.
If we re-run the analysis under Generalized linear models and add gender identification as a factor (and set the category order as Ascending), then we get the same results plus the odds for males falling into a higher category on the dependent variable.
For persons identifying as male, the odds of belonging to a higher category on the dependent variable is 1.586 times that for those identifying as female.
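As a quick check that the factor coding merely inverts the earlier result, the two odds ratios are reciprocals of one another:

```python
import math

b_female_as_covariate = -0.461   # female coded 1, male as reference (earlier run)
b_male_as_factor = 0.461         # female as reference category (this run)

or_female = math.exp(b_female_as_covariate)   # ~0.630
or_male = math.exp(b_male_as_factor)          # ~1.586

# Reversing the reference category simply inverts the odds ratio.
print(round(or_male, 3), round(1 / or_female, 3))
```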
References and resources
Agresti, A. (2019). An introduction to categorical data analysis (3rd ed). Hoboken, NJ: John Wiley & Sons.
Allison, P.D. (2012). Logistic regression using SAS: Theory and application (2nd ed). Cary, NC: SAS Institute.
Allison, P. D. (2014). Measures of fit for logistic regression. SAS Global Forum, Washington, DC. Downloaded August 13, 2021 from https://statisticalhorizons.com/wp-content/uploads/GOFForLogisticRegression-Paper.pdf.
Archer, K.J., & Lemeshow, S. (2006). Goodness-of-fit test for a logistic regression model fitted using survey data. The Stata Journal, 6, 97-105.
Brant, R. (1990). Assessing proportionality in the proportional odds model for ordinal logistic regression. Biometrics, 46, 1171-1178.
Bürkner, P.-C., & Vuorre, M. (2019). Ordinal regression models in psychology: A tutorial. Advances in Methods and Practices in Psychological Science, 2, 77-101. Downloaded August 13, 2021 from https://journals.sagepub.com/doi/pdf/10.1177/2515245918823199
Fagerland, M.W., & Hosmer, D.W. (2017). How to test for goodness of fit in ordinal logistic regression models. The Stata Journal, 17, 668-686.
Heck, R.H., Thomas, S.L., & Tabata, L.N. (2012). Multilevel modeling of categorical outcomes using IBM SPSS. New York: Routledge.
References and resources
Hosmer, D.W., Lemeshow, S., & Sturdivant, R.X. (2013). Applied logistic regression (3rd ed). Hoboken, New Jersey: John Wiley & Sons, Inc.
Liu, X. (2016). Applied ordinal logistic regression using Stata. Thousand Oaks, CA: Sage.
Liu, X., O’Connell, A.A., & Koirala, H. (2011). Ordinal regression analysis: Predicting mathematics proficiency using the continuation ratio model. Journal of Modern Applied Statistical Methods, 10, 513-527.
Long, J.S., & Freese, J. (2014). Regression models for categorical dependent variables using Stata (3rd ed). College Station, TX: StataCorp.
Osborne, J.W. (2017). Regression and linear modeling: Best practices and modern methods. Thousand Oaks, CA: Sage.
Pituch, K.A., & Stevens, J.A. (2016). Applied multivariate statistics for the social sciences. New York: Routledge.
Pulkstenis, E., & Robinson, T.J. (2002). Two goodness-of-fit tests for logistic regression models with continuous covariates. Statistics in Medicine, 21, 79-93.
Tabachnick, B.G., & Fidell, L.S. (2013). Using multivariate statistics (6th ed.). New York: Pearson.
Williams, R. (2006). Generalized ordered logit/partial proportional odds models for ordinal dependent variables. The Stata Journal, 6, 58-82.