Model Evaluation Beyond AUC: A Comparative Study of Somers’ D, Log Loss, Population Stability Index (PSI), and Kolmogorov–Smirnov (KS) Statistic in Credit Risk and Healthcare Prediction Models
DOI: https://doi.org/10.63282/3050-9246/ICRTCSIT-113

Keywords: Credit risk, Model evaluation, AUC, KS-statistic, Somers’ D, Population Stability Index, Log Loss, Healthcare prediction

Abstract
The Area Under the Receiver Operating Characteristic Curve (AUC) is the dominant evaluation metric for machine learning classifiers. However, AUC alone cannot capture important properties such as calibration, population stability, and practical separability at operating thresholds. This paper presents an empirical comparison of AUC with Somers’ D, the Kolmogorov–Smirnov (KS) statistic, Log Loss, and the Population Stability Index (PSI) across three benchmark datasets: (1) the Breast Cancer dataset from scikit-learn, (2) the Heart Failure dataset from Kaggle, and (3) the Lending dataset from Kaggle. Our results show that on the Breast Cancer dataset, Logistic Regression achieves near-perfect discrimination (AUC = 0.999, KS = 0.977) with low log loss and a stable PSI, outperforming more complex models. On the Heart Failure dataset, Gradient Boosting offers the best balance between discrimination (AUC = 0.943, KS = 0.784) and stability (PSI = 0.076), while Random Forest, though highly accurate, shows instability (PSI = 0.183). On the Lending dataset, all models show modest discrimination (AUC ≈ 0.70), but Logistic Regression and Gradient Boosting offer the best trade-off among simplicity, interpretability, and stability. These findings underscore the importance of a multi-metric evaluation framework that goes beyond AUC, integrating discrimination, calibration, and stability metrics for trustworthy machine learning in regulated domains such as finance and healthcare.
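
To make the multi-metric framework concrete, the sketch below computes all five metrics with scikit-learn, SciPy, and NumPy on the Breast Cancer dataset cited above. This is a minimal illustration, not the paper's exact pipeline: the 70/30 split, the logistic regression settings, and the ten-bucket PSI binning of training versus test scores are assumptions made here for demonstration.

    # Minimal sketch of the multi-metric evaluation (not the authors' exact pipeline).
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score, log_loss
    from scipy.stats import ks_2samp
    import numpy as np

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=0, stratify=y)

    model = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
    p_tr = model.predict_proba(X_tr)[:, 1]   # scores on the "expected" sample
    p_te = model.predict_proba(X_te)[:, 1]   # scores on the "actual" sample

    auc = roc_auc_score(y_te, p_te)
    somers_d = 2 * auc - 1                   # Somers' D for a binary target
    ks = ks_2samp(p_te[y_te == 1], p_te[y_te == 0]).statistic
    ll = log_loss(y_te, p_te)

    def psi(expected, actual, buckets=10):
        # Population Stability Index over quantile buckets of the expected scores.
        edges = np.quantile(expected, np.linspace(0, 1, buckets + 1))
        e_idx = np.clip(np.searchsorted(edges, expected) - 1, 0, buckets - 1)
        a_idx = np.clip(np.searchsorted(edges, actual) - 1, 0, buckets - 1)
        e = np.bincount(e_idx, minlength=buckets) / len(expected)
        a = np.bincount(a_idx, minlength=buckets) / len(actual)
        e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)  # guard log(0)
        return float(np.sum((a - e) * np.log(a / e)))

    print(f"AUC={auc:.3f}  Somers' D={somers_d:.3f}  KS={ks:.3f}  "
          f"LogLoss={ll:.3f}  PSI={psi(p_tr, p_te):.3f}")

Note that for a binary outcome Somers’ D reduces to 2·AUC − 1, so it is derived from the AUC rather than estimated separately, and the PSI here compares training-score against test-score distributions, mirroring the kind of stability check the abstract describes.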
