Balancing accuracy and Interpretability: An R package assessing complex relationships beyond the Cox model and applications to clinical prediction

Shamsutdinova, Diana; Stamate, Daniel; and Stahl, Daniel. 2025. Balancing accuracy and Interpretability: An R package assessing complex relationships beyond the Cox model and applications to clinical prediction. International Journal of Medical Informatics, 194, 105700. ISSN 1386-5056 [Article]
Copy

Background
Accurate and interpretable models are essential for clinical decision-making, where predictions can directly impact patient care. Machine learning (ML) survival methods can handle complex multidimensional data and achieve high accuracy but require post-hoc explanations. Traditional models such as the Cox Proportional Hazards Model (Cox-PH) are less flexible, but fast, stable, and intrinsically transparent. Moreover, ML does not always outperform Cox-PH in clinical settings, warranting a diligent model validation. We aimed to develop a set of R functions to help explore the limits of Cox-PH compared to the tree-based and deep learning survival models for clinical prediction modelling, employing ensemble learning and nested cross-validation.

Methods
We developed a set of R functions, publicly available as the package "survcompare”. It supports Cox-PH are Cox-Lasso, and Survival Random Forest (SRF) and DeepHit are the ML alternatives, along with the ensemble methods integrating Cox-PH with SRF or DeepHit designed to isolate the marginal value of ML. The package performs a repeated nested cross-validation and tests for statistical significance of the ML’s superiority using the survival-specific performance metrics, the concordance index, time-dependent AUC-ROC and calibration slope. To get practical insights, we applied this methodology to clinical and simulated datasets with varying complexities and sizes.

Results
In simulated data with non-linearities or interactions, ML models outperformed Cox-PH at sample sizes ≥500. ML superiority was also observed in imaging and high-dimensional clinical data. However, for tabular clinical data, the performance gains of ML were minimal; in some cases, regularised Cox-Lasso recovered much of the ML’s performance advantage with significantly faster computations. Ensemble methods combining Cox-PH and ML predictions were instrumental in quantifying Cox-PH’s limits and improving ML calibration. Traditional models like Cox-PH or Cox-Lasso should not be overlooked while developing clinical predictive models from tabular data or data of limited size.

Conclusion
Our package offers researchers a framework and practical tool for evaluating the accuracy-interpretability trade-off, helping make informed decisions about model selection.


picture_as_pdf
1-s2.0-S1386505624003630-main.pdf
subject
Published Version
Available under Creative Commons: Attribution 4.0

View Download
visibility_off picture_as_pdf

Accepted Version
lock

Atom BibTeX OpenURL ContextObject in Span OpenURL ContextObject Dublin Core Dublin Core MPEG-21 DIDL Data Cite XML EndNote HTML Citation METS MODS RIOXX2 XML Reference Manager Refer ASCII Citation
Export

Downloads