News Release

E2E: an R package for easy-to-build ensemble models

Peer-Reviewed Publication

FAR Publishing Limited

Image: Workflow of the E2E R Package

A diagram illustrating the workflow of the E2E package, from data input to model construction using ensemble methods like Bagging and Stacking, through model evaluation and interpretation, to final application.

Credit: Shanjie Luan

As machine learning is applied more widely in medicine, building efficient and accurate predictive models has become a key challenge. Existing tools such as tidymodels and mlr3 offer flexible model selection, but they are limited in integrating ensemble learning algorithms and in handling imbalanced or large datasets.

To address these issues, Shanjie Luan of Shandong University and Ximing Wang of South China University of Technology jointly developed an R package named E2E (easy to ensemble). Their aim is to give medical researchers a comprehensive and flexible framework that makes ensemble models easy to build.

The core of the E2E package is its model-building functionality. It supports several ensemble strategies, including bagging, stacking, and voting, which combine the strengths of multiple base models to improve prediction accuracy and robustness. It ships with 12 diagnostic models and 6 prognostic models as base models, and users can add custom models as needed.
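
For readers unfamiliar with these strategies, the general idea behind voting and bagging can be sketched in a few lines. This is a minimal, language-neutral illustration written in Python, not E2E's R interface: the function names (fit_stump, average_vote, fit_bagging), the toy base learner, and the toy data are all hypothetical. Stacking, which trains a second model on the base models' outputs, is not shown.

```python
import random
from statistics import mean

def fit_stump(rows):
    """Toy base learner (hypothetical): split one feature at the midpoint
    between the two class means and predict 1.0 above the split."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    cut = (mean(pos) + mean(neg)) / 2
    return lambda x: 1.0 if x > cut else 0.0

def average_vote(models, x):
    """Voting: average the base models' outputs, which for 0/1 outputs
    is the fraction of models voting for the positive class."""
    return mean(m(x) for m in models)

def fit_bagging(rows, n_models=25, seed=0):
    """Bagging: fit one base learner per bootstrap resample of the
    training rows, then combine the fitted learners by voting."""
    rng = random.Random(seed)
    models = []
    while len(models) < n_models:
        sample = [rng.choice(rows) for _ in rows]  # draw with replacement
        if len({y for _, y in sample}) < 2:
            continue  # resample lacked one class; draw again
        models.append(fit_stump(sample))
    return lambda x: average_vote(models, x)

# Toy data (assumed): (feature, label) pairs, positives have larger values.
TRAIN = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
         (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]

ensemble = fit_bagging(TRAIN)
print(ensemble(0.95), ensemble(0.05), ensemble(0.5))
```

Points far from the class boundary receive the same vote from every resampled learner, while points near it split the vote. That averaging over resamples is what gives bagging its stabilizing effect.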

The team tested E2E on breast cancer data from The Cancer Genome Atlas (TCGA-BRCA) and on data from the China Health and Retirement Longitudinal Study (CHARLS).

On the highly imbalanced TCGA breast cancer diagnostic data, E2E's imbalance-handling model reached an AUROC (area under the receiver operating characteristic curve) of 0.9986 and an AUPRC (area under the precision-recall curve) of 0.9999 on the test set, comparable to mature algorithms in Python. On the CHARLS data, E2E had the best performance of the methods compared, with a test-set AUROC of 0.7414. In prognostic analysis, E2E's bagging framework achieved a C-index of 0.6742 on the TCGA test set, outperforming all compared single models and other methods.
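
To make the reported metrics concrete, the sketch below computes AUROC and the C-index from their pairwise definitions. It is an illustration under stated assumptions, not the evaluation code used in the study: the function names and the small input lists are hypothetical, and the C-index shown is the common Harrell-style formulation. Both metrics range from 0 to 1, with 0.5 corresponding to random ranking.

```python
def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def c_index(times, events, risks):
    """Harrell-style C-index: among comparable pairs (the earlier time is
    an observed event), the share where the earlier case has higher risk."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

# Hypothetical inputs, chosen only to exercise the functions.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))   # 3 of 4 pairs ranked correctly
print(c_index([1, 2, 3], [1, 1, 0], [3.0, 2.0, 1.0]))
```

Both functions assume the inputs contain at least one positive and one negative case (or at least one comparable pair) and would divide by zero otherwise.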

E2E has limitations: as an R package, it may not be as fast as Python for very large datasets, and its SHAP interpretability analysis is only a rough approximation. Even so, it offers a powerful complement to machine learning applications within the R language ecosystem.

E2E is free to use, but users must cite the paper: Luan, S. and Wang, X. (2025), E2E: An R Package for Easy-to-Build Ensemble Models. Med Research. https://doi.org/10.1002/mdr2.70030. The code is fully open source and available on GitHub and CRAN, making it easy for researchers around the world to use and contribute. For detailed usage instructions, bug fixes, issue feedback, update logs, and more, see the GitHub repository (https://github.com/XIAOJIE0519/E2E).

