A machine learning framework for performing binary classification on tabular biomedical data / Szijártó Ádám [et al.]
Bibliogr.: p. 5-6. - Abstr. eng. - DOI: https://doi.org/10.1556/1647.2023.00109
In: Imaging. - ISSN eISSN 2732-0960. - 2023. 15. évf. 1. sz., p. 1-6. : ill.
Background and aim: Over the past decades, we have witnessed an immense expansion in the arsenal and performance of machine learning (ML) algorithms. One of the most important fields that could benefit from these advancements is biomedical science. To streamline the training and evaluation of binary classifiers, we constructed a universal and flexible ML framework that uses tabular biomedical data as input. Methods and results: Our framework requires the input data to be provided as a comma-separated values file, in which rows correspond to subjects and columns represent different features. After reading the content of this file, the framework enables the users to perform outlier detection, handle missing values, rescale features, and tackle class imbalance. Then, hyperparameter tuning, feature selection, and internal validation are performed using nested cross-validation. If an additional dataset is available, the framework also provides the option for external validation. Users may also compute SHapley Additive exPlanations values to interpret the individual predictions of the model and identify the most important features. Our ML framework was implemented in Python (version 3.9), and its source code is freely available via GitHub. In the second part of this paper, we also demonstrate the usage of the framework through a case study from the field of cardiovascular imaging. Conclusions: The proposed ML framework enables the efficient training and evaluation of binary classifiers on tabular biomedical data. We hope our framework will serve as a useful resource for both learning and research purposes and will promote further innovation. Kulcsszavak: machine learning, supervised learning, binary classification, tabular data