An Empirical Analysis of Machine Learning Algorithms for Chronic Kidney Disease Prediction Using NHANES Clinical Data: A Comprehensive Empirical Study


Yugesh B, T. Anuradha

Abstract

Chronic Kidney Disease (CKD) is a progressive condition affecting more than 850 million people worldwide, and early detection is critical to slowing disease progression and improving patient outcomes. This study presents a rigorous empirical evaluation of eight supervised machine learning algorithms for binary CKD classification on the NHANES 2021–2023 dataset: Gradient Boosting, Random Forest, Decision Tree, AdaBoost, Logistic Regression, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Naive Bayes. Before model training, the data passed through a structured preprocessing pipeline comprising missing-value imputation, outlier winsorization, feature encoding, leakage removal, and standardization. Gradient Boosting performed best, with an accuracy of 98.83%, precision of 99.10%, recall of 99.22%, F1-score of 99.16%, and ROC-AUC of 0.9986. These findings identify ensemble tree-based methods as highly effective for CKD prediction and offer insights into feature relevance and the clinical applicability of ML-driven diagnostic systems.
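The preprocessing and evaluation steps named in the abstract can be sketched in plain Python for a single numeric feature. This is a minimal illustration under stated assumptions, not the authors' pipeline: median imputation, winsorization at the 5th and 95th percentiles, and z-score standardization are assumed choices, and every parameter is learned from training rows only so that held-out data cannot influence preprocessing. Feature encoding, the paper's leakage-removal step, and the eight classifiers themselves are not reproduced. The helper names (`fit_preprocessor`, `transform`, `classification_metrics`, `roc_auc`) and the toy values are hypothetical.

```python
"""Hypothetical sketch of CKD-style preprocessing and evaluation metrics.

Assumptions (not taken from the paper): median imputation, 5th/95th
percentile winsorization, z-score standardization, all fitted on the
training split only. Toy values below are illustrative, not NHANES data.
"""
from statistics import mean, median, pstdev


def percentile(values, q):
    """Linear-interpolation percentile of a list, with q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def fit_preprocessor(column, lower_q=5, upper_q=95):
    """Learn imputation, clipping and scaling parameters from training data."""
    observed = [v for v in column if v is not None]
    fill = median(observed)                      # assumed: median imputation
    filled = [fill if v is None else v for v in column]
    lo, hi = percentile(filled, lower_q), percentile(filled, upper_q)
    clipped = [min(max(v, lo), hi) for v in filled]  # winsorize to [lo, hi]
    mu, sigma = mean(clipped), pstdev(clipped) or 1.0  # guard zero variance
    return {"fill": fill, "lo": lo, "hi": hi, "mu": mu, "sigma": sigma}


def transform(column, p):
    """Apply impute -> winsorize -> standardize using fitted parameters."""
    out = []
    for v in column:
        v = p["fill"] if v is None else v
        v = min(max(v, p["lo"]), p["hi"])
        out.append((v - p["mu"]) / p["sigma"])
    return out


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = CKD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """ROC-AUC as P(positive scores above negative); ties count as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    train = [0.8, 1.0, None, 1.2, 0.9, 6.5]      # toy serum-creatinine-like values
    params = fit_preprocessor(train)
    print(transform([None, 10.0], params))       # test rows reuse training params
    print(classification_metrics([1, 0, 1, 1], [1, 0, 0, 1]))
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Fitting the clipping bounds and scaling statistics on the training split and reusing them on held-out rows mirrors how a scikit-learn `Pipeline` calls `fit` on training data and `transform` on test data; a real study would typically rely on such library components rather than hand-written helpers.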
