Open Access Article

Enhancing Chronic Diseases Prediction through Machine Learning and Data Pre-Processing Strategies

D.J. Samatha Naidu¹, A. Venkatesh²

  1. Dept. of MCA, Annamacharya PG College of Computer Studies, Rajampet, India.
  2. Dept. of MCA, Annamacharya PG College of Computer Studies, Rajampet, India.

Section: Research Paper, Product Type: Journal Paper
Volume-13, Issue-3, Page no. 41-48, Mar-2025

CrossRef-DOI: https://doi.org/10.26438/ijcse/v13i3.4148

Online published on Mar 31, 2025

Copyright © D.J. Samatha Naidu, A. Venkatesh. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: D.J. Samatha Naidu, A. Venkatesh, “Enhancing Chronic Diseases Prediction through Machine Learning and Data Pre-Processing Strategies,” International Journal of Computer Sciences and Engineering, Vol.13, Issue.3, pp.41-48, 2025.

MLA Style Citation: D.J. Samatha Naidu and A. Venkatesh. "Enhancing Chronic Diseases Prediction through Machine Learning and Data Pre-Processing Strategies." International Journal of Computer Sciences and Engineering 13.3 (2025): 41-48.

APA Style Citation: D.J. Samatha Naidu, A. Venkatesh (2025). Enhancing Chronic Diseases Prediction through Machine Learning and Data Pre-Processing Strategies. International Journal of Computer Sciences and Engineering, 13(3), 41-48.

BibTeX Style Citation:
@article{Naidu_2025,
author = {D.J. Samatha Naidu and A. Venkatesh},
title = {Enhancing Chronic Diseases Prediction through Machine Learning and Data Pre-Processing Strategies},
journal = {International Journal of Computer Sciences and Engineering},
volume = {13},
number = {3},
month = {3},
year = {2025},
issn = {2347-2693},
pages = {41--48},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=5781},
doi = {10.26438/ijcse/v13i3.4148},
publisher = {IJCSE, Indore, INDIA}
}

RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v13i3.4148
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=5781
TI  - Enhancing Chronic Diseases Prediction through Machine Learning and Data Pre-Processing Strategies
T2  - International Journal of Computer Sciences and Engineering
AU  - D.J. Samatha Naidu
AU  - A. Venkatesh
PY  - 2025
DA  - 2025/03/31
PB  - IJCSE, Indore, INDIA
SP  - 41
EP  - 48
IS  - 3
VL  - 13
SN  - 2347-2693
ER  - 


Abstract

Leveraging machine learning for the early detection and prevention of chronic diseases, including diabetes, stroke, cancer, cardiovascular conditions, kidney failure, and hypertension, holds significant promise as emphasized by the WHO. This review systematically examines the application of machine learning techniques to predict these conditions using medical records and general health checkup data, with a focus on enhancing prediction accuracy through meticulous error minimization. Critical to this endeavor is the quality of input data, where challenges such as outlier detection, missing value imputation, feature selection, data normalization, and class imbalance pose substantial obstacles to model performance. Effective data preprocessing is thus paramount, ensuring high-quality inputs that facilitate robust model selection. Techniques explored encompass supervised learning, ensemble learning, deep learning, and reinforcement learning. Performance evaluation utilizes metrics like accuracy, recall, precision, and F1-score to gauge model efficacy. Furthermore, this study identifies open research challenges and proposes future directions to improve prediction performance via advanced preprocessing and machine learning methodologies, aiming to optimize data-driven approaches for improved healthcare outcomes.
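
As a rough illustration of the preprocessing and evaluation steps named in the abstract, the following minimal Python sketch imputes missing values with the median, applies min-max normalization, and computes accuracy, precision, recall, and F1-score from predicted labels. It is not the authors' method: the function names, the sample glucose readings, and the label vectors are hypothetical and chosen only for demonstration.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical glucose readings with two missing entries.
glucose = [90, None, 150, 110, None, 130]
filled = impute_median(glucose)
scaled = min_max_normalize(filled)

# Hypothetical ground-truth and predicted labels (1 = disease present).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(filled, scaled, metrics)
```

On imbalanced medical data, accuracy alone can be misleading, which is one reason recall, precision, and F1-score are usually reported alongside it.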

Keywords / Index Terms

Chronic Disease Prediction, Machine Learning, Data Preprocessing, Feature Selection, Model Evaluation, Healthcare Analytics, Medical Data, Supervised Learning, Deep Learning, Ensemble Learning, Outlier Detection, Missing Value Imputation, Data Normalization, Class Imbalance, Reinforcement Learning.
