Open Access Article

Handling Imbalanced Heart Disease Data and Explaining the Factors

Sandip Das¹, Gairik Sajjan², Arkajyoti Poddar³, Tamojit Dasgupta⁴, Sayani Patty⁵, Debmitra Ghosh⁶

Section: Research Paper, Product Type: Journal Paper
Volume-11, Issue-01, Page no. 62-65, Nov-2023

Online published on Nov 30, 2023

Copyright © Sandip Das, Gairik Sajjan, Arkajyoti Poddar, Tamojit Dasgupta, Sayani Patty, Debmitra Ghosh. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


How to Cite this Paper

IEEE Style Citation: Sandip Das, Gairik Sajjan, Arkajyoti Poddar, Tamojit Dasgupta, Sayani Patty, Debmitra Ghosh, “Handling Imbalanced Heart Disease Data and Explaining the Factors,” International Journal of Computer Sciences and Engineering, Vol.11, Issue.01, pp.62-65, 2023.

MLA Style Citation: Sandip Das, Gairik Sajjan, Arkajyoti Poddar, Tamojit Dasgupta, Sayani Patty, Debmitra Ghosh. "Handling Imbalanced Heart Disease Data and Explaining the Factors." International Journal of Computer Sciences and Engineering 11.01 (2023): 62-65.

APA Style Citation: Sandip Das, Gairik Sajjan, Arkajyoti Poddar, Tamojit Dasgupta, Sayani Patty, Debmitra Ghosh (2023). Handling Imbalanced Heart Disease Data and Explaining the Factors. International Journal of Computer Sciences and Engineering, 11(01), 62-65.

BibTex Style Citation:
@article{Das_2023,
author = {Sandip Das and Gairik Sajjan and Arkajyoti Poddar and Tamojit Dasgupta and Sayani Patty and Debmitra Ghosh},
title = {Handling Imbalanced Heart Disease Data and Explaining the Factors},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {11 2023},
volume = {11},
number = {01},
month = {11},
year = {2023},
issn = {2347-2693},
pages = {62--65},
url = {https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=1413},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
UR  - https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=1413
TI  - Handling Imbalanced Heart Disease Data and Explaining the Factors
T2  - International Journal of Computer Sciences and Engineering
AU  - Das, Sandip
AU  - Sajjan, Gairik
AU  - Poddar, Arkajyoti
AU  - Dasgupta, Tamojit
AU  - Patty, Sayani
AU  - Ghosh, Debmitra
PY  - 2023
DA  - 2023/11/30
PB  - IJCSE, Indore, INDIA
SP  - 62
EP  - 65
IS  - 01
VL  - 11
SN  - 2347-2693
ER  - 

Abstract

Heart disease is one of the most serious and life-threatening health problems, and predicting it early could save many lives. However, medical datasets are often highly imbalanced, which causes machine learning algorithms to perform poorly on the minority class and, in turn, to make wrong predictions. In healthcare, wrong predictions are especially risky because people's lives are at stake. To obtain good results, the ratio of minority to majority class samples should be 1:1 or close to it. The Synthetic Minority Oversampling Technique (SMOTE), used in this work, is an oversampling technique that achieves such a balance by generating synthetic minority-class samples. In addition, we use eXplainable AI (XAI) to better visualise the predictions: the LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations) algorithms are used to understand how much each feature contributes to the predictions.
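The two ideas in the abstract, balancing classes to 1:1 with synthetic minority samples and measuring each feature's contribution to a prediction, can be illustrated with a short, dependency-free Python sketch. It is not the authors' implementation: `smote`, `balance_one_to_one`, `shapley_values` and the linear `risk` score are hypothetical names, and the toy data are invented. The SMOTE part is a simplified variant that draws base samples at random (the original SMOTE paper iterates over the minority samples instead). The Shapley part computes exact values against a single baseline point; SHAP estimates Shapley values of this kind for real models, usually averaging over a background dataset. LIME, which fits a local surrogate model, is not shown.

```python
import math
import random
from itertools import combinations


# --- Part 1: SMOTE-style oversampling (simplified, hypothetical helper) ---

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples.

    Each new sample lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    Assumes at least two minority samples.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # a point has at most len-1 neighbours
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        # k nearest minority neighbours of `base` by Euclidean distance
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != idx),
            key=lambda p: math.dist(base, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # one scalar in [0, 1): a point on the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def balance_one_to_one(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class of a binary dataset until the ratio is 1:1."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = (len(y) - len(minority)) - len(minority)  # majority count minus minority count
    new_points = smote(minority, max(n_new, 0), k, seed)
    return X + new_points, y + [minority_label] * len(new_points)


# --- Part 2: exact Shapley values for one prediction (hypothetical helper) ---

def shapley_values(f, x, baseline):
    """Exact Shapley value of each feature of x for the model f.

    A feature outside a coalition is replaced by its baseline value.
    Cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


if __name__ == "__main__":
    # Hypothetical toy data: two features, 8 negative (0) vs 3 positive (1) rows.
    X = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [8.0, 8.0], [1.0, 0.6], [9.0, 11.0],
         [8.0, 2.0], [10.0, 2.0], [2.0, 9.0], [2.5, 8.5], [3.0, 9.5]]
    y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
    X_bal, y_bal = balance_one_to_one(X, y, minority_label=1, k=2)
    print(y_bal.count(0), y_bal.count(1))  # 8 8

    # Hypothetical linear risk score; for a linear model phi_i = w_i * (x_i - b_i).
    risk = lambda z: 2 * z[0] + 3 * z[1] - z[2]
    print(shapley_values(risk, [1, 2, 3], [0, 0, 0]))  # approximately [2, 6, -3]
```

Running the file prints the class counts after balancing (8 of each) and Shapley values close to 2, 6 and -3. These match w_i(x_i - b_i) for a linear score and sum to f(x) - f(baseline) = 5, the additivity property that makes Shapley-based explanations easy to read.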

Keywords / Index Terms

Heart Disease, SMOTE, Machine Learning, Explainable AI, LIME, SHAP
