Identifying Oversampling and Undersampling of Data: A Practical Approach Using R

V. Shobana, K. Nandhini

Section: Research Paper, Product Type: Journal Paper
Volume 7, Issue 5, pp. 890-896, May 2019

CrossRef-DOI: https://doi.org/10.26438/ijcse/v7i5.890896

Published online on May 31, 2019

Copyright © V. Shobana, K. Nandhini. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Citation

IEEE Style Citation: V. Shobana and K. Nandhini, “Identifying Oversampling and under sampling of Data-A Practical Approach Using R,” International Journal of Computer Sciences and Engineering, vol. 7, no. 5, pp. 890-896, 2019.

MLA Citation

MLA Style Citation: Shobana, V., and K. Nandhini. "Identifying Oversampling and under sampling of Data-A Practical Approach Using R." International Journal of Computer Sciences and Engineering 7.5 (2019): 890-896.

APA Citation

APA Style Citation: Shobana, V., & Nandhini, K. (2019). Identifying Oversampling and under sampling of Data-A Practical Approach Using R. International Journal of Computer Sciences and Engineering, 7(5), 890-896.

BibTeX Citation

BibTeX Style Citation:
@article{Shobana_2019,
author = {V. Shobana and K. Nandhini},
title = {Identifying Oversampling and under sampling of Data-A Practical Approach Using R},
journal = {International Journal of Computer Sciences and Engineering},
volume = {7},
number = {5},
month = {5},
year = {2019},
issn = {2347-2693},
pages = {890--896},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=4333},
doi = {10.26438/ijcse/v7i5.890896},
publisher = {IJCSE, Indore, INDIA},
}

RIS Citation

RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v7i5.890896
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=4333
TI  - Identifying Oversampling and under sampling of Data-A Practical Approach Using R
T2  - International Journal of Computer Sciences and Engineering
AU  - Shobana, V.
AU  - Nandhini, K.
PY  - 2019
DA  - 2019/05/31
PB  - IJCSE, Indore, INDIA
SP  - 890
EP  - 896
IS  - 5
VL  - 7
SN  - 2347-2693
ER  - 

Abstract

Thyroid hormones have a major impact on maintaining the body's metabolism, and any imbalance in these hormones also affects the functioning of other organs. Because the thyroid is such an important gland, proper clinical advice should be sought whenever such an imbalance occurs. Machine learning algorithms play a major role in the early detection of thyroid disorders. This work applies the random forest algorithm to the prediction of thyroid disorders: the algorithm classifies the class attribute, predicting whether a case is hypothyroid, hyperthyroid, or normal, and it achieves high prediction accuracy. The work is implemented in R, a statistical tool that handles large volumes of data better than many traditional data mining tools. Various performance metrics have been analysed. The data set was taken from the UCI Machine Learning Repository.
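The abstract reports accuracy and "various performance metrics" for a three-class problem (hypo, hyper, normal), and the keywords point to a confusion matrix. As a hedged illustration of how such metrics are commonly derived, the Python sketch below builds a 3x3 confusion matrix (rows are actual classes, columns are predicted classes) and computes overall accuracy plus per-class precision, recall and F1. It is not the authors' R code; the class names, the function names `confusion_matrix` and `metrics`, and the sample labels are hypothetical.

```python
from collections import Counter

# Assumed label names for the three thyroid states (hypothetical).
CLASSES = ("hypo", "hyper", "normal")


def confusion_matrix(actual, predicted, classes=CLASSES):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


def metrics(matrix, classes=CLASSES):
    """Return overall accuracy and a dict of per-class precision, recall and F1."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(classes)))
    per_class = {}
    for i, name in enumerate(classes):
        tp = matrix[i][i]
        predicted_i = sum(row[i] for row in matrix)  # column sum: predicted as this class
        actual_i = sum(matrix[i])                    # row sum: truly this class
        precision = tp / predicted_i if predicted_i else 0.0
        recall = tp / actual_i if actual_i else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[name] = {"precision": precision, "recall": recall, "f1": f1}
    return correct / total, per_class


if __name__ == "__main__":
    # Hypothetical labels for illustration, not results from the paper.
    actual = ["normal", "normal", "hypo", "hyper", "normal", "hypo"]
    predicted = ["normal", "hypo", "hypo", "hyper", "normal", "hypo"]
    cm = confusion_matrix(actual, predicted)
    acc, per_class = metrics(cm)
    print(cm)
    print(f"accuracy = {acc:.3f}")
    for name, m in per_class.items():
        print(name, {k: round(v, 3) for k, v in m.items()})
```

In R, `table(actual, predicted)` produces the same actual-by-predicted cross-tabulation, and the arithmetic on its rows and columns is identical.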

Keywords / Index Terms

Thyroid, random forest, big data, RStudio, confusion matrix
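The title refers to identifying oversampling and undersampling of data. As a generic sketch, not necessarily the procedure used in the paper, the Python code below measures class imbalance as the ratio of the largest to the smallest class count, then applies two standard rebalancing techniques: random oversampling, which duplicates rows of the smaller classes with replacement, and random undersampling, which keeps a random subset of each class sized to the smallest class. The function names, the `seed` parameter and the example labels are assumptions for illustration.

```python
import random
from collections import Counter, defaultdict


def imbalance_ratio(labels):
    """Largest class count divided by smallest class count (1.0 means balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def _group_by_class(rows, labels):
    """Collect the rows belonging to each class label."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return groups


def random_oversample(rows, labels, seed=0):
    """Duplicate random rows (with replacement) in each smaller class
    until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = max(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset (without replacement) of each class,
    sized to the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = min(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        out_rows.extend(rng.sample(group, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


if __name__ == "__main__":
    # Hypothetical imbalanced data: 7 normal, 2 hypo, 1 hyper.
    rows = list(range(10))
    labels = ["normal"] * 7 + ["hypo"] * 2 + ["hyper"]
    print(imbalance_ratio(labels))  # 7.0
    _, over = random_oversample(rows, labels)
    print(Counter(over))
    _, under = random_undersample(rows, labels)
    print(Counter(under))
```

Resampling is normally applied to the training split only, so that the test set keeps the original class distribution and the confusion-matrix metrics remain representative.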
