Automatic Classification of Research Papers to a Predefined Category Using Machine Learning
Perpetua F Noronha, Prathiba R, Gauthami M
Section: Research Paper, Product Type: Journal Paper
Volume-07, Issue-09, Page no. 47-51, Apr-2019
Online published on Apr 30, 2019
Copyright © Perpetua F Noronha, Prathiba R, Gauthami M. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
How to Cite this Paper
IEEE Style Citation: Perpetua F Noronha, Prathiba R, Gauthami M, “Automatic Classification of Research Papers to a Predefined Category Using Machine Learning,” International Journal of Computer Sciences and Engineering, Vol.07, Issue.09, pp.47-51, 2019.
MLA Style Citation: Perpetua F Noronha, Prathiba R, Gauthami M. "Automatic Classification of Research Papers to a Predefined Category Using Machine Learning." International Journal of Computer Sciences and Engineering 07.09 (2019): 47-51.
APA Style Citation: Perpetua F Noronha, Prathiba R, Gauthami M (2019). Automatic Classification of Research Papers to a Predefined Category Using Machine Learning. International Journal of Computer Sciences and Engineering, 07(09), 47-51.
BibTex Style Citation:
author = {Perpetua F Noronha, Prathiba R, Gauthami M},
title = {Automatic Classification of Research Papers to a Predefined Category Using Machine Learning},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {4 2019},
volume = {07},
Issue = {09},
month = {4},
year = {2019},
issn = {2347-2693},
pages = {47-51},
publisher = {IJCSE, Indore, INDIA},
RIS Style Citation:
TI - Automatic Classification of Research Papers to a Predefined Category Using Machine Learning
T2 - International Journal of Computer Sciences and Engineering
AU - Perpetua F Noronha, Prathiba R, Gauthami M
PY - 2019
DA - 2019/04/30
SP - 47-51
IS - 09
VL - 07
SN - 2347-2693
ER -
With technology growing rapidly, research and invention are under way in every field, and new innovations and discoveries are put forth as research papers. Thousands of research papers now span disciplines such as Computer Science, Mathematics, Biology, and Chemistry, so finding the papers that pertain to a specific domain is tedious and time-consuming. Classifying papers by discipline, subject, or category narrows the search. Done manually, this classification consumes considerable human effort and time; done automatically, it spares users from reading an entire research paper. The proposed work uses a novel strategy that automatically classifies research papers by analyzing the structure of their abstracts and assigning each paper to a specific predefined discipline. A machine learning technique is used to build a model that learns the properties or characteristics of documents, manually or in some cases semi-automatically, so that the more the model is trained, the more effectively it predicts or classifies the test documents. The training data set is vectorized and plotted in an n-dimensional space, and the Support Vector Machine (SVM) algorithm then finds the hyperplane that separates the data into the predefined categories. The learned model is later used to categorize new data. The performance of SVM is also compared with that of the Naïve Bayes and Decision Tree algorithms. The experimental results show that SVM outperforms the other two algorithms in predicting the category of research papers. The main objective of the proposed work is to develop a system that can learn from a training set, improve from experience without being explicitly programmed, and later classify any research paper given to it into a discipline.
Keywords / Index Terms
Support Vector Machine (SVM), Bag-Of-Words (BOW), Machine Learning (ML)
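The pipeline outlined in the abstract (vectorize the abstracts, then let an SVM find separating hyperplanes) can be illustrated with a small from-scratch sketch. This is not the authors' implementation. The toy abstracts, the whitespace tokenizer, the one-vs-rest scheme for handling more than two disciplines, and the hyperparameters (`epochs`, `lr`, `lam`) are all assumptions made for illustration. In practice an established machine learning library would likely replace the hand-written training loop.

```python
from collections import Counter

def bag_of_words(text, vocab):
    """Map a text to a vector of term counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]

def train_linear_svm(X, y, epochs=100, lr=0.1, lam=0.01):
    """Binary linear SVM (labels in {-1, +1}) trained by subgradient
    descent on the L2-regularized hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            violated = yi * score < 1.0  # inside the margin or misclassified
            for j, xj in enumerate(xi):
                grad = lam * w[j] - (yi * xj if violated else 0.0)
                w[j] -= lr * grad
            if violated:
                b += lr * yi
    return w, b

def train_one_vs_rest(X, labels):
    """Train one binary SVM per category (that category vs. all others)."""
    return {
        category: train_linear_svm(
            X, [1.0 if label == category else -1.0 for label in labels]
        )
        for category in sorted(set(labels))
    }

def predict(models, x):
    """Return the category whose hyperplane gives the highest score."""
    def score(category):
        w, b = models[category]
        return sum(wj * xj for wj, xj in zip(w, x)) + b
    return max(models, key=score)

# Hypothetical toy "abstracts"; real training data would be full abstracts.
training_data = [
    ("neural network training algorithm", "Computer Science"),
    ("compiler algorithm software network", "Computer Science"),
    ("theorem proof algebra topology", "Mathematics"),
    ("proof lemma algebra manifold", "Mathematics"),
    ("protein cell genome enzyme", "Biology"),
    ("cell membrane protein species", "Biology"),
]

# The vocabulary is every distinct token seen in the training texts.
vocab = sorted({word for text, _ in training_data for word in text.lower().split()})
X = [bag_of_words(text, vocab) for text, _ in training_data]
labels = [label for _, label in training_data]

models = train_one_vs_rest(X, labels)
predicted = predict(models, bag_of_words("A new proof about algebra", vocab))
print(predicted)
```

Words that never occur in the training texts (here "a", "new", "about") are simply ignored by `bag_of_words`, so only vocabulary terms influence the predicted discipline.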