Text Classification: A Comparative Analysis of Word Embedding Algorithms
R. Janani, S. Vijayarani
Section: Research Paper, Product Type: Journal Paper
Volume-7, Issue-4, Page no. 818-822, Apr-2019
CrossRef-DOI: https://doi.org/10.26438/ijcse/v7i4.818822
Online published on Apr 30, 2019
Copyright © R. Janani, S. Vijayarani. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
How to Cite this Paper
IEEE Style Citation: R. Janani and S. Vijayarani, “Text Classification: A Comparative Analysis of Word Embedding Algorithms,” International Journal of Computer Sciences and Engineering, Vol.7, Issue.4, pp.818-822, 2019.
MLA Style Citation: Janani, R., and S. Vijayarani. "Text Classification: A Comparative Analysis of Word Embedding Algorithms." International Journal of Computer Sciences and Engineering 7.4 (2019): 818-822.
APA Style Citation: Janani, R., & Vijayarani, S. (2019). Text Classification: A Comparative Analysis of Word Embedding Algorithms. International Journal of Computer Sciences and Engineering, 7(4), 818-822.
BibTex Style Citation:
@article{Janani_2019,
author = {R. Janani and S. Vijayarani},
title = {Text Classification: A Comparative Analysis of Word Embedding Algorithms},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {April 2019},
volume = {7},
number = {4},
month = {April},
year = {2019},
issn = {2347-2693},
pages = {818-822},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=4123},
doi = {10.26438/ijcse/v7i4.818822},
publisher = {IJCSE, Indore, INDIA},
}
RIS Style Citation:
TY - JOUR
DO - 10.26438/ijcse/v7i4.818822
UR - https://www.ijcseonline.org/full_paper_view.php?paper_id=4123
TI - Text Classification: A Comparative Analysis of Word Embedding Algorithms
T2 - International Journal of Computer Sciences and Engineering
AU - Janani, R.
AU - Vijayarani, S.
PY - 2019
DA - 2019/04/30
PB - IJCSE, Indore, INDIA
SP - 818
EP - 822
IS - 4
VL - 7
SN - 2347-2693
ER -
Abstract
Text classification is the task of assigning documents to one or more predefined categories. This technique is widely used in information retrieval, text summarization, and text extraction. Transforming text into feature vectors is a crucial stage of the classification task; the main advantage of this transformation is that it reveals the most significant words in a document. This process, known as word embedding, represents the meanings of words in vector format. Word embeddings occupy a high-dimensional space in which the embeddings of similar or related words lie close to each other. The main aim of this research work is to classify text documents based on their contents. To achieve this, different word embedding algorithms are used to represent the documents. The performance measures are precision, recall, F-measure, and accuracy.
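To make this pipeline concrete, the following is a minimal Python sketch (not the authors' exact method) of embedding-based text classification: word vectors are trained with gensim's Word2Vec, each document is represented by the mean of its word vectors, and a logistic-regression classifier is evaluated with the four measures named above. The toy corpus, labels, and all hyperparameters are illustrative assumptions.

import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

# Toy corpus (assumed for illustration): each document is a list of tokens
# paired with a category label; repeated so the split below is non-trivial.
docs = [
    ["stock", "market", "shares", "rise"],   # finance
    ["bank", "interest", "rates", "fall"],   # finance
    ["team", "wins", "league", "match"],     # sports
    ["player", "scores", "goal", "match"],   # sports
] * 25
labels = ["finance", "finance", "sports", "sports"] * 25

# 1. Learn word embeddings: similar or related words end up adjacent
#    in the high-dimensional vector space.
w2v = Word2Vec(sentences=docs, vector_size=50, window=3, min_count=1, epochs=20)

# 2. Represent each document as the mean of its word vectors.
def doc_vector(tokens):
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

X = np.array([doc_vector(d) for d in docs])
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3, random_state=0)

# 3. Classify, then report precision, recall, F-measure, and accuracy.
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = clf.predict(X_te)
prec, rec, f1, _ = precision_recall_fscore_support(y_te, pred, average="macro")
print(f"precision={prec:.3f} recall={rec:.3f} f-measure={f1:.3f} "
      f"accuracy={accuracy_score(y_te, pred):.3f}")

Swapping in GloVe or WordRank vectors only changes step 1; the document-averaging and evaluation stages stay the same, which is what makes such algorithms directly comparable.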
Key-Words / Index Term
Text Classification, Document Representation, Word Embedding, Word2Vec, GloVe, WordRank