Parts of Speech Tagging for Indic Languages: A Survey

Floyd Avina Fernandes, Kavita Asnani

Section: Survey Paper, Product Type: Journal Paper
Volume-7, Issue-3, Page no. 729-736, Mar-2019

CrossRef-DOI: https://doi.org/10.26438/ijcse/v7i3.729736

Online published on Mar 31, 2019

Copyright © Floyd Avina Fernandes, Kavita Asnani. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: Floyd Avina Fernandes, Kavita Asnani, “Parts of Speech Tagging for Indic Languages: A Survey,” International Journal of Computer Sciences and Engineering, Vol.7, Issue.3, pp.729-736, 2019.

MLA Style Citation: Floyd Avina Fernandes, Kavita Asnani. "Parts of Speech Tagging for Indic Languages: A Survey." International Journal of Computer Sciences and Engineering 7.3 (2019): 729-736.

APA Style Citation: Floyd Avina Fernandes, Kavita Asnani (2019). Parts of Speech Tagging for Indic Languages: A Survey. International Journal of Computer Sciences and Engineering, 7(3), 729-736.

BibTex Style Citation:
@article{Fernandes_2019,
author = {Floyd Avina Fernandes and Kavita Asnani},
title = {Parts of Speech Tagging for Indic Languages: A Survey},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {3 2019},
volume = {7},
Issue = {3},
month = {3},
year = {2019},
issn = {2347-2693},
pages = {729-736},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=3908},
doi = {https://doi.org/10.26438/ijcse/v7i3.729736},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
DO  - https://doi.org/10.26438/ijcse/v7i3.729736
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=3908
TI  - Parts of Speech Tagging for Indic Languages: A Survey
T2  - International Journal of Computer Sciences and Engineering
AU  - Floyd Avina Fernandes
AU  - Kavita Asnani
PY  - 2019
DA  - 2019/03/31
PB  - IJCSE, Indore, INDIA
SP  - 729
EP  - 736
IS  - 3
VL  - 7
SN  - 2347-2693
ER  -

Abstract

Natural language processing (NLP) comprises various techniques for processing language text, among them Part of Speech (POS) taggers, chunkers, morphological analyzers, spell checkers, grammar checkers, machine translators, and transliterators. POS tagging is a basic building block of language processing: it assigns to every token (word) in a text corpus a label indicating its part of speech, such as verb, pronoun, noun, or adjective. POS tagging is a significant pre-processing step, especially in information retrieval, text-to-speech processing, word sense disambiguation, and information processing. POS tagging methods are classified as rule-based tagging, transformation-based tagging, and stochastic tagging. Recent research reports various approaches, such as the Markov Model (MM), Support Vector Machine (SVM), and Maximum Entropy (ME), tested on several Indic languages, including Hindi, Bengali, Manipuri, Assamese, Telugu, Kannada, Malayalam, Tamil, and Punjabi. Since the performance of POS taggers is specific to context and language, there is a pressing need for an exhaustive survey. This paper presents a comprehensive study of two Indic languages, Hindi and Bengali, and reports POS taggers built with various approaches along with their performance.
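
To make the stochastic family mentioned in the abstract concrete, the following is a minimal sketch of a bigram hidden Markov model tagger with add-one smoothing and Viterbi decoding. It is not taken from any of the surveyed systems: the three-sentence romanized Hindi corpus, the tag names (PRON, NOUN, VERB, AUX), and the function names train, log_prob, and viterbi are all assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus (romanized Hindi), invented for illustration only.
TRAIN = [
    [("vah", "PRON"), ("ghar", "NOUN"), ("jata", "VERB"), ("hai", "AUX")],
    [("ladka", "NOUN"), ("khana", "NOUN"), ("khata", "VERB"), ("hai", "AUX")],
    [("ladki", "NOUN"), ("pani", "NOUN"), ("piti", "VERB"), ("hai", "AUX")],
]


def train(sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    vocab = set()
    for sent in sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, vocab


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability of `key` under `counter`."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, trans, emit, vocab):
    """Return the most probable tag sequence for `words`."""
    if not words:
        return []
    tags = sorted(emit)
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra slot for unseen words
    # best[i][t] = (log score of best path ending in tag t, previous tag)
    best = [{
        t: (log_prob(trans["<s>"], t, n_tags)
            + log_prob(emit[t], words[0], n_words), None)
        for t in tags
    }]
    for w in words[1:]:
        col = {}
        for t in tags:
            score, prev = max(
                (best[-1][p][0] + log_prob(trans[p], t, n_tags), p)
                for p in tags
            )
            col[t] = (score + log_prob(emit[t], w, n_words), prev)
        best.append(col)
    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for col in reversed(best[1:]):
        tag = col[tag][1]
        path.append(tag)
    return list(reversed(path))


trans, emit, vocab = train(TRAIN)
print(viterbi("vah pani piti hai".split(), trans, emit, vocab))
# prints ['PRON', 'NOUN', 'VERB', 'AUX']
```

The test sentence does not occur in the toy corpus, so the tagger has to combine transition and emission evidence rather than look up a memorized sequence. A rule-based or transformation-based tagger would replace the probability tables with hand-written or learned rewrite rules.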

Keywords / Index Terms

Ambiguity, Natural language processing (NLP), Named Entity Recognition (NER), Part of Speech (POS), Tagger
