A Comparative Study of Three IR models for Bengali Document Retrieval
Soma Chatterjee, Kamal Sarkar
Section:Research Paper, Product Type: Journal Paper
Volume-07, Issue-01, Page no. 220-225, Jan-2019
Online published on Jan 20, 2019
Copyright © Soma Chatterjee, Kamal Sarkar. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
How to Cite this Paper
IEEE Style Citation: Soma Chatterjee, Kamal Sarkar, “A Comparative Study of Three IR models for Bengali Document Retrieval,” International Journal of Computer Sciences and Engineering, Vol.07, Issue.01, pp.220-225, 2019.
MLA Style Citation: Soma Chatterjee, Kamal Sarkar "A Comparative Study of Three IR models for Bengali Document Retrieval." International Journal of Computer Sciences and Engineering 07.01 (2019): 220-225.
APA Style Citation: Soma Chatterjee, Kamal Sarkar, (2019). A Comparative Study of Three IR models for Bengali Document Retrieval. International Journal of Computer Sciences and Engineering, 07(01), 220-225.
BibTex Style Citation:
@article{Chatterjee_2019,
author = {Soma Chatterjee and Kamal Sarkar},
title = {A Comparative Study of Three IR models for Bengali Document Retrieval},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {1 2019},
volume = {07},
number = {01},
month = {1},
year = {2019},
issn = {2347-2693},
pages = {220-225},
url = {https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=622},
publisher = {IJCSE, Indore, INDIA},
}
RIS Style Citation:
TY - JOUR
UR - https://www.ijcseonline.org/full_spl_paper_view.php?paper_id=622
TI - A Comparative Study of Three IR models for Bengali Document Retrieval
T2 - International Journal of Computer Sciences and Engineering
AU - Soma Chatterjee
AU - Kamal Sarkar
PY - 2019
DA - 2019/01/20
PB - IJCSE, Indore, INDIA
SP - 220
EP - 225
IS - 01
VL - 07
SN - 2347-2693
ER -
Abstract
In this paper, we study selected information retrieval (IR) approaches for Bengali document retrieval. These approaches use keywords to describe the content of each document. We chose three models in order to understand their working mechanisms and shortcomings: the TF-IDF vector space model, the Latent Semantic Indexing (LSI) model, and the BM25 model. Understanding these shortcomings is a prerequisite for overcoming them. The models are evaluated on a Bengali dataset and a set of Bengali queries that we created, and the results are reported in the results section of this paper. Our study reveals that the Okapi BM25 model performs best among the three IR models studied for Bengali document retrieval.
Keywords / Index Terms
Information Retrieval, Bengali language, LSI, BM25, probabilistic, Query
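For readers unfamiliar with two of the models named in the abstract, the following is a minimal sketch of keyword-based ranking with Okapi BM25 and with a TF-IDF vector space model using cosine similarity. It is not the authors' implementation: the function names, the parameter values k1 = 1.2 and b = 0.75, the smoothed IDF variant, and the romanized toy tokens are all assumptions made for illustration. LSI is omitted because it additionally requires a singular value decomposition of the term-document matrix.

```python
import math
from collections import Counter


def document_frequencies(docs):
    """Count, for each term, the number of documents that contain it."""
    return Counter(term for doc in docs for term in set(doc))


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score tokenized documents against a tokenized query with Okapi BM25.

    k1 and b are commonly used defaults, assumed here for illustration.
    """
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    df = document_frequencies(docs)
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF; the "1 +" inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def tfidf_cosine_scores(query, docs):
    """Score documents by cosine similarity between TF-IDF vectors."""
    n_docs = len(docs)
    df = document_frequencies(docs)

    def vectorize(tokens):
        # Terms unseen in the collection are ignored.
        tf = Counter(tokens)
        return {t: c * math.log(n_docs / df[t]) for t, c in tf.items() if t in df}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        norm_u = math.sqrt(sum(w * w for w in u.values()))
        norm_v = math.sqrt(sum(w * w for w in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    query_vec = vectorize(query)
    return [cosine(query_vec, vectorize(doc)) for doc in docs]


# Hypothetical toy collection of romanized tokens (not the paper's dataset).
toy_docs = [
    ["nodi", "jol", "nodi"],
    ["akash", "megh"],
    ["jol", "megh", "brishti", "jol"],
]
toy_query = ["nodi", "jol"]

bm25 = bm25_scores(toy_query, toy_docs)
tfidf = tfidf_cosine_scores(toy_query, toy_docs)
bm25_ranking = sorted(range(len(toy_docs)), key=lambda i: bm25[i], reverse=True)
tfidf_ranking = sorted(range(len(toy_docs)), key=lambda i: tfidf[i], reverse=True)
```

In this toy setting both models place the first document at the top, since it contains both query terms. The two models differ mainly in how term frequency and document length are normalized, which is where differences in retrieval quality on a real collection would be expected to arise.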