Interactive Data Extraction Algorithm to Extract Data from The Pdf Document, Helpful in Generating Water Quality Data of The Kanhan River
D. A. Lingote, Girish S. Katkar
Section: Research Paper, Product Type: Journal Paper
Volume-7, Issue-3, Page no. 550-556, Mar-2019
CrossRef-DOI: https://doi.org/10.26438/ijcse/v7i3.550556
Online published on Mar 31, 2019
Copyright © D. A. Lingote, Girish S. Katkar. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
How to Cite this Paper
IEEE Citation
IEEE Style Citation: D. A. Lingote, Girish S. Katkar, “Interactive Data Extraction Algorithm to Extract Data from The Pdf Document, Helpful in Generating Water Quality Data of The Kanhan River,” International Journal of Computer Sciences and Engineering, Vol.7, Issue.3, pp.550-556, 2019.
MLA Citation
MLA Style Citation: D. A. Lingote, Girish S. Katkar. "Interactive Data Extraction Algorithm to Extract Data from The Pdf Document, Helpful in Generating Water Quality Data of The Kanhan River." International Journal of Computer Sciences and Engineering 7.3 (2019): 550-556.
APA Citation
APA Style Citation: D. A. Lingote, Girish S. Katkar (2019). Interactive Data Extraction Algorithm to Extract Data from The Pdf Document, Helpful in Generating Water Quality Data of The Kanhan River. International Journal of Computer Sciences and Engineering, 7(3), 550-556.
BibTex Citation
BibTex Style Citation:
@article{Lingote_2019,
author = {D. A. Lingote and Girish S. Katkar},
title = {Interactive Data Extraction Algorithm to Extract Data from The Pdf Document, Helpful in Generating Water Quality Data of The Kanhan River},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {3 2019},
volume = {7},
number = {3},
month = {3},
year = {2019},
issn = {2347-2693},
pages = {550--556},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=3878},
doi = {10.26438/ijcse/v7i3.550556},
publisher = {IJCSE, Indore, INDIA},
}
RIS Citation
RIS Style Citation:
TY - JOUR
DO - 10.26438/ijcse/v7i3.550556
UR - https://www.ijcseonline.org/full_paper_view.php?paper_id=3878
TI - Interactive Data Extraction Algorithm to Extract Data from The Pdf Document, Helpful in Generating Water Quality Data of The Kanhan River
T2 - International Journal of Computer Sciences and Engineering
AU - Lingote, D. A.
AU - Katkar, Girish S.
PY - 2019
DA - 2019/03/31
PB - IJCSE, Indore, INDIA
SP - 550
EP - 556
IS - 3
VL - 7
SN - 2347-2693
ER -
Abstract
The internet is now widely used to generate and disseminate information. Most organizations, laboratories, and institutes currently release their official and research reports as PDF (Portable Document Format) documents, a format whose many benefits have made it popular for publishing information on the web. If this widely published information is extracted and re-processed, it can serve as useful input for many research and development projects. In this paper we introduce an information extraction algorithm that extracts information from PDF documents using free libraries. Specifically, we target PDF documents containing Kanhan River water quality data, which are freely published on the internet. To present it clearly, the extracted information is geo-mapped and re-published in the public domain, which helps in observing and validating Kanhan River water quality data at different geographical locations.
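The abstract does not spell out the algorithm, so the following is only a minimal illustrative sketch of the two steps it names: turning text already extracted from a PDF into structured records, and attaching coordinates so the records can be geo-mapped. It is not the authors' method. The sample text, the station names, the column layout (pH, DO, BOD), the coordinates, and the helper names `parse_rows` and `to_geojson` are all hypothetical placeholders, and the PDF-to-text step itself is assumed to have been done by a separate library.

```python
import re

# Hypothetical text as it might come out of a PDF text extractor:
# one row per sampling station, columns separated by runs of spaces.
# All values are made-up placeholders, not measured data.
SAMPLE_TEXT = """\
Station        pH    DO(mg/L)  BOD(mg/L)
Station A      7.8   6.2       2.1
Station B      7.4   5.1       3.4
Page 1 of 1
"""

# Hypothetical (latitude, longitude) pairs; not real survey locations.
STATION_COORDS = {
    "Station A": (21.0, 79.0),
    "Station B": (21.1, 79.1),
}

# A data row is a station name, at least two spaces, then three numbers.
NUMBER = r"\d+(?:\.\d+)?"
ROW = re.compile(
    rf"^(?P<station>.+?)\s{{2,}}(?P<ph>{NUMBER})\s+(?P<do>{NUMBER})\s+(?P<bod>{NUMBER})\s*$"
)


def parse_rows(text):
    """Return one dict per line that looks like a data row; skip everything else."""
    records = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # header, footer, or blank line
        records.append({
            "station": match.group("station").strip(),
            "ph": float(match.group("ph")),
            "do_mg_l": float(match.group("do")),
            "bod_mg_l": float(match.group("bod")),
        })
    return records


def to_geojson(records, coords):
    """Build a GeoJSON FeatureCollection; stations without coordinates are skipped."""
    features = []
    for record in records:
        if record["station"] not in coords:
            continue
        lat, lon = coords[record["station"]]
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": record,
        })
    return {"type": "FeatureCollection", "features": features}


records = parse_rows(SAMPLE_TEXT)
feature_collection = to_geojson(records, STATION_COORDS)
```

A real report would need a row pattern matched to its own table layout; the regular expression here only fits the hypothetical three-column sample. The resulting `feature_collection` can be serialized with `json.dumps` and loaded by most web mapping tools.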
Key-Words / Index Terms
PDF Extraction, data generation, Extraction, Kanhan River, information system