Web Server log Analysis for Unstructured data Using Apache Flume and Pig
A.S. Nagdive, R.M. Tugnayat, G.B Regulwar, D.Petkar
Section: Research Paper, Product Type: Journal Paper
Volume-7, Issue-3, Page no. 220-225, Mar-2019
CrossRef-DOI: https://doi.org/10.26438/ijcse/v7i3.220225
Online published on Mar 31, 2019
Copyright © A.S. Nagdive, R.M. Tugnayat, G.B Regulwar, D.Petkar. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
IEEE Style Citation: A.S. Nagdive, R.M. Tugnayat, G.B Regulwar, D.Petkar, “Web Server log Analysis for Unstructured data Using Apache Flume and Pig,” International Journal of Computer Sciences and Engineering, Vol.7, Issue.3, pp.220-225, 2019.
MLA Style Citation: A.S. Nagdive, R.M. Tugnayat, G.B Regulwar, D.Petkar "Web Server log Analysis for Unstructured data Using Apache Flume and Pig." International Journal of Computer Sciences and Engineering 7.3 (2019): 220-225.
APA Style Citation: A.S. Nagdive, R.M. Tugnayat, G.B Regulwar, D.Petkar, (2019). Web Server log Analysis for Unstructured data Using Apache Flume and Pig. International Journal of Computer Sciences and Engineering, 7(3), 220-225.
BibTex Style Citation:
@article{Nagdive_2019,
author = {A.S. Nagdive and R.M. Tugnayat and G.B Regulwar and D.Petkar},
title = {Web Server log Analysis for Unstructured data Using Apache Flume and Pig},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {3 2019},
volume = {7},
Issue = {3},
month = {3},
year = {2019},
issn = {2347-2693},
pages = {220-225},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=3821},
doi = {https://doi.org/10.26438/ijcse/v7i3.220225},
publisher = {IJCSE, Indore, INDIA},
}
RIS Style Citation:
TY - JOUR
DO - https://doi.org/10.26438/ijcse/v7i3.220225
UR - https://www.ijcseonline.org/full_paper_view.php?paper_id=3821
TI - Web Server log Analysis for Unstructured data Using Apache Flume and Pig
T2 - International Journal of Computer Sciences and Engineering
AU - A.S. Nagdive
AU - R.M. Tugnayat
AU - G.B Regulwar
AU - D.Petkar
PY - 2019
DA - 2019/03/31
PB - IJCSE, Indore, INDIA
SP - 220
EP - 225
IS - 3
VL - 7
SN - 2347-2693
ER -
Abstract
Web servers routinely produce log files. A web log file is created and stored automatically by the web server and keeps a running record of activity on the site, updated as the site owner, other websites, and site users interact with it. Analyzing these access logs gives various insights into website usage. As web usage grows, log files grow rapidly in size, and processing them with relational database technology has become a challenging task. Hadoop addresses this by processing massive quantities of data on a cluster of commodity hardware. In the proposed approach, unstructured web server log data collected by Apache Flume is converted into a structured format using Pig. An enterprise web log analysis system based on the Hadoop architecture, comprising the Hadoop Distributed File System (HDFS), the Hadoop MapReduce framework, and the Pig Latin language, aids the business decision-making of system administrators and helps them collect and identify the potential value hidden in the large volumes of data that websites generate. Such analysis covers a site's access log and reports the number of visitors, busy days of the week and rush hours, views, hits, frequently accessed pages, application server traffic trends, performance reports at varying intervals, and statistical reports that indicate the performance of the program. In this paper we present the methodology used for pre-processing high-volume web log files, studying website statistics, and learning user behaviour using the Hadoop MapReduce framework, the Hadoop Distributed File System, and the Pig Latin query language.
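The conversion from unstructured log lines to structured records described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' Flume and Pig pipeline; it is a single-machine illustration that assumes the logs follow the Apache Common Log Format, which the abstract does not specify. The names LOG_PATTERN, parse_line, summarize, and SAMPLE are hypothetical, and the sample lines are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: Apache Common Log Format; the paper does not name the log layout.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Turn one raw log line into a structured record, or None if malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # A "-" size means no body was returned; store it as 0 bytes.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


def summarize(lines):
    """Compute a few of the statistics the abstract lists from raw log lines."""
    records = [r for r in map(parse_line, lines) if r is not None]
    return {
        "hits": len(records),
        "visitors": len({r["host"] for r in records}),
        "hits_per_hour": Counter(r["time"].hour for r in records),
        "top_pages": Counter(r["path"] for r in records).most_common(3),
    }


# Hypothetical sample data, including one malformed line that is skipped.
SAMPLE = [
    '10.0.0.1 - - [31/Mar/2019:10:15:02 +0000] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.2 - - [31/Mar/2019:10:47:31 +0000] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.1 - - [31/Mar/2019:11:02:10 +0000] "GET /about.html HTTP/1.1" 404 -',
    'not a log line',
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

In a Hadoop deployment the same two steps, field extraction followed by grouping and counting, would run in parallel over files stored in HDFS; the sketch only shows the shape of the transformation.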
Keywords / Index Terms
HDFS, Apache Flume, Pig, HBase, web server log