Open Access Article

Big data Processing Comparison using Pig and Hive

J. Santosh Kumar, B. K. Raghavendra, S. Raghavendra

Section: Research Paper, Product Type: Journal Paper
Volume 7, Issue 3, Pages 173-178, March 2019

CrossRef DOI: https://doi.org/10.26438/ijcse/v7i3.173178

Online published on Mar 31, 2019

Copyright © J. Santosh Kumar, B. K. Raghavendra, S. Raghavendra. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


How to Cite this Paper


IEEE Style Citation: J. Santosh Kumar, B. K. Raghavendra, S. Raghavendra, “Big data Processing Comparison using Pig and Hive,” International Journal of Computer Sciences and Engineering, Vol.7, Issue.3, pp.173-178, 2019.

MLA Style Citation: J. Santosh Kumar, B. K. Raghavendra, S. Raghavendra. "Big data Processing Comparison using Pig and Hive." International Journal of Computer Sciences and Engineering 7.3 (2019): 173-178.

APA Style Citation: J. Santosh Kumar, B. K. Raghavendra, S. Raghavendra (2019). Big data Processing Comparison using Pig and Hive. International Journal of Computer Sciences and Engineering, 7(3), 173-178.

BibTex Style Citation:
@article{Kumar_2019,
author = {J. Santosh Kumar and B. K. Raghavendra and S. Raghavendra},
title = {Big data Processing Comparison using Pig and Hive},
journal = {International Journal of Computer Sciences and Engineering},
volume = {7},
number = {3},
month = {3},
year = {2019},
issn = {2347-2693},
pages = {173--178},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=3815},
doi = {10.26438/ijcse/v7i3.173178},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v7i3.173178
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=3815
TI  - Big data Processing Comparison using Pig and Hive
T2  - International Journal of Computer Sciences and Engineering
AU  - J. Santosh Kumar
AU  - B. K. Raghavendra
AU  - S. Raghavendra
PY  - 2019
DA  - 2019/03/31
PB  - IJCSE, Indore, INDIA
SP  - 173
EP  - 178
IS  - 3
VL  - 7
SN  - 2347-2693
ER  - 


Abstract

Big data is not only about a mammoth volume of data; it also involves velocity, i.e. the speed at which data is generated, and variety, i.e. the many different kinds of data, which together make it impossible to process with traditional systems. Here, processing means storing and analyzing the huge amount of varied streaming and non-streaming data that is generated. Almost every device around us generates large amounts of structured and unstructured data. Devices and organizations have been generating data for many years, but for a long time organizations did not use it; nowadays they are looking to analyze this data to improve their performance. Different sources generate different kinds of data: structured data, whose features (fields) and feature types are known; semi-structured data, whose features are known but whose feature types are not; and unstructured data, whose features and feature types are both unknown.

Hadoop was developed by Doug Cutting at Yahoo, based on the ideas in Google's MapReduce and Google File System papers, and was later adopted by companies such as Amazon, which now owes much of its position as one of the world's leading companies to analyzing the data it generates. Many companies, including Amazon, Google and Yahoo, have developed tools and software frameworks to process big data. Hadoop originally had two components: HDFS for storage and MapReduce for processing. Before YARN, resource management was handled inside MapReduce itself (by the JobTracker), which led to poor performance, so YARN was later added on top of HDFS as a separate resource manager. Along with YARN, many other components were added: HBase, Hive and Sqoop to process structured data, and Pig and Flume to process unstructured data. Sqoop's main job is to import structured data from databases into Hadoop and export it back, whereas Flume imports unstructured data generated by web servers, Twitter and Facebook into Hadoop for analysis. The recent Hadoop ecosystem includes HBase, Pig, Hive, ZooKeeper, Oozie, Flume, the Mahout machine-learning library and many more tools that make Hadoop more user friendly and improve the performance of data analysis. Similarly, Spark and Flink compete with Hadoop: Spark overcomes limitations of Hadoop MapReduce, and Flink overcomes limitations of Spark. In this paper we highlight MapReduce applications through the word-count benchmark: we executed the word-count benchmark program using both Pig and Hive and found Hive to be much faster than Pig.
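The abstract does not reproduce the authors' Pig or Hive scripts. As an illustration only, the following Python sketch (an assumption, not the authors' code) shows the map, shuffle and reduce phases that a word-count job goes through. The function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are hypothetical, and everything runs in a single process rather than on a Hadoop cluster.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper (hypothetical): emit (word, 1) for every word in one input line."""
    for word in re.findall(r"[a-z0-9']+", line.lower()):
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all emitted values by key, sorted by key,
    mimicking what Hadoop does between the map and reduce phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return dict(sorted(groups.items()))


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reducer (hypothetical): sum the partial counts for one word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce over the input lines in one process."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    sample = ["Hive and Pig run on Hadoop", "Pig and Hive compile to MapReduce jobs"]
    print(word_count(sample))
```

Pig and Hive let the user express this same pipeline (tokenize, group by word, count) in a higher-level language, and in classic Hadoop deployments both compile it into MapReduce jobs.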

Key-Words / Index Term

Hadoop; MapReduce; Hive; Pig; word count; CloudxLab; Flink; Spark
