Open Access Article

A New Technique of Web Page Classification and Optimization

R Khan, R K Gupta, V. Namdeo

Section: Research Paper, Product Type: Journal Paper
Volume-7, Issue-1, Page no. 381-385, Jan-2019

CrossRef-DOI: https://doi.org/10.26438/ijcse/v7i1.381385

Online published on Jan 31, 2019

Copyright © R Khan, R K Gupta, V. Namdeo. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to Cite this Paper

IEEE Style Citation: R Khan, R K Gupta, V. Namdeo, “A New Technique of Web Page Classification and Optimization,” International Journal of Computer Sciences and Engineering, Vol.7, Issue.1, pp.381-385, 2019.

MLA Style Citation: R Khan, R K Gupta, V. Namdeo. "A New Technique of Web Page Classification and Optimization." International Journal of Computer Sciences and Engineering 7.1 (2019): 381-385.

APA Style Citation: R Khan, R K Gupta, V. Namdeo (2019). A New Technique of Web Page Classification and Optimization. International Journal of Computer Sciences and Engineering, 7(1), 381-385.

BibTex Style Citation:
@article{Khan_2019,
author = {R Khan and R K Gupta and V. Namdeo},
title = {A New Technique of Web Page Classification and Optimization},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {1 2019},
volume = {7},
Issue = {1},
month = {1},
year = {2019},
issn = {2347-2693},
pages = {381-385},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=3516},
doi = {https://doi.org/10.26438/ijcse/v7i1.381385},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY - JOUR
DO - https://doi.org/10.26438/ijcse/v7i1.381385
UR - https://www.ijcseonline.org/full_paper_view.php?paper_id=3516
TI - A New Technique of Web Page Classification and Optimization
T2 - International Journal of Computer Sciences and Engineering
AU - R Khan
AU - R K Gupta
AU - V. Namdeo
PY - 2019
DA - 2019/01/31
PB - IJCSE, Indore, INDIA
SP - 381
EP - 385
IS - 1
VL - 7
SN - 2347-2693
ER -

Abstract

The rapid development of the internet and of web publishing techniques has created numerous information sources published as HTML pages on the World Wide Web (WWW). The WWW is now a popular medium through which people all around the world can spread and gather information of all kinds. This work discusses the importance of Web-specific features and algorithms, describes state-of-the-art practices, and aims to give a better description of the Web page classification problem. The Firefly Algorithm (FA) is a recent nature-inspired optimization algorithm that simulates the flash patterns and characteristics of fireflies. Clustering is a popular data analysis technique for identifying homogeneous groups of objects based on the values of their attributes. Here, FA is used for clustering on benchmarks, where it is reported to be more suitable than Artificial Bee Colony (ABC), Particle Swarm Optimization (PSO), and nine other methods. The proposed approach is an improved, optimized web page classification that combines the firefly algorithm with a Naïve Bayes (NB) classifier. Current classification techniques use word consistency and grouping techniques for classifying web pages. These techniques use an ad hoc approach to review and reconcile all keywords on a website for classification. These methods are effective, but not without problems such as slow processing, differences in word meaning, and poor identification of sentences; they also disregard the homonymy of words. The proposed work is therefore reported to perform better than existing approaches on parameters such as accuracy and precision.
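
As an illustration only, the firefly-based clustering idea mentioned in the abstract can be sketched as follows. This is not the authors' implementation: the sum-of-squared-errors objective, the parameter names (beta0, gamma, alpha), their default values, and the function names sse and firefly_cluster are all assumptions chosen for the sketch. Each firefly encodes a candidate set of k centroids, and dimmer fireflies (higher error) move toward brighter ones (lower error).

```python
import math
import random


def sse(centroids, points):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(
        min(sum((p - c) ** 2 for p, c in zip(point, centroid))
            for centroid in centroids)
        for point in points
    )


def firefly_cluster(points, k, n_fireflies=15, n_iter=60,
                    beta0=1.0, gamma=0.01, alpha=0.2, seed=0):
    """Hypothetical firefly-style search for k centroids (illustrative defaults)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Each firefly is a candidate solution: k centroids placed on random data points.
    swarm = [[list(rng.choice(points)) for _ in range(k)]
             for _ in range(n_fireflies)]
    cost = [sse(f, points) for f in swarm]
    best_cost, best = min(zip(cost, swarm), key=lambda t: t[0])
    best = [c[:] for c in best]  # copy, because fireflies are mutated in place

    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is "brighter" (lower error)
                    # Squared distance between the two candidate solutions.
                    r2 = sum((a - b) ** 2
                             for ci, cj in zip(swarm[i], swarm[j])
                             for a, b in zip(ci, cj))
                    # Attractiveness decays with distance.
                    beta = beta0 * math.exp(-gamma * r2)
                    for ci, cj in zip(swarm[i], swarm[j]):
                        for d in range(dim):
                            # Attraction toward j plus a small random step.
                            ci[d] += (beta * (cj[d] - ci[d])
                                      + alpha * (rng.random() - 0.5))
                    cost[i] = sse(swarm[i], points)
                    if cost[i] < best_cost:
                        best_cost = cost[i]
                        best = [c[:] for c in swarm[i]]
        alpha *= 0.97  # gradually reduce the random step size
    return best, best_cost
```

Calling firefly_cluster(points, k=2) on a list of numeric feature vectors returns the best centroids found and their error. In a web page setting, the feature vectors would first have to be derived from page content, and the resulting groups could then feed a separate classifier such as Naïve Bayes; both steps are outside this sketch.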

Keywords / Index Terms

Accuracy, Artificial Bee Colony, Classification, Clustering, Firefly, Features, Homogeneous, HTML, Information, Optimization, Precision, Web
