Effectuation of Web Log Preprocessing and Page Access Frequency using Web Usage Mining
Research Paper | Journal Paper
Vol.1 , Issue.1 , pp.1-5, Sep-2013
Abstract
Extracting information from web logs is an important task that can be accomplished with web usage mining. Web usage mining reveals visitor behaviour by quickly and automatically extracting intrinsic information from large volumes of web log data, such as interesting access paths, user identities, groups of pages accessed together, web user clusters and candidates for web pre-fetching. It is therefore a milestone in an organization's decision-making process. Data preprocessing is a key step in the mining process: once the web log data has been preprocessed, the desired information about visitors, along with other hidden information, can be retrieved easily. This paper focuses on the data preprocessing stage of web usage mining, after which any kind of irrelevant information can be filtered out. We also propose an algorithm for web log preprocessing in web usage mining and describe its implementation. Every page is assigned an individual token. Based on these tokens and their access frequencies, data mining techniques (classification, association rules and clustering) can be applied. The approach also makes it easy to find the pages with the highest and lowest access frequency.
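The preprocessing and frequency steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the Common Log Format pattern, the list of irrelevant file suffixes, the rule of keeping only successful GET requests, and the `P1`, `P2`, ... token scheme are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed input format: Common Log Format, e.g.
# host - - [01/Jul/1995:00:00:01 -0400] "GET /index.html HTTP/1.0" 200 1024
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Hypothetical cleaning rule: embedded resources are treated as irrelevant.
IRRELEVANT_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def preprocess(lines):
    """Yield the page URL of every log entry that survives cleaning."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip malformed entries
        if match["method"] != "GET" or not match["status"].startswith("2"):
            continue  # keep only successful page requests (assumption)
        url = match["url"].split("?", 1)[0]  # drop any query string
        if url.lower().endswith(IRRELEVANT_SUFFIXES):
            continue
        yield url


def page_frequencies(lines):
    """Assign each distinct page a token and count accesses per token."""
    tokens = {}
    counts = Counter()
    for url in preprocess(lines):
        token = tokens.setdefault(url, f"P{len(tokens) + 1}")
        counts[token] += 1
    return tokens, counts


def highest_and_lowest(counts):
    """Return the (token, frequency) pairs with the highest and lowest counts."""
    by_frequency = lambda item: item[1]
    return max(counts.items(), key=by_frequency), min(counts.items(), key=by_frequency)


SAMPLE_LOG = [
    'host1 - - [01/Jul/1995:00:00:01 -0400] "GET /index.html HTTP/1.0" 200 1024',
    'host1 - - [01/Jul/1995:00:00:02 -0400] "GET /images/logo.gif HTTP/1.0" 200 512',
    'host2 - - [01/Jul/1995:00:00:03 -0400] "GET /index.html HTTP/1.0" 200 1024',
    'host2 - - [01/Jul/1995:00:00:04 -0400] "GET /about.html HTTP/1.0" 404 -',
    'host3 - - [01/Jul/1995:00:00:05 -0400] "GET /news.html HTTP/1.0" 200 2048',
]

if __name__ == "__main__":
    tokens, counts = page_frequencies(SAMPLE_LOG)
    print(tokens)
    print(highest_and_lowest(counts))
```

The token-to-frequency table returned by `page_frequencies` is the kind of cleaned input on which classification, association rule mining or clustering could then be run.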
Key-Words / Index Term
Web Usage Mining, Preprocessing, Web Log Data, Frequency, Clustering
Citation
B. Bakariya, G.S. Thakur, "Effectuation of Web Log Preprocessing and Page Access Frequency using Web Usage Mining," International Journal of Computer Sciences and Engineering, Vol.1, Issue.1, pp.1-5, 2013.
Clustering approach based on Efficient Coverage with Minimum Weight for Document Data
Research Paper | Journal Paper
Vol.1 , Issue.1 , pp.6-13, Sep-2013
Abstract
A huge amount of useful data is now available on the web as shared information that anyone can access and use. The availability of document data of many different types has made clustering of large datasets a necessary task. Clustering is one of the most important techniques for classifying large datasets and is widely applicable in many areas. High-quality, fast document clustering algorithms play a significant role in navigating, summarizing and organizing information. Recent studies have shown that partitional clustering algorithms are suitable for large datasets. The k-means algorithm [9, 10] is commonly used for partitional clustering because it is easy to implement and efficient in terms of execution time. Its major problems are its sensitivity to the choice of the initial partition and its convergence to local optima. In this study we refine the useful information in a document dataset using a minimum spanning tree for document clustering. Good-quality clusters were generated on several document datasets, and the results obtained indicate an effective improvement in performance.
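As an illustration of minimum-spanning-tree clustering in general, the sketch below runs Kruskal's algorithm on a complete graph of document vectors and stops once `k` components remain, which is equivalent to building the full tree and deleting its `k - 1` heaviest edges. This is the classic graph-based scheme, not necessarily the method proposed in the paper; the cosine distance, the toy vectors and the function names are assumptions.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length term vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def mst_clusters(vectors, k):
    """Group vector indices into k clusters using a truncated Kruskal run."""
    n = len(vectors)
    parent = list(range(n))  # union-find forest over document indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Every pair of documents is a candidate edge, lightest first.
    edges = sorted(
        (cosine_distance(vectors[i], vectors[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    components = n
    for _, i, j in edges:
        if components <= k:
            break  # the remaining (heavier) tree edges are the ones cut
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            components -= 1

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Hypothetical term-weight vectors for four documents.
    docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]
    print(mst_clusters(docs, k=2))
```

Unlike k-means, this procedure needs no initial partition, which is the sensitivity the abstract identifies; the trade-off is that the complete graph has a quadratic number of edges.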
Key-Words / Index Term
Minimum Spanning Tree, Document Clustering, World Wide Web, K-Means Algorithm
References
[1] A. Vathy-Fogarassy, A. Kiss, and J. Abonyi, "Hybrid Minimal Spanning Tree and Mixture of Gaussians Based Clustering Algorithms," Proceedings of the IEEE International Conference on Tools with Artificial Intelligence, pp. 73-81, 2006.
[2] Andreas C. Muller, S. Nowozin, Christoph H. Lampert, "Information theoretic clustering using minimum spanning tree," Pattern Recognition, pp. 205-215, 2012.
[3] Bhaskar Adepu, K.K. Bejjanki, "A Novel Approach for Minimum Spanning Tree based Clustering Algorithm."
[4] B. Eswara Reddy, K. Rajendra Prasad, "Reducing runtime values in minimum spanning tree based clustering by visual access tendency," International Journal of Data Mining & Knowledge Management Process (IJDKP), Vol. 2, No. 3, pp. 11-22, May 2012.
[5] C. Zahn, "Graph-theoretical methods for detecting and describing gestalt clusters," IEEE Transactions on Computers, C-20, pp. 68-86, 1971.
[6] J. Chang, J. Luo, J.Z. Huang, S. Feng, J. Fan, "Minimum spanning tree based classification model for massive data with MapReduce implementation," in: W. Fan, W. Hsu, G.I. Webb, B. Liu, C. Zhang, D. Gunopulos, X. Wu (eds.), ICDM Workshops, IEEE Computer Society, pp. 129-137, 2010.
[7] Congnan Luo, Yanjun Li, Soon M. Chung, "Text document clustering based on neighbours," Data & Knowledge Engineering, Volume 68, Issue 11, pp. 1271-1288, November 2009.
[8] D.S. Rajput, R.S. Thakur, G.S. Thakur, "Rule Generation from Textual Data by using Graph Based Approach," International Journal of Computer Application (IJCA) 0975-8887, New York, USA, ISBN: 978-93-80865-11-8, Vol. 31, No. 9, pp. 36-43, October 2011.
[9] D.S. Rajput, R.S. Thakur, G.S. Thakur, Neeraj Sahu, "Analysis of Social Networking Sites Using K-Mean Clustering Algorithm," International Journal of Computer & Communication Technology (IJCCT), ISSN (Online): 2231-0371, ISSN (Print): 0975-7449, Vol. 3, Iss. 3, pp. 88-92, 2012.
[10] J. Han and M. Kamber, "Data Mining: Concepts and Techniques," Morgan Kaufmann Publishers, pp. 335-389, 2000.
[11] Jiaxiang Lin, Dongyi Ye, Chongcheng Chen, Miaoxian Gao, "Minimum Spanning Tree Based Spatial Outlier Mining and Its Applications," Third International Conference, RSKT 2008, Chengdu, China, May 17-19, pp. 508-515, 2008.
[12] J. Zhang and N. Wang, "Detecting outlying subspaces for high-dimensional data: the new task, algorithms and performance," Knowledge and Information Systems, 10(3), pp. 333-555, 2006.
[13] Lijuan Zhou, Linshuang Wang, Xuebin Ge, Qian Shi, "A clustering-Based KNN improved algorithm CLKNN for text classification," Informatics in Control, Automation and Robotics (CAR), 2nd International Asia Conference on, Vol. 3, pp. 212-215, 2010.
[14] M. Laszlo and S. Mukherjee, "Minimum Spanning Tree Partitioning Algorithm for Microaggregation," IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 7, pp. 902-911, July 2005.
[15] O. Grygorash, Y. Zhou, Z. Jorgensen, "Minimum spanning tree based clustering algorithm," in Proceedings of the 18th International Conference on Tools with Artificial Intelligence, pp. 73-81, 2006.
[16] Piotr Juszczak, David M.J. Tax, Elżbieta Pękalska, Robert P.W. Duin, "Minimum spanning tree based one-class classifier," Advances in Machine Learning and Computational Intelligence, Volume 72, Issues 7-9, pp. 1859-1869, March 2009.
[17] P. Sampurnima, J. Srinivas and Harikrishna, "Performance of Improved Minimum Spanning Tree Based on Clustering Technique," Global Journal of Computer Science and Technology Software & Data Engineering, ISSN: 0975-4172, Volume 12, Issue 13, pp. 16-22, 2012.
[18] A. Vathy-Fogarassy, A. Kiss, J. Abonyi, "Hybrid Minimal Spanning tree based clustering and mixture of Gaussians based clustering algorithm," Foundations of Information and Knowledge Systems, Springer, pp. 313-330, 2006.
[19] William B. March, Parikshit Ram, Alexander G. Gray, "Fast Euclidean minimum spanning tree: algorithm, analysis, and applications," in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, July 25-28, 2010.
[20] Y. Xu, V. Olman and D. Xu, "Minimum spanning trees for gene expression data clustering," Genome Informatics, 12, pp. 24-33, 2001.
Citation
D.S. Rajput, R.S. Thakur, G.S. Thakur, "Clustering approach based on Efficient Coverage with Minimum Weight for Document Data," International Journal of Computer Sciences and Engineering, Vol.1, Issue.1, pp.6-13, 2013.
Broadband CMOS LNA Design and Performance Evaluation
Research Paper | Journal Paper
Vol.1 , Issue.1 , pp.14-19, Sep-2013
Abstract
This paper presents the design of a broadband low noise amplifier (LNA). The proposed structure operates over a frequency band from 0.8 GHz to 2.5 GHz, covering most high-speed data applications. A common-source LNA design is presented. The design is implemented in a 0.18 µm CMOS process with a supply voltage of 1.3 V. The cascade LNA achieves a minimum gain of 12.8 dB and a minimum noise figure (NF) of 0.44 dB over the operating frequency range while maintaining a high stability factor. The LNA consumes 19 mW of power.
Key-Words / Index Term
Broadband, LNA, Common Source, RF
Citation
M.B. Thacker, S.S. Bhoyar, P.K. Rahangdale, "Broadband CMOS LNA Design and Performance Evaluation," International Journal of Computer Sciences and Engineering, Vol.1, Issue.1, pp.14-19, 2013.
Data Mining as a Solution for Data Management in Banking Sector
Research Paper | Journal Paper
Vol.1 , Issue.1 , pp.20-25, Sep-2013
Abstract
The IT revolution is one of the biggest achievements of mankind. Information technology is now prominent in organizations of every size. The banking sector is one of those that has incorporated IT into all of its operations. Bank databases are growing at such an alarming rate that banks are unable to manage them and therefore cannot use the data for proper decision making. Some leading banks around the globe have started using the latest data mining techniques for customer segmentation, while others are still planning their implementation. The primary objective of this paper is to investigate the problems banks face in handling data and to check whether banks that use data mining technology fare better in this regard than banks that do not.
Key-Words / Index Term
Data Mining, Decision Making, Fraud Detection, Customer Relationship Management
Citation
V. Bhambri, "Data Mining as a Solution for Data Management in Banking Sector," International Journal of Computer Sciences and Engineering, Vol.1, Issue.1, pp.20-25, 2013.
A Clustering Framework for Large Document Datasets
Research Paper | Journal Paper
Vol.1 , Issue.1 , pp.26-30, Sep-2013
Abstract
A document set is a collection of documents of different types. Each document contains a particular kind of information that is useful to people, so documents need to be clustered by similarity. A document may contain data related to blogs, website access patterns, transactions or simply text. By clustering similar documents one can identify emerging trends among people, which is also useful from a business point of view. In this paper we propose a clustering approach for large document sets. The proposed approach immediately assigns each document to an appropriate cluster. Experiments are conducted on the twenty newsgroups dataset using Java and MATLAB, and comparisons are made with existing methods. Experimental results show the effectiveness of the proposed approach for large document sets.
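The abstract does not spell out how a document is assigned to a cluster, so the following is only a generic single-pass sketch of the idea of immediate assignment: each incoming document joins the most similar existing cluster or starts a new one. The regex-based term extraction, the cosine similarity measure, the summed-count cluster profile and the `threshold` value of 0.3 are all assumptions, not details from the paper.

```python
import math
import re
from collections import Counter


def extract_terms(text):
    """Very simple term extraction: lowercase alphabetic tokens with counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two term-count mappings."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def assign_documents(documents, threshold=0.3):
    """Assign each document, in arrival order, to a cluster in one pass."""
    profiles = []  # summed term counts of each cluster
    clusters = []  # document indices of each cluster
    for index, text in enumerate(documents):
        terms = extract_terms(text)
        best, best_similarity = None, threshold
        for c, profile in enumerate(profiles):
            similarity = cosine_similarity(terms, profile)
            if similarity >= best_similarity:
                best, best_similarity = c, similarity
        if best is None:
            # No cluster is similar enough: open a new one.
            profiles.append(terms)
            clusters.append([index])
        else:
            profiles[best].update(terms)  # fold the document into the profile
            clusters[best].append(index)
    return clusters


if __name__ == "__main__":
    sample = [
        "the rocket launch was delayed",
        "rocket launch scheduled again",
        "hockey team wins the game",
        "the hockey game went to overtime",
    ]
    print(assign_documents(sample))
```

Because each document is compared only with cluster profiles rather than with every other document, the cost per document grows with the number of clusters, which is what makes this style of assignment attractive for large sets.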
Key-Words / Index Term
Large Document Set, Similarity measurement, Term Extraction, Dendrogram
Citation
K.K. Mohbey, G.S. Thakur, "A Clustering Framework for Large Document Datasets," International Journal of Computer Sciences and Engineering, Vol.1, Issue.1, pp.26-30, 2013.