Analysis of Different Classifiers’ Performance After Applying Three Different Feature Selection Methods
Research Paper | Journal Paper
Vol.07 , Issue.01 , pp.1-11, Jan-2019
Abstract
Feature selection (FS) is an important aspect of data mining. Nowadays, the availability of information with hundreds of variables leads to high-dimensional data that contains irrelevant and redundant attributes. FS techniques should therefore be applied to datasets before classification or rule generation. FS aims to reduce the number of attributes by removing irrelevant or redundant ones, thereby reducing computation time and improving classifier performance. In this paper, three FS methods are used: correlation-based, information-gain-based, and rough-set-based. A statistical analysis of the performance of three classifiers is also carried out in order to provide a detailed view.
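As an illustration of the information-gain-based method named in the abstract, the following is a minimal sketch of ranking discrete attributes by information gain. The function names and the toy data are assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(column, labels):
    """Entropy of the labels minus the entropy left after splitting on one discrete attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def rank_features(rows, labels):
    """Return attribute indices sorted from highest to lowest information gain."""
    n_features = len(rows[0])
    gains = [information_gain([r[i] for r in rows], labels) for i in range(n_features)]
    return sorted(range(n_features), key=lambda i: gains[i], reverse=True)

# Hypothetical toy data: attribute 0 determines the class, attribute 1 is noise.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = ["yes", "yes", "no", "no"]
ranking = rank_features(rows, labels)
```

A feature selector built on this ranking would keep the top-ranked attributes and pass only those to the classifier (J48, NB or KNN in the paper).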
Key-Words / Index Term
Data Mining (DM), Feature Selection (FS), Rough Set, Degree of Dependency, Decision Tree (J48 algorithm), Naive Bayes Algorithm (NB), K-Nearest Neighbor Algorithm (KNN), Classification, Statistical Analysis
Citation
Kasturi Ghosh, Susmita Nandi, "Analysis of Different Classifiers’ Performance After Applying Three Different Feature Selection Methods", International Journal of Computer Sciences and Engineering, Vol.07, Issue.01, pp.1-11, 2019.
An Unified Framework for Verifying Reverse Conversion Algorithms for Signed – Digit Number Systems
Research Paper | Journal Paper
Vol.07 , Issue.01 , pp.12-15, Jan-2019
Abstract
For signed-digit number systems, the proof of correctness of some reverse conversion algorithms remains ingrained in the methods by which they were developed. For the remaining reverse conversion algorithms, either no formal proof is given or a variety of verification techniques are followed. Consequently, even some faulty algorithms have at times been proposed through recognized platforms and projected as breakthroughs. In this paper, it is shown that a majority of the available reverse conversion algorithms may be verified directly, in generic form and in a concise manner, using only ordinary mathematical induction guided by the fundamental rules of typical reverse conversion. In addition, as is evident from the case study presented in this paper, both the verification process and the outcomes of reverse conversion obtained with the former class of algorithms may easily be used to scrutinize the other, more complex reverse conversion algorithms. Thus a unified framework for verifying reverse conversion algorithms for signed-digit number systems may have been obtained.
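To make the notion of reverse conversion concrete, the sketch below converts a binary signed-digit number (digits in {-1, 0, 1}) to two's complement by subtracting its negative part from its positive part, and then checks the result against the digit-weighted value. The exhaustive check merely stands in for the induction proof described in the abstract; it is not the paper's verification method, and all function names are assumptions made for this example.

```python
from itertools import product

def sd_value(digits, radix=2):
    """Value of a signed-digit number given most significant digit first."""
    value = 0
    for d in digits:
        value = value * radix + d
    return value

def reverse_convert_bsd(digits):
    """Convert n binary signed digits to an (n + 1)-bit two's complement bit list."""
    pos = neg = 0
    for d in digits:
        if d not in (-1, 0, 1):
            raise ValueError("binary signed digits must be -1, 0 or 1")
        pos = (pos << 1) | int(d == 1)    # bit pattern of the +1 digits
        neg = (neg << 1) | int(d == -1)   # bit pattern of the -1 digits
    width = len(digits) + 1
    diff = (pos - neg) & ((1 << width) - 1)  # wrap into two's complement
    return [(diff >> i) & 1 for i in range(width - 1, -1, -1)]

def twos_complement_value(bits):
    """Value of a two's complement bit list, sign bit first."""
    value = -bits[0]
    for b in bits[1:]:
        value = value * 2 + b
    return value

def verify_exhaustively(n):
    """Check the conversion against sd_value for every n-digit input."""
    return all(
        twos_complement_value(reverse_convert_bsd(list(d))) == sd_value(d)
        for d in product((-1, 0, 1), repeat=n)
    )
```

For example, `reverse_convert_bsd([1, 0, -1])` encodes 4 - 1 = 3 in four bits, and `verify_exhaustively(4)` compares all 81 four-digit inputs against their digit-weighted values.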
Key-Words / Index Term
Reverse Conversion Algorithms, Signed – Digit Number Systems, Verification of Correctness, Unified Framework
Citation
M.S. Chakraborty, T. Ghosh, A.C. Mondal, "An Unified Framework for Verifying Reverse Conversion Algorithms for Signed – Digit Number Systems", International Journal of Computer Sciences and Engineering, Vol.07, Issue.01, pp.12-15, 2019.
Fuzzy Implication on Mutation for Uncertain Paths
Research Paper | Journal Paper
Vol.07 , Issue.01 , pp.16-19, Jan-2019
Abstract
In real networks, the communication path is subject to numerous uncertainties. These uncertainties affect the possibility of mutation in the path because of the fuzziness of the interplay, the error rate and environmental factors. This paper works out the fuzziness in the mutation of a path for these uncertain factors. It quantifies the performance of the mutation as well as of the genetic algorithm. It provides a fuzzy model to deal with the factors affecting the uncertainty of the shorter paths. Fuzzy implications are considered for the different factors. The new edges formed during mutation are also handled with fuzzy concepts. The result is a robust metaheuristic algorithm that approaches the shorter path problem from a different angle using genetic and fuzzy tools. The transfer of packets is delayed by natural and human issues such as disruption, natural calamity and human error during installation. The work provides alternative paths to reach the destination in the presence of these uncertain factors.
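The abstract does not state which fuzzy implication operators are used. As a hedged illustration only, the sketch below evaluates three standard implications (Kleene-Dienes, Łukasiewicz and Gödel) for hypothetical membership degrees of the three uncertainty factors. The factor names, the degrees and the `mutation_support` aggregation are assumptions made for this example, not the paper's model.

```python
def kleene_dienes(a, b):
    """Kleene-Dienes implication: max(1 - a, b)."""
    return max(1.0 - a, b)

def lukasiewicz(a, b):
    """Lukasiewicz implication: min(1, 1 - a + b)."""
    return min(1.0, 1.0 - a + b)

def goedel(a, b):
    """Goedel implication: 1 if a <= b, otherwise b."""
    return 1.0 if a <= b else b

def mutation_support(factors, mutation_degree, implication):
    """Smallest value of 'factor implies mutation' over all factors (hypothetical aggregation)."""
    return min(implication(level, mutation_degree) for level in factors.values())

# Hypothetical membership degrees in [0, 1] for the uncertainty factors.
factors = {"interplay": 0.6, "error_rate": 0.8, "environment": 0.3}
support = {
    op.__name__: mutation_support(factors, 0.5, op)
    for op in (kleene_dienes, lukasiewicz, goedel)
}
```

Comparing the entries of `support` shows how the choice of implication operator changes the degree assigned to the same set of factors.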
Key-Words / Index Term
Genetic Algorithm, Mutation, Fuzzy Logic, Fuzzy Implications and Shorter path
Citation
T. N. Paul, "Fuzzy Implication on Mutation for Uncertain Paths", International Journal of Computer Sciences and Engineering, Vol.07, Issue.01, pp.16-19, 2019.
An automatic identification of function words in TDIL tagged Bengali corpus
Research Paper | Journal Paper
Vol.07 , Issue.01 , pp.20-27, Jan-2019
Abstract
Function words are far more numerous in text than content words, which makes dimensionality a critical challenge. The performance of text processing tasks deteriorates when function words are present in the textual context. Eliminating these words is therefore an important step in text processing, both to reduce computational complexity and to improve the accuracy of the system. Much research has been carried out on identifying standard function words for English, Arabic, Chinese, Punjabi, Hindi and other languages. For Bengali language processing, only a limited number of standard function words are available. To address this limitation, we propose an automatic computer-based system for identifying high-scoring function words from the TDIL tagged Bengali corpus, Govt. of India. The corpus consists of 670,831 words in total, of which 134,884 are distinct. Our proposed system identifies 8 sets of function words, i.e., a total of 33,985 function words in the Literature domain of the monolingual tagged corpus. At the end of our experiment, we obtained 290 standard function words according to their computed rank.
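The abstract does not spell out the scoring scheme, so the following is only one plausible way to rank function-word candidates: by the number of documents a word appears in, with total frequency as a tie-breaker. The function name and the tiny tokenised sample are assumptions made for this example.

```python
from collections import Counter

def rank_function_word_candidates(documents, top_k):
    """Rank tokens by document frequency, then total frequency, then alphabetically."""
    doc_freq = Counter()
    term_freq = Counter()
    for tokens in documents:
        term_freq.update(tokens)        # every occurrence counts
        doc_freq.update(set(tokens))    # each document counts once per token
    ranked = sorted(term_freq, key=lambda w: (-doc_freq[w], -term_freq[w], w))
    return ranked[:top_k]

# Hypothetical tokenised documents; "এবং" and "ও" are the words shared across documents.
documents = [
    ["এবং", "নদী", "ও"],
    ["এবং", "বই"],
    ["ও", "এবং", "পাখি"],
]
candidates = rank_function_word_candidates(documents, top_k=2)
```

Words that occur in nearly every document regardless of topic rise to the top of the list, which is the behaviour expected of function words.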
Key-Words / Index Term
Bengali Text Processing, Function Words, Bag of words, NLP
Citation
Subrata Pan, Diganta Saha, "An automatic identification of function words in TDIL tagged Bengali corpus", International Journal of Computer Sciences and Engineering, Vol.07, Issue.01, pp.20-27, 2019.
Hardware Implementation of .....................................m
Research Paper | Journal Paper
Vol.07 , Issue.01 , pp.28-32, Jan-2019
Abstract
Error
Key-Words / Index Term
Signal processing, VLSI, FFT, Transforms
Citation
Pulak Mazumder, Rajarshi Middya, Mrinal Kanti Naskar, "Hardware Implementation of .....................................m", International Journal of Computer Sciences and Engineering, Vol.07, Issue.01, pp.28-32, 2019.
Feature Dimension Reduction Using Euclidean Distance Oriented Similarity Based Rough Set Model
Research Paper | Journal Paper
Vol.07 , Issue.01 , pp.33-36, Jan-2019
Abstract
In machine learning, very high-dimensional data reduces the performance of a classifier. To overcome this, a suitable feature dimension reduction algorithm can be applied before any classification algorithm. Rough set theory [1] is a very good tool for reducing the feature dimension of an information system or decision system. However, if a decision system contains real-valued data, rough set theory cannot be applied directly. Various extensions of rough sets can handle this kind of data; among them, fuzzy-rough set theory [2] and the similarity-based rough set model [3] are of particular interest. We propose an algorithm for dimension reduction using a Euclidean-distance-oriented similarity-based rough set model. To show the effectiveness of the algorithm, we take the Grammatical Facial Expression Dataset from the UCI Machine Learning Repository, created by Freitas et al. [4], and apply a KNN classifier before and after feature dimension reduction.
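The sketch below shows one way a similarity relation based on Euclidean distance can replace the equivalence relation of classical rough sets: two objects are treated as similar when their distance on the chosen features does not exceed a threshold, and the degree of dependency is the fraction of objects whose similarity class contains a single decision label. The greedy forward selection, the threshold value and the toy data are assumptions made for this example; the paper's actual algorithm may differ.

```python
import math

def distance(a, b, features):
    """Euclidean distance restricted to the given feature indices."""
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in features))

def dependency(data, labels, features, threshold):
    """Fraction of objects whose similarity class is pure in the decision label."""
    consistent = 0
    for i, x in enumerate(data):
        similar = [j for j, y in enumerate(data) if distance(x, y, features) <= threshold]
        if all(labels[j] == labels[i] for j in similar):
            consistent += 1
    return consistent / len(data)

def greedy_reduct(data, labels, threshold):
    """Add features one at a time until the full-set dependency is reached."""
    all_features = list(range(len(data[0])))
    target = dependency(data, labels, all_features, threshold)
    selected = []
    while dependency(data, labels, selected, threshold) < target:
        remaining = [f for f in all_features if f not in selected]
        best = max(remaining, key=lambda f: dependency(data, labels, selected + [f], threshold))
        selected.append(best)
    return selected

# Hypothetical real-valued decision system: feature 0 separates the classes, feature 1 does not.
data = [(0.1, 0.9), (0.2, 0.1), (0.9, 0.8), (1.0, 0.2)]
labels = [0, 0, 1, 1]
reduct = greedy_reduct(data, labels, threshold=0.3)
```

Because adding a feature can only increase the distance between two objects, similarity classes shrink as features are added, so the loop is guaranteed to reach the full-set dependency.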
Key-Words / Index Term
Rough Set, Feature Dimension Reduction, Similarity Relation, KNN, Grammatical Facial Expression Recognition
References
[1] Z. Pawlak, “Rough Sets. Theoretical Aspects of Reasoning about Data”, Kluwer Academic Publishers, 1991.
[2] R. Jensen, Q. Shen, “Semantics-preserving dimensionality reduction: rough and fuzzy-rough-based approaches”, IEEE Transactions on Knowledge and Data Engineering, Vol.16, Issue.12, pp.1457-1471, 2004.
[3] J. Stepaniuk, “Similarity Based Rough Sets and Learning”, Proceedings of the Fourth International Workshop on Rough Sets, Fuzzy Sets, and Machine Discovery, Tokyo, Japan, pp. 18-22, 1996.
[4] F. A. Freitas, S. M. Peres, C. A. M. Lima, F. V. Barbosa, “Grammatical Facial Expressions Recognition with Machine Learning”, 27th Florida Artificial Intelligence Research Society Conference (FLAIRS), Palo Alto, pp. 180-185, 2014.
[5] K. S. Ray, S. Kolay, “Application of Approximate Equality for Reduction of Feature Vector Dimension”, Journal of Pattern Recognition Research, Vol.11, Issue.1, pp.26-40, 2016.
Citation
A.C. Mondal, S. Kolay, "Feature Dimension Reduction Using Euclidean Distance Oriented Similarity Based Rough Set Model", International Journal of Computer Sciences and Engineering, Vol.07, Issue.01, pp.33-36, 2019.
Classification of Agricultural Pests Using Statistical and Color Feature Extraction and Support Vector Machine
Research Paper | Journal Paper
Vol.07 , Issue.01 , pp.37-41, Jan-2019
Abstract
Beetles and bugs are among the most common harmful pests; they attack plants easily and can damage an entire plant. Most beetles and bugs bruise the front surface of leaves to lay eggs, and some bugs feed on the extract of the leaves. Leaves are therefore damaged often, and it is essential to detect pest-affected leaves as early as possible so that further precautions can be taken. In this paper, an automated approach based on digital image processing and machine learning is used to classify three vulnerable pests (blue mint beetle, white mealy bug and red lily beetle) from images of affected leaves. Image pre-processing methods such as noise removal and contrast enhancement, followed by color space transformation and k-means clustering, are used to segment the affected parts of the leaves. Both texture and color features are then extracted from the segmented leaves, and a support vector machine classifies the pests on the basis of the extracted features.
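The segmentation step can be illustrated with a minimal k-means sketch that clusters pixel colors and then computes a mean-color feature for the cluster treated as the affected region. The pixel values, the initial centers and the function names are assumptions made for this example; the pre-processing, color space transformation, texture features and SVM stage described in the abstract are not reproduced here.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two color tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest_center(p, centers):
    """Index of the center closest to pixel p."""
    return min(range(len(centers)), key=lambda c: squared_distance(p, centers[c]))

def kmeans(pixels, centers, iterations=10):
    """Lloyd's algorithm starting from the given initial centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in pixels:
            clusters[nearest_center(p, centers)].append(p)
        # Recompute each center as the channel-wise mean; keep empty clusters in place.
        centers = [
            tuple(sum(ch) / len(cl) for ch in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    assignments = [nearest_center(p, centers) for p in pixels]
    return centers, assignments

def mean_color(pixels, assignments, cluster):
    """Mean color of one segment, usable as a simple color feature."""
    members = [p for p, a in zip(pixels, assignments) if a == cluster]
    return tuple(sum(ch) / len(members) for ch in zip(*members))

# Hypothetical RGB pixels: three healthy green pixels and two brownish damaged ones.
pixels = [(30, 160, 40), (35, 150, 45), (28, 155, 38), (120, 80, 40), (130, 85, 35)]
centers, assignments = kmeans(pixels, centers=[pixels[0], pixels[-1]])
damaged_feature = mean_color(pixels, assignments, cluster=1)
```

In a full pipeline, feature vectors such as `damaged_feature` would be combined with texture descriptors and passed to the classifier.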
Key-Words / Index Term
Bugs, Beetles, Machine learning, k-means clustering, Feature Extraction, Support Vector Machine
Citation
Aparajita Datta, Abhishek Dey, Kashi Nath Dey, "Classification of Agricultural Pests Using Statistical and Color Feature Extraction and Support Vector Machine", International Journal of Computer Sciences and Engineering, Vol.07, Issue.01, pp.37-41, 2019.
Degraded Bangla Character Recognition by k- NN Classifier
Research Paper | Journal Paper
Vol.07 , Issue.01 , pp.42-47, Jan-2019
Abstract
Digitization of degraded Bangla documents by Optical Character Recognition is an active research area nowadays. Some historical documents, particularly from the 1960s and 1970s, are degrading day by day for lack of preservation and need to be retrieved. In this article, we present our recent study on the recognition of degraded printed document images in Bangla, the 7th most popular language in the world. In the proposed approach, the input is a low-quality degraded image and the output is the recognized characters. In the first step, some preprocessing is done on the document image to improve the quality of the scanned image. The proposed approach is analytic: segmentation is carried out line by line, word by word and finally character by character. The database used is the ISIDDI database. It contains 535 historical pages in TIF and JPG formats, with different fonts, sizes, formats and, most importantly, different levels of degradation. After segmentation, we manually identified 320 classes of segmented symbols and divided the whole character dataset into a training set (70%) and a test set (30%). For the training samples of the 320 classes, we computed the histogram of oriented gradients (HOG) feature. By applying the K-means clustering algorithm, clusters for the 320 classes were generated and labeled according to their classes. For a character in the test set, the HOG feature is computed again, and the k-nearest neighbour algorithm assigns the character to the class at minimum distance among the 320 classes. The classification accuracy obtained on the test set is encouraging: from the confusion matrix, we achieved 82.80% character or symbol level accuracy on 320 classes.
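The final classification step can be sketched as a k-nearest-neighbour vote over labelled prototype vectors, for instance cluster centers computed from training features. The three-dimensional vectors and the class labels below are hypothetical stand-ins for real HOG descriptors and Bangla character classes, and the function name is an assumption made for this example.

```python
import math
from collections import Counter

def knn_classify(query, prototypes, k=1):
    """Majority label among the k prototypes closest to the query vector.

    prototypes is a list of (feature_vector, label) pairs; ties go to the
    label of the nearer prototype.
    """
    nearest = sorted(prototypes, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled prototypes standing in for HOG-based cluster centers.
prototypes = [
    ((0.9, 0.1, 0.0), "ka"),
    ((0.8, 0.2, 0.1), "ka"),
    ((0.1, 0.9, 0.3), "kha"),
    ((0.2, 0.8, 0.4), "kha"),
]
predicted = knn_classify((0.85, 0.15, 0.05), prototypes, k=3)
```

With `k=1` the function reduces to assigning the class at minimum distance, which matches the rule described in the abstract.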
Key-Words / Index Term
Degraded document recognition, Bangla document analysis, K-Means, k-nearest neighbour
References
[1] BB Chaudhuri, U Pal and Mandar Mitra, “Automatic recognition of printed Oriya script”, Sadhana, Vol. 27, Pp. 23–34, 2002.
[2] R Seethalakshmi, TR Sreeranjani, T Balachandar, Abnikant Singh, Markandey Singh, Ritwaj Ratan and Sarvesh Kumar, “Optical character recognition for printed Tamil text using Unicode”, Journal of Zhejiang University-SCIENCE A, Vol.6,Pp. 1297–1305, 2005.
[3] BB Chaudhuri and U Pal, “A complete printed Bangla OCR system”, Pattern Recognition, Vol. 31, Pp. 531–549, 1998.
[4] Ujjwal Bhattacharya, Malayappan Shridhar and Swapan K Parui, “On recognition of handwritten Bangla characters”, Computer Vision, Graphics and Image Processing, Springer publisher, Pp. 817–828, 2006.
[5] Apurva A Desai, “Gujarati handwritten numeral optical character reorganization through neural network”, Pattern Recognition, Vol. 43 Pp. 2582–2589, 2010.
[6] Binu P Chacko, VR Vimal Krishnan, G Raju and P Babu Anto, “Handwritten character recognition using wavelet energy and extreme learning machine”, International Journal of Machine Learning and Cybernetics, Vol. 3,Pp. 149–161, 2012.
[7] C Vasantha Lakshmi and C Patvardhan, “An optical character recognition system for printed Telugu text”, Pattern analysis and applications, Vol. 7, Pp. 190–204, 2014.
[8] Kapil Dev Dhingra, Sudip Sanyal, and Pramod Kumar Sharma, “A robust ocr for degraded documents”, In Advances in Communication Systems and Electrical Engineering, Springer publisher, Pp. 497–509 , 2008.
[9] Laurence Likforman-Sulem, Abderrazak Zahour, and Bruno Taconet. “Text line segmentation of historical documents: a survey”, International journal on document analysis and recognition,Vol. 9(2), Pp. 123–138, 2007.
[10] Tapan Kumar Bhowmik, Swapan Kumar Parui, Utpal Roy, and Lambert Schomaker, “Bangla handwritten character segmentation using structural features: A supervised and bootstrapping approach”, ACM Transactions on Asian and Low-Resource Language Information Processing, Vol. 15(4), Pages. 29, 2016.
[11] Chandan Biswas, Partha Sarathi Mukherjee, Koyel Ghosh, Ujjwal Bhattacharya, and Swapan K. Parui, “A hybrid deep architecture for robust recognition of text lines of degraded printed documents”, In 24th International Conference on Pattern Recognition, IEEE, 2018.
[12] Jaakko Sauvola and Matti Pietikäinen, “Adaptive document image binarization”, Pattern Pecognition, Vol. 33(2), Pp. 225–236, 2000.
[13] Chandan Singh, Nitin Bhatia, and Amandeep Kaur, “Hough transform based fast skew detection and accurate skew correction methods”, Pattern Recognition, Vol. 41(12), Pp. 3528– 3546, 2008.
[14] Ying Jie Liu and Fu Cheng You, “Application of mathematical morphology on touching or broken characters processing”, In Advanced Materials Research, Vol. 171, Pp. 73–77, 2011.
[15] BB Chaudhuri and U Pal, “A complete printed bangla ocr system”, Pattern Recognition, Vol 31(5), Pp. 531–549, 1998.
[16] Mohamed Becha Kaaniche, Francois Bremond, “Tracking HoG Descriptors for Gesture Recognition”, Advanced Video and Signal Based Surveillance, 2009 AVSS`09, Sixth IEEE International Conference on, Pp. 140–145, 2009, IEEE.
[17] John A Hartigan and Manchek A Wong, “Algorithm as 136: A k-means clustering algorithm”, Journal of the Royal Statistical Society. Series C (Applied Statistics),Vol. 28(1), Pp. 100–108, 1979.
[18] Keinosuke Fukunaga and Patrenahalli M. Narendra, “A branch and bound algorithm for computing k-nearest neighbors”. IEEE transactions on computers, Vol. 100(7), Pp. 750–753, 1975.
Citation
Jayati Mukherjee, S. K. Parui, Utpal Roy, "Degraded Bangla Character Recognition by k-NN Classifier", International Journal of Computer Sciences and Engineering, Vol.07, Issue.01, pp.42-47, 2019.
A Comparative Study between Factor Based Sentiment and Overall Sentiment
Research Paper | Journal Paper
Vol.07 , Issue.01 , pp.48-52, Jan-2019
Abstract
Sentiment analysis has attracted considerably more attention from researchers in recent years. As online shopping becomes commonplace, more product information and product reviews are posted on the Internet. Since customers cannot see and feel the products directly, product reviews are becoming an essential source of subjective information. Accordingly, the volume of reviews is increasing dramatically. It is difficult for a customer to read all the reviews of a product and compare it with other products on the basis of those reviews. Sometimes there is a difference between the overall opinion of a product and the opinion about each feature of the same product. In this paper, we examine 480 smartphone reviews from a popular e-commerce website and attempt to find such a difference. We assign a fuzzy score to each sentiment word and compute the arithmetic mean of the assigned fuzzy scores. Experimental results show the relationship between the overall opinion and the result of the feature extraction task, and also demonstrate the promising performance of our approach.
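The scoring step mentioned above (a fuzzy score per sentiment word, then an arithmetic mean) can be sketched as follows. The score table, the [0, 1] scale with 0.5 as neutral, and the function names are assumptions made for illustration; the abstract does not give the actual dictionary or scale.

```python
from statistics import mean

# Hypothetical fuzzy scores in [0, 1]; 0.5 is treated as neutral (assumption).
FUZZY_SCORES = {
    "excellent": 0.9,
    "good": 0.75,
    "average": 0.5,
    "poor": 0.25,
    "terrible": 0.1,
}

def feature_sentiment(feature_words):
    """Map each feature to the arithmetic mean of its sentiment-word scores.

    feature_words: dict of feature name -> list of sentiment words about it.
    Words missing from the score table are ignored.
    """
    scores = {}
    for feature, words in feature_words.items():
        known = [FUZZY_SCORES[w] for w in words if w in FUZZY_SCORES]
        if known:
            scores[feature] = mean(known)
    return scores

def feature_based_overall(scores):
    """Arithmetic mean of the per-feature scores."""
    return mean(list(scores.values()))

# One toy review: sentiment words grouped by the feature they describe.
review = {"camera": ["excellent", "good"], "battery": ["poor"]}
per_feature = feature_sentiment(review)
feature_mean = feature_based_overall(per_feature)

# Gap between a stated overall score (hypothetical) and the feature-based mean.
stated_overall = 0.75
gap = stated_overall - feature_mean
```

A non-zero gap for a review is the kind of contrast between overall and feature-level sentiment that the study sets out to examine.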
Key-Words / Index Term
Feature Based Sentiment Analysis, Fuzzy, Sentiment Phase Detection, Sentiment Dictionary
References
[1] J. Wang (ed), Encyclopedia of Data Warehousing and Mining (Information Science Reference, Hershey, 2008).
[2] M. Hu and B. Liu, “Mining and summarizing customer reviews,” in Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, Seattle, WA, USA, August 2004,pp. 168–177.
[3] J. Yang, J. Myung and S. Lee, "A Holistic Approach to Product Review Summarization," 2009 Software Technologies for Future Dependable Distributed Systems, Tokyo, 2009, pp. 150-154.
doi: 10.1109/STFSSD.2009.
[4] D. Garcia and F. Schweitzer, "Emotions in Product Reviews--Empirics and Models," 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third International Conference on Social Computing, Boston, MA, 2011, pp. 483-488.
doi: 10.1109/PASSAT/SocialCom.2011.
[5] H. Q. Vu, G. Li and G. Beliakov, "A fuzzy decision support method for customer preferences analysis based on Choquet Integral," 2012 IEEE International Conference on Fuzzy Systems, Brisbane, QLD, 2012, pp. 1-8.doi: 10.1109/FUZZ-IEEE.2012.6250776
[6] Dragoni, Mauro, Andrea GB Tettamanzi, and Célia da Costa Pereira. "A fuzzy system for concept-level sentiment analysis." Semantic web evaluation challenge. Springer, Cham, 2014.
[7] Kamal, A.; Abulaish, M., "Statistical Features Identification for Sentiment Analysis Using Machine Learning Techniques," Computational and Business Intelligence (ISCBI), 2013 International Symposium on , vol., no., pp.178,181, 24-26
[8] Popescu, Ana-Maria, and Orena Etzioni. "Extracting product features and opinions from reviews." Natural language processing and text mining. Springer, London, 2007. 9-28.
[9] Jefferson, Chris, Han Liu, and Mihaela Cocea. "Fuzzy approach for sentiment analysis." Fuzzy Systems (FUZZ-IEEE), 2017 IEEE International Conference on. IEEE, 2017.
[10] Lek, Hsiang Hui, and Danny CC Poo. "Aspect-based Twitter sentiment classification." Tools with Artificial Intelligence (ICTAI), 2013 IEEE 25th International Conference on. IEEE, 2013.
[11] Lalithamani, R. A. N., Leela Sravanthi Thati, and Rakesh Adhikesavan. "Sentence level sentiment polarity calculation for customer reviews by considering complex sentential structures." IJRET: International Journal of Research in Engineering and Technology 3 (2014).
[12] Ahlgren, Per, Bo Jarneving, and Ronald Rousseau. "Requirements for a cocitation similarity measure, with special reference to Pearson`s correlation coefficient." Journal of the American Society for Information Science and Technology 54.6 (2003): 550-560.
Citation
Santanu Modak, Abhoy Chand Mondal, "A Comparative Study between Factor Based Sentiment and Overall Sentiment", International Journal of Computer Sciences and Engineering, Vol.07, Issue.01, pp.48-52, 2019.
Algorithm for Removal of Semantically Insignificant Content Words
Research Paper | Journal Paper
Vol.07 , Issue.01 , pp.53-56, Jan-2019
Abstract
This paper describes how context-specific, semantically insignificant content words are extracted using the Inverse Document Frequency (IDF) and Inverse Class Frequency (ICF) measures. We are able to remove around 42% of the total corpus volume as irrelevant information, which includes textual noise, function words and context-specific, semantically insignificant content words. We ran different Machine Learning (ML) algorithms used for text classification on a corpus, before and after removal of the textual noise. We found no significant change in the accuracy of those ML algorithms before and after removal of the textual noise.
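As a rough illustration of the two measures named above, the sketch below uses the common definitions IDF(t) = log(N / df(t)) and ICF(t) = log(C / cf(t)), where N is the number of documents, C the number of classes, and df and cf count the documents and classes that contain term t. The exact formulas, the thresholds and the function names are assumptions; the abstract does not specify them.

```python
import math

def idf_icf(docs):
    """docs: list of (tokens, class_label). Returns {term: (idf, icf)}."""
    n_docs = len(docs)
    classes = {label for _, label in docs}
    doc_freq = {}    # term -> number of documents containing it
    class_sets = {}  # term -> set of classes in which it occurs
    for tokens, label in docs:
        for term in set(tokens):
            doc_freq[term] = doc_freq.get(term, 0) + 1
            class_sets.setdefault(term, set()).add(label)
    return {
        term: (math.log(n_docs / doc_freq[term]),
               math.log(len(classes) / len(class_sets[term])))
        for term in doc_freq
    }

def insignificant_terms(docs, idf_max=0.3, icf_max=0.0):
    """Terms spread over most documents and most classes.

    The thresholds are hypothetical and would need tuning on a real corpus.
    """
    return {term for term, (idf, icf) in idf_icf(docs).items()
            if idf <= idf_max and icf <= icf_max}

# Toy corpus: "said" appears in every document and in both classes.
docs = [
    (["match", "said", "goal"], "sports"),
    (["election", "said", "vote"], "politics"),
    (["team", "said", "goal"], "sports"),
    (["minister", "said", "vote"], "politics"),
]
removable = insignificant_terms(docs)
```

Here a word such as "said" scores zero on both measures because it carries no information about the document or the class, whereas "goal" is confined to one class and is kept.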
Key-Words / Index Term
Machine Learning (ML), Natural Language Processing (NLP), Information Retrieval (IR), Term Document Matrix, Inverse Document Frequency (IDF) and Inverse Class Frequency (ICF), Stop Words, Content Words
References
[1] Dharmendra Sharma, Suresh Jain, “Evaluation of Stemming and Stop Word Techniques on Text Classification Problem”, International Journal of Scientific Research in Computer Science and Engineering, Vol-3(2), PP (1-4) Apr 2015, E-ISSN: 2320-7639.
[2] Ljiljana Dolamic and Jacques Savoy, “When Stopword Lists Makethe Difference,”, Journal of the American Society for Information Science and Technology no. 1, pp. 200–203, 2009.
[3] M. P. Sinka and D. W. Corne, “Evolving Better Stoplists for Document Clustering and Web Intelligence,” Des. Appl. hybrid Intell. Syst., pp. 1015–1023, 2003.
[4] R. Al-Shalabi, G. Kanaan, J. M. Jaam, A. Hasnah and E. Hilat, "Stop-word removal algorithm for Arabic language," Proceedings. 2004 International Conference on Information and Communication Technologies: From Theory to Applications, 2004., Damascus, Syria, 2004, pp. 545
[5] B. Alhadidi and M. Alwedyan, “Hybrid Stop-Word Removal Technique for Arabic Language.,” Egypt Comput Sci, vol. 30(1), no. 1, pp. 35–38, 2008
[6] R. Puri, R. P. S. Bedi, and V. Goyal, “Automated Stopwords Identification in Punjabi Documents,” An Int. J. Eng. Sci., vol. 8, no. June 2013, pp. 119–125, 2013.
[7] Ashish T, Kothari M and Pinkesh P, “Pre-Processing Phase of Text Summarization Based on Gujarati Language”, International Journal of Innovative Research in Computer Science & Technology (IJIRCST) Vol-2,Iss-4, July 2014
[8] Jaideepsinh K. Raulji, Jatinderkumar R. Saini, “Stop-Word Removal Algorithm and its Implementation for Sanskrit Language”, International Journal of Computer Applications (0975 – 8887), Volume 150 – No.2, September 2016
[9] V. Jha, N. Manjunath, P. D. Shenoy and K. R. Venugopal, "HSRA: Hindi stopword removal algorithm," 2016 International Conference on Microelectronics, Computing and Communications (MicroCom), Durgapur, 2016, pp. 1-5
[10] S. Siddiqi and A. Sharan, “Construction of a generic stopwords list for Hindi language without corpus statistics,” Int. J. Adv. Comput. Res., vol. 8, no. 34, pp. 35–40, 2017.
[11] Rakholia R. M. and Saini J. R., “A Rule-based Approach to Identify Stop Words for Gujarati Language”, accepted for publication in Advances in Intelligent and Soft Computing (AISC) Series, ISSN: 1615-3871, 2194-5357, 1860-0794 by Springer-Verlag, Germany. 2017.
[12] Ankita Dhar, Niladri Sekhar Dash, Kaushik Roy, “Categorization of Bangla Web Text DocumentsBased on TF-IDF-ICF Text Analysis Scheme”, Springer Nature Singapore Pte Ltd. 2018,J. K. Mandal and D. Sinha (Eds.): CSI 2017, CCIS 836, pp. 477–484, 2018.
Citation
Abhijit Barman, Diganta Saha, "Algorithm for Removal of Semantically Insignificant Content Words", International Journal of Computer Sciences and Engineering, Vol.07, Issue.01, pp.53-56, 2019.