Open Access Article

Proposal of a Real-time American Sign Language Detector using MediaPipe and Recurrent Neural Network

Souradeep Ghosh

Section: Research Paper, Product Type: Journal Paper
Volume 9, Issue 7, Page no. 46-52, Jul-2021

CrossRef-DOI: https://doi.org/10.26438/ijcse/v9i7.4652

Online published on Jul 31, 2021

Copyright © Souradeep Ghosh. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


How to Cite this Paper

IEEE Style Citation: Souradeep Ghosh, “Proposal of a Real-time American Sign Language Detector using MediaPipe and Recurrent Neural Network,” International Journal of Computer Sciences and Engineering, Vol.9, Issue.7, pp.46-52, 2021.

MLA Style Citation: Souradeep Ghosh "Proposal of a Real-time American Sign Language Detector using MediaPipe and Recurrent Neural Network." International Journal of Computer Sciences and Engineering 9.7 (2021): 46-52.

APA Style Citation: Ghosh, S. (2021). Proposal of a Real-time American Sign Language Detector using MediaPipe and Recurrent Neural Network. International Journal of Computer Sciences and Engineering, 9(7), 46-52.

BibTex Style Citation:
@article{Ghosh_2021,
author = {Souradeep Ghosh},
title = {Proposal of a Real-time American Sign Language Detector using MediaPipe and Recurrent Neural Network},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {7 2021},
volume = {9},
Issue = {7},
month = {7},
year = {2021},
issn = {2347-2693},
pages = {46-52},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=5363},
doi = {https://doi.org/10.26438/ijcse/v9i7.4652},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY - JOUR
DO - https://doi.org/10.26438/ijcse/v9i7.4652
UR - https://www.ijcseonline.org/full_paper_view.php?paper_id=5363
TI - Proposal of a Real-time American Sign Language Detector using MediaPipe and Recurrent Neural Network
T2 - International Journal of Computer Sciences and Engineering
AU - Souradeep Ghosh
PY - 2021
DA - 2021/07/31
PB - IJCSE, Indore, INDIA
SP - 46
EP - 52
IS - 7
VL - 9
SN - 2347-2693
ER -


Abstract

Sign language is the primary means of communication for deaf and speech-impaired people: a natural, visual language whose linguistic details the human brain can process and decipher. For the past two decades, researchers have studied automated sign language recognition using translating gloves and complex multi-camera systems. Most of these systems recognize the vocabulary partially or completely, but they are not cost-effective for users of average or below-average means. With the advent of AI, we are trying to overcome this bias in technology. Google's MediaPipe, an open-source framework for building applied ML pipelines over multimodal data (video, audio, time series), was released in 2019. MediaPipe's Multi-hand Tracking pipeline provides landmarks for the hand and fingers. This paper proposes using MediaPipe Hand Tracking to extract hand landmarks and training a Keras RNN-LSTM model on that data to detect five trained American Sign Language words in real time.
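The step between landmark extraction and the recurrent model can be illustrated with a small sketch. It is not the paper's implementation: the abstract does not describe the preprocessing, so the wrist-relative normalization, the window length `SEQ_LEN`, and the names `frame_features` and `LandmarkWindow` are all assumptions made for illustration. The only facts relied on are that MediaPipe Hands reports 21 landmarks per hand, each with x, y, and z coordinates, and that landmark 0 is the wrist.

```python
from collections import deque

NUM_LANDMARKS = 21  # MediaPipe Hands reports 21 landmarks per hand
SEQ_LEN = 30        # assumed window length in frames (not from the paper)


def frame_features(landmarks):
    """Flatten one hand's (x, y, z) landmarks into a wrist-relative vector.

    Subtracting the wrist position is an assumed normalization that makes
    the features independent of where the hand sits in the frame.
    """
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError("expected 21 landmarks per hand")
    wx, wy, wz = landmarks[0]  # landmark 0 is the wrist
    features = []
    for x, y, z in landmarks:
        features.extend((x - wx, y - wy, z - wz))
    return features  # 21 * 3 = 63 values


class LandmarkWindow:
    """Keeps the most recent frames as one fixed-length input sequence."""

    def __init__(self, seq_len=SEQ_LEN):
        self.seq_len = seq_len
        self.frames = deque(maxlen=seq_len)  # oldest frame drops automatically

    def push(self, landmarks):
        """Add a frame; return the window once seq_len frames are buffered."""
        self.frames.append(frame_features(landmarks))
        if len(self.frames) < self.seq_len:
            return None
        return list(self.frames)
```

Under these assumptions, each returned window is a `SEQ_LEN` x 63 sequence, which is the shape a recurrent classifier with five output classes would consume once per incoming video frame.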

Keywords / Index Terms

MediaPipe, American Sign Language, OpenCV, RNN, LSTM, Real-time
