Open Access Article

Conceptual Review of Deep Learning Methods for Automatic Image Caption Generation

S. H. Patel1 , N.M. Patel2 , D.G. Thakore3

Section: Review Paper, Product Type: Journal Paper
Volume-7, Issue-3, Page No. 987-991, Mar-2019

CrossRef DOI: https://doi.org/10.26438/ijcse/v7i3.987991

Online published on Mar 31, 2019

Copyright © S. H. Patel, N.M. Patel, D.G. Thakore. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


How to Cite this Paper


IEEE Style Citation: S. H. Patel, N.M. Patel, D.G. Thakore, “Conceptual Review of Deep Learning Methods for Automatic Image Caption Generation,” International Journal of Computer Sciences and Engineering, Vol.7, Issue.3, pp.987-991, 2019.

MLA Style Citation: S. H. Patel, N.M. Patel, D.G. Thakore. "Conceptual Review of Deep Learning Methods for Automatic Image Caption Generation." International Journal of Computer Sciences and Engineering 7.3 (2019): 987-991.

APA Style Citation: S. H. Patel, N.M. Patel, D.G. Thakore (2019). Conceptual Review of Deep Learning Methods for Automatic Image Caption Generation. International Journal of Computer Sciences and Engineering, 7(3), 987-991.

BibTex Style Citation:
@article{Patel_2019,
author = {S. H. Patel and N.M. Patel and D.G. Thakore},
title = {Conceptual Review of Deep Learning Methods for Automatic Image Caption Generation},
journal = {International Journal of Computer Sciences and Engineering},
issue_date = {March 2019},
volume = {7},
number = {3},
month = mar,
year = {2019},
issn = {2347-2693},
pages = {987-991},
url = {https://www.ijcseonline.org/full_paper_view.php?paper_id=3952},
doi = {10.26438/ijcse/v7i3.987991},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
DO  - 10.26438/ijcse/v7i3.987991
UR  - https://www.ijcseonline.org/full_paper_view.php?paper_id=3952
TI  - Conceptual Review of Deep Learning Methods for Automatic Image Caption Generation
T2  - International Journal of Computer Sciences and Engineering
AU  - Patel, S. H.
AU  - Patel, N.M.
AU  - Thakore, D.G.
PY  - 2019
DA  - 2019/03/31
PB  - IJCSE, Indore, INDIA
SP  - 987
EP  - 991
IS  - 3
VL  - 7
SN  - 2347-2693
ER  -


Abstract

Automatic caption generation is a complex AI task: given an input image, the system must produce a textual description of it, which requires both image understanding and natural language generation. The field is highly active, with a large body of completed and ongoing work, and its recent frontiers are built on deep learning. The purpose of this article is to give readers an overview of deep-learning-based image captioning methods. Readers are first introduced to the basic concepts used in the development of these methods, followed by basic information on the popular datasets. Three existing models are then discussed in some detail, and other works are covered briefly. Concisely, this article introduces the topic, presents a broad classification of existing deep-learning-based approaches, describes popular datasets, and closes with brief discussions of selected methods.

Key-Words / Index Term

Image Caption Generation, Deep Learning, Computer Vision
