An Automatic Bangla Image Captioning System

Date
2021-01-25
Abstract
Automatic image caption generation aims to produce an accurate natural language description of an image. It is a computationally challenging computer vision task that requires sufficient comprehension of both the syntactic and the semantic content of an image to generate a meaningful description. With the rapid development of artificial intelligence in recent years, image captioning has attracted the attention of many researchers and has become an interesting and arduous task. Because it generates natural language descriptions from the content observed in an image, it is an important part of scene understanding and combines knowledge from computer vision and natural language processing. Bangla is the 5th most widely spoken language in the world. While many established datasets exist for image annotation in English, no such resources exist for Bangla yet. For this reason, we develop an automatic image captioning system in Bangla. A deep neural network-based image captioning model is proposed to generate image descriptions. The model employs a Convolutional Neural Network (CNN) to classify the whole dataset, while a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) captures the sequential semantic representation of text-based sentences and generates a pertinent description based on the modular complexities of an image. Moreover, to address the dataset availability issue, a collection of 9154 Bangladeshi contextual images has been accumulated and manually annotated in Bangla. This dataset is then used to train our model, which integrates a pre-trained VGG16 image embedding model together with LSTM layers. The model is trained to predict a caption when an image is given as input. The results show that the model has successfully learned a working language model and generates captions for images in many cases.
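The abstract does not say how captions are fed to the LSTM during training or how a caption is produced at inference time. One common arrangement, assumed here and not confirmed by the thesis, splits each annotated caption into (prefix, next word) pairs for training and generates a caption greedily one word at a time. The following is a minimal sketch of that idea in plain Python. The function names, the `<start>`/`<end>` markers, and the example caption are illustrative assumptions and are not taken from the thesis or its dataset. The image features that a VGG16 encoder would supply are represented only by the `next_word` callable.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical sequence markers; the thesis does not specify its tokens.
START, END = "<start>", "<end>"


def build_vocab(captions: List[str]) -> Dict[str, int]:
    """Map each whitespace-separated token to an integer id (0 = padding)."""
    vocab = {"<pad>": 0, START: 1, END: 2}
    for caption in captions:
        for token in caption.split():
            vocab.setdefault(token, len(vocab))
    return vocab


def training_pairs(caption: str, vocab: Dict[str, int]) -> List[Tuple[List[int], int]]:
    """Split one caption into (prefix ids, next-word id) training examples."""
    ids = [vocab[START]] + [vocab[t] for t in caption.split()] + [vocab[END]]
    return [(ids[:i], ids[i]) for i in range(1, len(ids))]


def greedy_caption(next_word: Callable[[List[int]], int],
                   vocab: Dict[str, int], max_len: int = 20) -> str:
    """Generate a caption by repeatedly asking for the next word id.

    `next_word` stands in for a trained model that has already been given
    the image features; it maps the ids generated so far to the next id.
    """
    inverse = {i: t for t, i in vocab.items()}
    ids = [vocab[START]]
    for _ in range(max_len):
        nxt = next_word(ids)
        if nxt == vocab[END]:
            break
        ids.append(nxt)
    return " ".join(inverse[i] for i in ids[1:])


# Made-up example caption ("a boy is playing in the field").
caption = "একটি ছেলে মাঠে খেলছে"
vocab = build_vocab([caption])
pairs = training_pairs(caption, vocab)

# Stand-in "model" that has memorised the training pairs for this one caption.
memorised = {tuple(prefix): nxt for prefix, nxt in pairs}
generated = greedy_caption(lambda ids: memorised[tuple(ids)], vocab)
```

In a real system the memorised lookup would be replaced by a network that combines the image embedding with the LSTM state, and greedy selection could be swapped for beam search.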
Description
Keywords
TECHNOLOGY::Electrical engineering, electronics and photonics::Electrical engineering
Citation
Department Name
Electrical and Computer Engineering
Publisher
North South University
Printed Thesis