Sridhar Sowmiyanarayanan

PG Level Advanced Certification Programme in Computational Data Science
Cohort #4

My Projects

Image captioning using a CNN and a Transformer

Image captioning is the task of automatically generating a textual description of an image. Given an input image, the goal is to generate a descriptive and accurate sentence that captures the salient objects, attributes, and relationships depicted in the image. This requires not only recognizing the objects and their features but also understanding the context of the scene as a whole. The generated caption should be semantically meaningful, grammatically correct, and stylistically appropriate. Moreover, the system should generalize to new images and handle variations in viewpoint, lighting, and object appearance. Image captioning is a challenging task that integrates computer vision and natural language processing techniques, and it has important applications in fields such as image retrieval, visual question answering, and assistive technology.
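Caption generation of this kind is commonly done autoregressively: the model emits one word at a time, conditioned on the image and on the words produced so far. The sketch below shows greedy decoding under that assumption. The scorer interface, `toy_scorer`, the `<start>`/`<end>` tokens, and the `max_len` limit are all hypothetical placeholders for a trained CNN encoder plus Transformer decoder; the project's actual decoding strategy is not stated here.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: given image features and the tokens generated so far,
# return a score for each candidate next word (higher is better). In a real
# captioner this would be a CNN encoder feeding a Transformer decoder.
Scorer = Callable[[Sequence[float], List[str]], Dict[str, float]]


def greedy_caption(
    image_features: Sequence[float],
    score_next: Scorer,
    start_token: str = "<start>",
    end_token: str = "<end>",
    max_len: int = 20,
) -> str:
    """Build a caption word by word, always picking the highest-scoring next word."""
    tokens = [start_token]
    for _ in range(max_len):
        scores = score_next(image_features, tokens)
        best = max(scores, key=lambda word: scores[word])  # greedy choice
        if best == end_token:
            break
        tokens.append(best)
    return " ".join(tokens[1:])  # drop the start token


# Hypothetical stand-in for a trained model: it always "predicts" a fixed script.
def toy_scorer(image_features: Sequence[float], tokens: List[str]) -> Dict[str, float]:
    script = ["a", "dog", "runs", "<end>"]
    target = script[min(len(tokens) - 1, len(script) - 1)]
    return {word: (1.0 if word == target else 0.0) for word in script}


print(greedy_caption([0.1, 0.7, 0.2], toy_scorer))  # a dog runs
```

Greedy decoding is the simplest choice; beam search, which keeps several candidate captions at each step, is a common alternative when caption quality matters more than speed.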

Based on an extensive literature survey, and consistent with our own findings, we observed that the multi-head attention (MHA) Transformer performs better than the CNN-RNN (LSTM) model. The MHA Transformer also takes less time to train than the CNN-RNN (LSTM) model.
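The core operation of the MHA Transformer is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, run in parallel over several heads whose outputs are concatenated. The following is a minimal pure-Python sketch of that idea, not the project's code: to stay short it treats the learned projection matrices W_Q, W_K, W_V and W_O as identity (a real layer learns them), and the function names are illustrative. In an image-captioning decoder, the queries would typically come from the caption tokens and the keys and values from the CNN image features.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single head: softmax(Q K^T / sqrt(d_k)) V, one query row at a time."""
    d_k = len(k[0])
    out = []
    for q_row in q:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        weights = softmax(scores)
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))])
    return out


def multi_head_attention(q: Matrix, k: Matrix, v: Matrix, num_heads: int) -> Matrix:
    """Split the feature dimension into num_heads slices, attend per head, then concatenate.

    Simplification: the learned projections W_Q, W_K, W_V, W_O are omitted (identity).
    """
    d_model = len(q[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)
        heads.append(scaled_dot_product_attention(
            [r[cols] for r in q], [r[cols] for r in k], [r[cols] for r in v]))
    # Concatenate per-head outputs along the feature axis for each query position.
    return [sum((head[i] for head in heads), []) for i in range(len(q))]


# Two identical keys get equal weight, so each head averages the two value rows.
out = multi_head_attention([[1, 0, 0, 1]], [[1, 1, 1, 1], [1, 1, 1, 1]],
                           [[2, 4, 6, 8], [0, 0, 0, 0]], num_heads=2)
print(out)  # [[1.0, 2.0, 3.0, 4.0]]
```

Splitting into heads lets each head attend over a different slice of the representation, which is the usual motivation for multiple heads rather than a single wide one.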

The BLEU score target was achieved: 54.8% against a target of 50%. To the best of our knowledge, this is better than most published BLEU scores.
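BLEU compares a generated caption with one or more human reference captions using clipped n-gram precision and a brevity penalty. The text does not state which variant produced the 54.8% figure (for example BLEU-1 or BLEU-4, corpus- or sentence-level, which tokenizer or dataset), so the following is only a sketch of standard corpus-level BLEU with uniform n-gram weights and no smoothing. Libraries such as NLTK's `corpus_bleu` are the usual choice in practice and may differ in smoothing and edge cases. On this 0-to-1 scale, 54.8% corresponds to a score of 0.548.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references: List[List[List[str]]], hypotheses: List[List[str]], max_n: int = 4) -> float:
    """Corpus-level BLEU: clipped n-gram precision (n = 1..max_n) plus brevity penalty.

    references[i] holds one or more tokenized reference captions for hypotheses[i].
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for refs, hyp in zip(references, hypotheses):
        hyp_len += len(hyp)
        # Effective reference length: the reference closest in length (ties -> shorter).
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            max_ref_counts: Counter = Counter()
            for r in refs:
                max_ref_counts |= ngrams(r, n)  # element-wise max over references
            # Clip each hypothesis n-gram count by its maximum count in any reference.
            matches[n - 1] += sum(min(c, max_ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # no smoothing: any zero n-gram precision gives BLEU = 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


refs = [[["a", "dog", "runs", "on", "the", "grass"]]]
hyps = [["a", "dog", "runs", "on", "grass"]]
print(round(corpus_bleu(refs, hyps, max_n=1), 3))  # 0.819
```

In the example every hypothesis word appears in the reference, so unigram precision is 1; the score drops only because the caption is one word shorter than the reference.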

"Teaches professionals how to unlock the power of data to solve complex business problems and make data-driven decisions. Designed by IISc, the #1 ranked university (NIRF) and a premier academic institution for world-class education in science, engineering, and design. Delivered by TalentSprint, with its deep understanding of modern technologies, access to industry experts, and a state-of-the-art technology platform. Delivered in an executive-friendly format. A unique 5-step learning process of LIVE online faculty-led interactive sessions, capstone projects, mentorship, hackathons, and presentations to ensure fast-track learning."
