Application of Ensemble Learning in Visual Question Answering

Date

2023-06-10

Publisher

University of M'sila

Abstract

Visual Question Answering (VQA) is a field that combines two areas: computer vision and natural language processing (NLP). Computer vision is used to process the image or video, and NLP is used to process the natural-language question. A VQA system automatically answers a question based on the context of images or videos. VQA is one of the vision-language tasks that require a high level of both language and image understanding, which makes it a difficult and complex problem. In this dissertation, we explore and apply an ensemble of diverse VQA models, combined using weighted-average techniques, to increase accuracy.
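
The abstract does not say how the weighted average is computed, so the following is only a minimal sketch of one common form of weighted-average ensembling, not the dissertation's implementation. It assumes each model outputs a probability distribution over the same fixed list of candidate answers. The function names, the example answers, the probabilities, and the weights are all hypothetical.

```python
from typing import List, Sequence


def weighted_average_ensemble(
    model_probs: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Combine per-model answer distributions with a weighted average.

    model_probs[m][a] is the probability model m assigns to answer a.
    weights[m] is the (hypothetical) weight given to model m; weights are
    normalized here so they do not need to sum to 1.
    """
    if not model_probs or len(model_probs) != len(weights):
        raise ValueError("need one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_answers = len(model_probs[0])
    combined = [0.0] * n_answers
    for probs, w in zip(model_probs, weights):
        if len(probs) != n_answers:
            raise ValueError("all models must score the same answer list")
        for i, p in enumerate(probs):
            combined[i] += (w / total) * p
    return combined


def predict(
    model_probs: Sequence[Sequence[float]],
    weights: Sequence[float],
    answers: Sequence[str],
) -> str:
    """Return the answer with the highest combined score."""
    combined = weighted_average_ensemble(model_probs, weights)
    best = max(range(len(combined)), key=combined.__getitem__)
    return answers[best]


# Illustrative values only; none of these come from the dissertation.
answers = ["cat", "dog", "bird"]
model_probs = [
    [0.6, 0.3, 0.1],  # hypothetical model A
    [0.2, 0.7, 0.1],  # hypothetical model B
    [0.3, 0.6, 0.1],  # hypothetical model C
]
weights = [0.5, 0.25, 0.25]

combined = weighted_average_ensemble(model_probs, weights)
prediction = predict(model_probs, weights, answers)
```

In this toy case the most heavily weighted model prefers "cat", but the combined distribution still favors "dog" because the other two models agree on it. In practice the weights might be chosen from each model's validation accuracy, which is again an assumption rather than something stated in the abstract.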

Keywords

Deep learning, CNN, LSTM, VQA, Ensemble learning, ResNet, Computer vision, Natural language processing.