Hierarchical Reasoning Based on Perception Action Cycle for Visual Question Answering

37 Pages, Posted: 13 Oct 2022


Safaa Abdullahi Moallim Mohamud
Kyungpook National University

Amin Jalali
Kyungpook National University

Minho Lee
Kyungpook National University

Abstract

Recent visual question answering (VQA) frameworks employ different combinations of attention techniques to derive the correct answer. In vision-language tasks, attention techniques have succeeded mostly by refining the local features of both modalities. Although attention as a concept is firmly grounded in human cognitive mechanisms, arbitrary combinations of attention techniques are not well supported as models of human cognition. Neural networks were originally inspired by the structure of the human brain, and many researchers have recently turned to brain-inspired frameworks, achieving high performance with them. We therefore seek a framework that draws on human biological and psychological concepts to attain a sound understanding of the vision and language modalities. To this end, we introduce the hierarchical reasoning based on perception action cycle (HIPA) framework to tackle VQA tasks. It integrates the multi-modal reasoning process with the perception action cycle (PAC), which describes how humans learn about the surrounding world. The framework comprehends the visual modality through three phases of reasoning: object-level attention, organization, and interpretation. It comprehends the language modality through word-level attention, interpretation, and conditioning. Vision and language are then interpreted interdependently, in a cyclic and hierarchical manner, throughout the entire framework. To assess the resulting visual and language features, we argue that image-question pairs with the same answer should eventually have similar visual and language features. We therefore evaluate the features with metrics such as the standard deviation of cosine similarity and of Manhattan distance, and we show that employing PAC in our framework improves these standard deviations compared with other VQA frameworks. We also test the proposed HIPA on visual relationship detection (VRD) tasks. The proposed method achieves state-of-the-art results on the TDIUC and VRD datasets and competitive results on the VQA 2.0 dataset.
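
To make the feature-consistency evaluation concrete, the Python sketch below (not the authors' code; the "features" and "answers" inputs are hypothetical placeholders) computes the standard deviation of pairwise cosine similarity and Manhattan distance within groups of image-question pairs that share the same ground-truth answer. Lower values suggest that pairs mapping to the same answer occupy a tighter region of feature space, which is the property the abstract reports PAC improves.

    # Illustrative sketch (not from the paper): feature-consistency evaluation.
    # Groups fused features by ground-truth answer, then reports the standard
    # deviation of pairwise cosine similarity and Manhattan (L1) distance
    # within each group, averaged over the groups.
    import numpy as np
    from itertools import combinations
    from collections import defaultdict

    def consistency_stats(features, answers):
        groups = defaultdict(list)
        for feat, ans in zip(features, answers):
            groups[ans].append(feat)

        cos_stds, l1_stds = [], []
        for feats in groups.values():
            if len(feats) < 3:  # need several pairs for a meaningful std
                continue
            cos_vals, l1_vals = [], []
            for a, b in combinations(feats, 2):
                cos_vals.append(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
                l1_vals.append(np.abs(a - b).sum())
            cos_stds.append(np.std(cos_vals))
            l1_stds.append(np.std(l1_vals))
        return float(np.mean(cos_stds)), float(np.mean(l1_stds))

    # Hypothetical usage with random features; a trained VQA model would
    # supply its fused vision-language features here instead.
    feats = np.random.randn(100, 512)
    labels = np.random.choice(["yes", "no", "red", "two"], size=100)
    print(consistency_stats(feats, labels))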

Keywords: Visual question answering, vision-language tasks, multi-modality fusion, attention, bilinear fusion


Suggested Citation

Abdullahi Moallim Mohamud, Safaa and Jalali, Amin and Lee, Minho, Hierarchical Reasoning Based on Perception Action Cycle for Visual Question Answering. Available at SSRN: https://ssrn.com/abstract=4247187 or http://dx.doi.org/10.2139/ssrn.4247187

Safaa Abdullahi Moallim Mohamud
Kyungpook National University
Korea, Republic of (South Korea)

Amin Jalali
Kyungpook National University
Korea, Republic of (South Korea)

Minho Lee (Contact Author)
Kyungpook National University
Korea, Republic of (South Korea)
