Jan 15, 2021

Visual Question Answering

 

Multimedia question answering, which covers both visual and textual question answering, relies on enriched data sets that capture cross-modal relationships, i.e., relationships between images and text. At the same time, understanding the logic of complex questions remains a hard problem. While these concepts are not new and can be found in the proceedings of major AI and ML conferences from decades ago, VQA and text-based VQA (T-VQA) ideas are gaining momentum. Unlike general visual question answering, which only builds connections between questions and visual content, T-VQA requires reading and reasoning over both the text and the visual concepts that appear in images.

#VQA #TVQA #AI #ML

https://tanmingkui.github.io/files/publications/Cascade.pdf




