Multimedia question answering over visual and textual content relies on
significantly richer data sets that capture cross-modal relationships, i.e.,
links between images and text. At the same time, understanding the logic of
complex questions makes the task considerably harder. Although the underlying
concepts are not new and can be found in the proceedings of major AI and ML
conferences from decades ago, VQA (visual question answering) and T-VQA
(text-based VQA) are now gaining momentum. Unlike general visual question
answering, which only builds connections between questions and visual content,
T-VQA requires reading and reasoning over both the text and the visual concepts
that appear in images.
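To make the difference between VQA and T-VQA concrete, here is a minimal toy sketch in Python. It is not the method of the linked paper. Everything in it is hypothetical: the function names `attend` and `answer` and all of the hand-made vectors. A real system would use learned embeddings from an object detector and an OCR engine. The sketch shows two things. First, the question attends over visual objects and over OCR tokens read from the image. Second, in the T-VQA case, the OCR tokens also become answer candidates through a simple copy step, a design commonly used in text-based VQA models. Without OCR tokens, the model can only pick from a fixed vocabulary.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector]) -> Vector:
    """Scaled dot-product attention; the keys double as values (toy simplification)."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    return [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(len(query))]


def answer(question: Vector,
           objects: List[Vector],
           ocr_tokens: List[Tuple[str, Vector]],
           vocab: Dict[str, Vector]) -> str:
    """Hypothetical answerer: pass ocr_tokens=[] for plain VQA, non-empty for T-VQA."""
    visual_ctx = attend(question, objects)
    if ocr_tokens:
        text_ctx = attend(question, [vec for _, vec in ocr_tokens])
    else:
        text_ctx = [0.0] * len(question)
    # Fuse question, visual evidence, and text evidence by simple summation.
    fused = [q + v + t for q, v, t in zip(question, visual_ctx, text_ctx)]
    # Candidates: a fixed answer vocabulary plus OCR tokens copied from the image.
    candidates = list(vocab.items()) + ocr_tokens
    best_word, _ = max(candidates, key=lambda c: dot(fused, c[1]))
    return best_word


if __name__ == "__main__":
    question = [0.0, 1.0, 0.0]              # toy "what does the label say?" query
    objects = [[1.0, 0.0, 0.0]]             # one detected object (a bottle)
    ocr = [("coke", [0.0, 1.0, 0.2]), ("500ml", [0.0, 0.5, 1.0])]
    vocab = {"bottle": [1.0, 0.0, 0.0], "red": [0.0, 0.0, 1.0]}
    print(answer(question, objects, ocr, vocab))  # T-VQA: coke
    print(answer(question, objects, [], vocab))   # plain VQA: bottle
```

With the OCR tokens available, the model can answer "coke", a word that exists only as text in the image. When it can draw only on visual features and the fixed vocabulary, it falls back to "bottle".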
#VQA #TVQA #AI #ML
https://tanmingkui.github.io/files/publications/Cascade.pdf