Abstract
We study the problem of non-factoid QA on instructional videos. Existing work focuses on either the visual or the textual modality of video content to find answers matching the question. However, neither is flexible enough for our setting of non-factoid answers with varying lengths. Motivated by this, we propose a two-stage model: (a) multimodal segmentation of the video into span candidates and (b) length-adaptive ranking of the candidates against the question. First, for segmentation, we propose Segmenter, which generates span candidates of diverse lengths by considering both the textual and the visual modality. Second, for ranking, we propose Ranker, which scores the candidates by dynamically combining two models with complementary strengths for short and long spans, respectively. Experimental results demonstrate that our model achieves state-of-the-art performance.
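The abstract only sketches the two-stage pipeline, so the following is a minimal illustrative sketch of the overall flow: enumerate span candidates over the video's clips/sentences, then rank them with a length-adaptive mixture of two scorers. All names (`segment_spans`, `rank_spans`, the toy scorers) and the simple length-based gate are hypothetical stand-ins, not the authors' actual Segmenter/Ranker implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    start: int  # index of the first clip/sentence in the span
    end: int    # index of the last clip/sentence (inclusive)

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def segment_spans(num_units: int, max_len: int = 5) -> List[Span]:
    """Stage (a): produce span candidates of diverse lengths.
    The paper's Segmenter uses textual and visual cues; here we simply
    enumerate contiguous spans as a placeholder."""
    return [
        Span(s, s + l - 1)
        for l in range(1, max_len + 1)
        for s in range(0, num_units - l + 1)
    ]


def short_scorer(span: Span, question: str) -> float:
    """Hypothetical model that is stronger on short spans."""
    return 1.0 / span.length


def long_scorer(span: Span, question: str) -> float:
    """Hypothetical model that is stronger on long spans."""
    return span.length / 10.0


def rank_spans(spans: List[Span], question: str) -> List[Span]:
    """Stage (b): length-adaptive ranking.
    A soft, length-based gate mixes the two complementary scorers so that
    neither dominates across all span lengths (an assumed gating rule)."""
    def score(span: Span) -> float:
        gate = min(span.length / 5.0, 1.0)  # 0 -> favor short scorer, 1 -> long scorer
        return (1 - gate) * short_scorer(span, question) + gate * long_scorer(span, question)

    return sorted(spans, key=score, reverse=True)


if __name__ == "__main__":
    candidates = segment_spans(num_units=8)
    best = rank_spans(candidates, "How do I fold the dough?")[0]
    print(f"Top span: units {best.start}-{best.end}")
```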
Original language | English |
---|---|
Title of host publication | AAAI 2020 - 34th AAAI Conference on Artificial Intelligence |
Publisher | AAAI Press |
Pages | 8147-8154 |
Number of pages | 8 |
ISBN (Electronic) | 9781577358350 |
Publication status | Published - 2020 |
Event | 34th AAAI Conference on Artificial Intelligence, AAAI 2020 - New York, United States. Duration: 2020 Feb 7 → 2020 Feb 12 |
Publication series
Name | AAAI 2020 - 34th AAAI Conference on Artificial Intelligence |
---|---|
Conference
Conference | 34th AAAI Conference on Artificial Intelligence, AAAI 2020 |
---|---|
Country/Territory | United States |
City | New York |
Period | 2020 Feb 7 → 2020 Feb 12 |
Bibliographical note
Funding Information: This work was partially done during the first author's internship at MSR Asia and supported by an MSR Asia grant.
Publisher Copyright:
Copyright © 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
All Science Journal Classification (ASJC) codes
- Artificial Intelligence