Finding the Right Words
Investigating Machine-Generated Video Description Quality Using a Corpus-Based Approach
DOI:
https://doi.org/10.47476/jat.v2i2.103Keywords:
audio description, video captioning, automation, audiovisual content, corpus-based approachAbstract
This paper examines first steps in identifying and compiling human-generated corpora for the purpose of determining the quality of computer-generated video descriptions. This is part of a study whose general ambition is to broaden the reach of accessible audiovisual content through semi-automation of its description for the benefit of both end-users (content consumers) and industry professionals (content creators). Working in parallel with machine-derived video and image description datasets created for the purposes of advancing computer vision research, such as Microsoft COCO (Lin et al., 2015) and TGIF (Li et al., 2016), we examine the usefulness of audio descriptive texts as a direct comparator. Cognisant of the limitations of this approach, we also explore alternative human-generated video description datasets including bespoke content description. Our research forms part of the MeMAD (Methods for Managing Audiovisual Data) project, funded by the EU Horizon 2020 programme.