Semantic Text-video Retrieval

The text-to-video retrieval task requires ranking all the videos in a database by how semantically close they are to an input query. Doing so requires careful analysis and understanding of both the visual and the textual content, drawing on a wide range of Computer Vision and Natural Language Processing techniques. Despite its intrinsic difficulty, the problem is a fundamental one: several hundred hours of video are uploaded to the Internet every minute, so effective solutions are essential for searching this content and retrieving the videos the user is looking for. Moreover, because the task requires multi-modal content understanding, advances in this field may lead to improvements in many other problems, including Captioning and Question Answering.
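As a rough illustration of the ranking step only, the sketch below assumes that the query and every video have already been mapped to vectors in a shared embedding space, and that semantic closeness is measured with cosine similarity. Both are assumptions made for this example, not a description of the group's method; the function names and the toy two-dimensional embeddings are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_videos(query_embedding, video_embeddings):
    """Return (video_id, score) pairs sorted from most to least similar to the query."""
    scored = [
        (video_id, cosine_similarity(query_embedding, embedding))
        for video_id, embedding in video_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Toy embeddings, hand-picked for illustration (hypothetical values).
query = [1.0, 0.0]
videos = {
    "video_a": [0.9, 0.1],
    "video_b": [0.0, 1.0],
    "video_c": [0.5, 0.5],
}

ranking = rank_videos(query, videos)
for video_id, score in ranking:
    print(f"{video_id}: {score:.3f}")
```

In practice the hard part is learning text and video encoders whose embeddings make this comparison meaningful; the ranking itself is a simple sort over similarity scores.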

Related publications:

Research group:

  • Alex Falcon (AILAB-Udine Member)
  • Oswald Lanz (TeV – FBK, AILAB-Udine External Collaborator)
  • Giuseppe Serra (AILAB-Udine Member)