AI has undergone a paradigm shift in the past decade: the connection between vision and language (V+L) is now an integral part of AI, with deep impact beyond vision and NLP. Robotics, graphics, cybersecurity, and HCI are utilizing V+L tools, and there are direct industrial implications for software, arts, and media. The link between vision and language is much more complex than simple image-text alignment; researchers are pursuing the use of language for reasoning beyond the visible (for example, physical reasoning, spatial reasoning, commonsense reasoning, and embodied reasoning). Open-Domain Reasoning in Multi-Modal Settings (ODRUM 2023) provides a platform for discussions on multimodal (vision+language) topics, with special emphasis on reasoning capabilities.
The aim of ODRUM 2023 is to address the emerging topic of visual reasoning using multiple modalities (such as text, images, videos, and audio). The workshop will feature invited talks by experts on reasoning-related topics such as embodied AI, navigation, learning via interaction and collaboration with humans, building large V+L models that can perform multiple tasks, visual grounding, and the use of language to instruct robots. Participants and speakers will convene for a panel discussion on the importance of reasoning (a core AI topic with a rich history dating back to the 1950s) to computer vision, its relevance to recent progress in visual reasoning, and trends and challenges in open-domain reasoning, from the perspectives of NLP, vision, machine learning, and robotics researchers.
08:30 – 08:45 | Welcome and Introduction | |
08:45 – 09:35 | Karel Lenc | Evaluating and Training Large Language Models with Vision Capabilities |
09:35 – 10:00 | Spotlight Talks | |
10:00 – 10:40 | Poster Session + Coffee Break | |
10:40 – 11:30 | Jiajun Wu | Concept Learning Across Domains and Modalities |
11:30 – 12:20 | Srinath Sridhar | Multi-modality in 3D Scene Understanding |
12:20 – 13:20 | Lunch | |
13:20 – 14:10 | Alane Suhr | Two Approaches to Grounded Language Evaluation |
14:10 – 15:00 | Angel Chang | Reasoning with language in 3D |
15:00 – 16:00 | Poster Session 2 + Coffee Break + Socials | |
16:00 – 17:15 | Panel Discussion + Concluding Remarks | |
Website maintained by Tejas Gokhale