2nd Workshop on Responsibly Building the Next Generation of Multimodal Foundational Models


NeurIPS 2025 (Tentative)

San Diego, USA

Prior edition: NeurIPS 2024

Introduction

In 2025, multimodal AI has evolved dramatically, with foundation models now seamlessly integrating text, images, video, and audio to create more human-like understanding and generation capabilities. As these technologies continue to transform industries from healthcare to creative arts, the need for responsible development has never been more critical. Recent advancements in multimodal adaptation and generalization have opened new possibilities while raising complex challenges around reliability, safety, and sustainability that demand proactive solutions.

The landscape of multimodal AI has fundamentally shifted since our last workshop. Major players like OpenAI, Google, and Meta have invested heavily in multimodal capabilities: breakthrough models such as GPT-4o, Sora, Imagen, and Veo 2 generate realistic images and videos from text, Meta's SeamlessM4T translates speech and text in real time, and many more process and integrate multiple data types simultaneously. These advancements have accelerated adoption across industries, with Gartner estimating that 40% of generative AI offerings will be multimodal by 2027, up from just 1% in 2023.

Our workshop aims to provide a platform for the community to establish responsible design principles for the next generation of multimodal foundation models. This year's goals include:

  • Exploring novel design principles emphasizing responsibility and sustainability in multimodal generative models, aiming to reduce their extensive data and computational demands.
  • Establishing best practices for controllability evaluation that go beyond general-purpose scores to assess fundamental skills across modalities.
  • Exploring multimodal test-time adaptation and domain generalization techniques that enhance model reliability across diverse real-world scenarios.
  • Developing frameworks for responsible agentic multimodal systems that can safely operate with greater autonomy while maintaining human oversight in critical decisions.
  • Enhancing the robustness of multimodal models against adversarial and backdoor attacks, thereby securing their integrity in adversarial environments.
  • Identifying the sources of reliability concerns, whether they stem from data quality, model architecture, or pre-training strategies.


More details are coming soon! Stay tuned!



Organizers