Long Papers

  1. [Oral] When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?

  2. [Oral] LLAVAGUARD: VLM-based Safeguards for Vision Dataset Curation and Safety Assessment

  3. [Oral] Multimodal Situational Safety

  4. [Oral] MM-SpuBench: Towards Better Understanding of Spurious Biases in Multimodal LLMs

  5. [Poster] Skipping Computations in Multimodal LLMs

  6. [Poster] Coordinated Robustness Evaluation Framework for Vision Language Models

  7. [Poster] Building and better understanding vision-language models: insights and future directions

  8. [Poster] Incorporating Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models

  9. [Poster] CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models

  10. [Poster] LEMoN: Label Error Detection using Multimodal Neighbors

  11. [Poster] MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models

  12. [Poster] Rethinking Artistic Copyright Infringements in the Era of Text-to-Image Generative Models

  13. [Poster] Decompose, Recompose, and Conquer: Multi-modal LLMs are Vulnerable to Compositional Adversarial Attacks in Multi-Image Queries

  14. [Poster] Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model

  15. [Poster] BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks

  16. [Poster] MMLU-Pro+: Evaluating Higher-Order Reasoning and Shortcut Learning in LLMs

  17. [Poster] How to Determine the Preferred Image Distribution of a Black-Box Vision-Language Model?

  18. [Poster] GUIDE: A Responsible Multimodal Approach for Enhanced Glaucoma Risk Modeling and Patient Trajectory Analysis

Short Papers

  1. [Oral] PopAlign: Population-Level Alignment for Fair Text-to-Image Generation

  2. [Oral] Consistency-diversity-realism Pareto fronts of conditional image generative models

  3. [Poster] Trust but Verify: Reliable VLM evaluation in-the-wild with program synthesis

  4. [Poster] You Never Know: Quantization Induces Inconsistent Biases in Vision-Language Foundation Models

  5. [Poster] Aligning to What? Limits to RLHF Based Alignment

  6. [Poster] Exploring Intrinsic Fairness in Stable Diffusion

  7. [Poster] Seeing Through Their Eyes: Evaluating Visual Perspective Taking in Vision Language Models

  8. [Poster] Just rephrase it! Uncertainty estimation in closed-source language models via multiple rephrased queries

  9. [Poster] Position Paper: Protocol Learning, Decentralized Frontier Risk and the No-Off Problem

  10. [Poster] Comparison Visual Instruction Tuning

  11. [Poster] Attention Shift: Steering AI Away from Unsafe Content

  12. [Poster] Towards Secure and Private AI: A Framework for Decentralized Inference

  13. [Poster] WikiDO: A New Benchmark Evaluating Cross-Modal Retrieval for Vision-Language Models

  14. [Poster] The Multi-faceted Monosemanticity in Multimodal Representations

  15. [Poster] Adversarial Robust Deep Reinforcement Learning is Neither Robust Nor Safe

  16. [Poster] Probabilistic Active Few-Shot Learning in Vision-Language Models