Accepted Papers
Long Papers
- [Oral] When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?
- [Oral] LLAVAGUARD: VLM-based Safeguards for Vision Dataset Curation and Safety Assessment
- [Oral] Multimodal Situational Safety
- [Oral] MM-SpuBench: Towards Better Understanding of Spurious Biases in Multimodal LLMs
- [Poster] Skipping Computations in Multimodal LLMs
- [Poster] Coordinated Robustness Evaluation Framework for Vision Language Models
- [Poster] Building and better understanding vision-language models: insights and future directions
- [Poster] Incorporating Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models
- [Poster] CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models
- [Poster] LEMoN: Label Error Detection using Multimodal Neighbors
- [Poster] MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models
- [Poster] Rethinking Artistic Copyright Infringements in the Era of Text-to-Image Generative Models
- [Poster] Decompose, Recompose, and Conquer: Multi-modal LLMs are Vulnerable to Compositional Adversarial Attacks in Multi-Image Queries
- [Poster] Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model
- [Poster] BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks
- [Poster] MMLU-Pro+: Evaluating Higher-Order Reasoning and Shortcut Learning in LLMs
- [Poster] How to Determine the Preferred Image Distribution of a Black-Box Vision-Language Model?
- [Poster] GUIDE: A Responsible Multimodal Approach for Enhanced Glaucoma Risk Modeling and Patient Trajectory Analysis
Short Papers
- [Oral] PopAlign: Population-Level Alignment for Fair Text-to-Image Generation
- [Oral] Consistency-diversity-realism Pareto fronts of conditional image generative models
- [Paper] Trust but Verify: Reliable VLM evaluation in-the-wild with program synthesis
- [Poster] You Never Know: Quantization Induces Inconsistent Biases in Vision-Language Foundation Models
- [Poster] Aligning to What? Limits to RLHF Based Alignment
- [Poster] Exploring Intrinsic Fairness in Stable Diffusion
- [Poster] Seeing Through Their Eyes: Evaluating Visual Perspective Taking in Vision Language Models
- [Poster] Just rephrase it! Uncertainty estimation in closed-source language models via multiple rephrased queries
- [Poster] Position Paper: Protocol Learning, Decentralized Frontier Risk and the No-Off Problem
- [Poster] Comparison Visual Instruction Tuning
- [Poster] Attention Shift: Steering AI Away from Unsafe Content
- [Poster] Towards Secure and Private AI: A Framework for Decentralized Inference
- [Poster] WikiDO: A New Benchmark Evaluating Cross-Modal Retrieval for Vision-Language Models
- [Poster] The Multi-faceted Monosemanticity in Multimodal Representations
- [Poster] Adversarial Robust Deep Reinforcement Learning is Neither Robust Nor Safe
- [Poster] Probabilistic Active Few-Shot Learning in Vision-Language Models