Accepted Papers
Long Papers
- [Oral] When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?
- [Oral] LLAVAGUARD: VLM-based Safeguards for Vision Dataset Curation and Safety Assessment
- [Oral] Multimodal Situational Safety
- [Oral] MM-SpuBench: Towards Better Understanding of Spurious Biases in Multimodal LLMs
- [Poster] Skipping Computations in Multimodal LLMs
- [Poster] Coordinated Robustness Evaluation Framework for Vision Language Models
- [Poster] Building and better understanding vision-language models: insights and future directions
- [Poster] Incorporating Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models
- [Poster] CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models
- [Poster] LEMoN: Label Error Detection using Multimodal Neighbors
- [Poster] MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models
- [Poster] Rethinking Artistic Copyright Infringements in the Era of Text-to-Image Generative Models
- [Poster] Decompose, Recompose, and Conquer: Multi-modal LLMs are Vulnerable to Compositional Adversarial Attacks in Multi-Image Queries
- [Poster] Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model
- [Poster] BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks
- [Poster] MMLU-Pro+: Evaluating Higher-Order Reasoning and Shortcut Learning in LLMs
- [Poster] How to Determine the Preferred Image Distribution of a Black-Box Vision-Language Model?
- [Poster] GUIDE: A Responsible Multimodal Approach for Enhanced Glaucoma Risk Modeling and Patient Trajectory Analysis
Short Papers
- [Oral] PopAlign: Population-Level Alignment for Fair Text-to-Image Generation
- [Oral] Consistency-diversity-realism Pareto fronts of conditional image generative models
- [Paper] Trust but Verify: Reliable VLM evaluation in-the-wild with program synthesis
- [Poster] You Never Know: Quantization Induces Inconsistent Biases in Vision-Language Foundation Models
- [Poster] Aligning to What? Limits to RLHF Based Alignment
- [Poster] Exploring Intrinsic Fairness in Stable Diffusion
- [Poster] Seeing Through Their Eyes: Evaluating Visual Perspective Taking in Vision Language Models
- [Poster] Just rephrase it! Uncertainty estimation in closed-source language models via multiple rephrased queries
- [Poster] Position Paper: Protocol Learning, Decentralized Frontier Risk and the No-Off Problem
- [Poster] Comparison Visual Instruction Tuning
- [Poster] Attention Shift: Steering AI Away from Unsafe Content
- [Poster] Towards Secure and Private AI: A Framework for Decentralized Inference
- [Poster] WikiDO: A New Benchmark Evaluating Cross-Modal Retrieval for Vision-Language Models
- [Poster] The Multi-faceted Monosemanticity in Multimodal Representations
- [Poster] Adversarial Robust Deep Reinforcement Learning is Neither Robust Nor Safe
- [Poster] Probabilistic Active Few-Shot Learning in Vision-Language Models