"Mirage: The Illusion of Visual Understanding"
https://arxiv.org/abs/2603.21687

New research shows that multimodal LLMs may base their answers and reasoning on an image that does not exist. The paper also introduces B-Clean, a new eval framework for filtering compromised questions out of benchmarks.
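
To make the idea of a "compromised" benchmark question concrete, here is a minimal sketch. It is not the paper's B-Clean procedure, which this post does not describe. It assumes, purely for illustration, that a question counts as compromised if a model can answer it correctly with no image attached. The names `filter_image_free_solvable` and `toy_probe` are hypothetical, and `toy_probe` is a hard-coded stand-in for a real model call.

```python
from typing import Callable, Dict, List


def filter_image_free_solvable(
    questions: List[Dict[str, str]],
    answer_without_image: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Keep only the questions that the image-free probe answers incorrectly.

    Assumption (not from the paper): a question is treated as compromised
    when the probe matches the reference answer without seeing any image.
    """
    kept = []
    for q in questions:
        guess = answer_without_image(q["question"])
        # Case-insensitive exact match against the reference answer.
        if guess.strip().lower() != q["answer"].strip().lower():
            kept.append(q)
    return kept


def toy_probe(question: str) -> str:
    """Hypothetical stand-in for a model queried with text only."""
    return "blue" if "sky" in question else "unknown"


benchmark = [
    {"id": "q1", "question": "What color is the sky in the image?", "answer": "blue"},
    {"id": "q2", "question": "How many cats are in the image?", "answer": "3"},
]

clean = filter_image_free_solvable(benchmark, toy_probe)
print([q["id"] for q in clean])  # prints ['q2']
```

In this toy run, q1 is dropped because the probe guesses "blue" from the question text alone, while q2 survives because it cannot be answered without looking at the image.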