THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models
May 9, 2024, 4:42 a.m. | Prannay Kaul, Zhizhong Li, Hao Yang, Yonatan Dukler, Ashwin Swaminathan, C. J. Taylor, Stefano Soatto
cs.LG updates on arXiv.org
Abstract: Mitigating hallucinations in large vision-language models (LVLMs) remains an open problem. Recent benchmarks do not address hallucinations in open-ended, free-form responses, which we term "Type I hallucinations". Instead, they focus on hallucinations in responses to very specific question formats (typically a multiple-choice response regarding a particular object or attribute), which we term "Type II hallucinations". Additionally, such benchmarks often require external API calls to models that are subject to change. In practice, we observe …